History log of /openbsd-current/sys/uvm/uvm_page.c
Revision Date Author Comments
# 1.177 01-May-2024 mpi

Add per-CPU caches to the pmemrange allocator.

The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.

Each cache is composed of two magazines. The design is borrowed from Jeff
Bonwick's vmem paper and the implementation is similar to that of pool_cache
from dlg@. However, there is no depot layer and magazines are refilled
directly by the pmemrange allocator.

This version includes splvm()/splx() dances because the buffer cache flips
buffers in interrupt context. So we have to prevent recursive accesses to
per-CPU magazines.

Tested by naddy@, solene@, krw@, robert@, claudio@ and Laurence Tratt.

ok claudio@, kettenis@
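
The two-magazine layout described in this entry can be sketched in user space
as follows. This is only an illustration under stated assumptions, not the
code that was committed: MAG_SIZE, struct ex_page, struct ex_cpu_cache,
ex_backend_alloc(), ex_cache_get() and ex_cache_put() are all hypothetical
names, and the backing allocator is a static pool standing in for pmemrange.
The splvm()/splx() protection mentioned above is omitted because a user-space
sketch has no interrupts.

```c
#include <assert.h>
#include <stddef.h>

#define MAG_SIZE 8		/* hypothetical magazine capacity */
#define EX_POOL_PAGES 64	/* hypothetical size of the backing pool */

struct ex_page { int id; };	/* stand-in for a physical page */

struct ex_magazine {
	struct ex_page *slots[MAG_SIZE];
	int count;		/* number of valid entries in slots[] */
};

struct ex_cpu_cache {
	struct ex_magazine mags[2];
	int active;		/* index of the magazine currently in use */
};

/* Stand-in for the backing allocator: hands out pages from a static pool. */
static struct ex_page ex_pool[EX_POOL_PAGES];
static int ex_pool_next;

struct ex_page *
ex_backend_alloc(void)
{
	if (ex_pool_next >= EX_POOL_PAGES)
		return NULL;
	ex_pool[ex_pool_next].id = ex_pool_next;
	return &ex_pool[ex_pool_next++];
}

/* Refill the active magazine directly from the backend (no depot layer). */
static int
ex_cache_refill(struct ex_cpu_cache *cc)
{
	struct ex_magazine *m = &cc->mags[cc->active];

	while (m->count < MAG_SIZE) {
		struct ex_page *pg = ex_backend_alloc();
		if (pg == NULL)
			break;
		m->slots[m->count++] = pg;
	}
	return m->count;
}

/* Take a page from the cache; returns NULL if the backend is exhausted. */
struct ex_page *
ex_cache_get(struct ex_cpu_cache *cc)
{
	struct ex_magazine *m = &cc->mags[cc->active];

	assert(cc->active == 0 || cc->active == 1);
	if (m->count == 0) {
		/* Try the other magazine before going to the backend. */
		cc->active = !cc->active;
		m = &cc->mags[cc->active];
		if (m->count == 0 && ex_cache_refill(cc) == 0)
			return NULL;
	}
	return m->slots[--m->count];
}

/* Returns 0 if the page was cached, -1 if both magazines are full. */
int
ex_cache_put(struct ex_cpu_cache *cc, struct ex_page *pg)
{
	struct ex_magazine *m = &cc->mags[cc->active];

	if (m->count == MAG_SIZE) {
		cc->active = !cc->active;
		m = &cc->mags[cc->active];
		if (m->count == MAG_SIZE)
			return -1;	/* caller frees to the backend */
	}
	m->slots[m->count++] = pg;
	return 0;
}
```

Keeping a second magazine means a CPU that alternates between allocating and
freeing near a magazine boundary can swap magazines instead of going back to
the shared allocator on every call.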


# 1.176 19-Apr-2024 mpi

Revert per-CPU caches: a double-free has been found by naddy@.


# 1.175 17-Apr-2024 mpi

Add per-CPU caches to the pmemrange allocator.

The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.

Each cache is composed of two magazines. The design is borrowed from Jeff
Bonwick's vmem paper and the implementation is similar to that of pool_cache
from dlg@. However, there is no depot layer and magazines are refilled
directly by the pmemrange allocator.

Tested by robert@, claudio@ and Laurence Tratt.

ok kettenis@


Revision tags: OPENBSD_7_5_BASE
# 1.174 13-Feb-2024 miod

Remove sanity checks from uvm_pagefree(). The first thing this function does
is invoke uvm_pageclean(), which performs the exact same sanity check, so
one set of checks is enough.

ok mpi@


Revision tags: OPENBSD_7_4_BASE
# 1.173 12-Aug-2023 mpi

Add sanity checks in uvm_pagelookup().

ok kettenis@


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

The previous "breakage" of swap on arm64 has been found to be an issue
on one machine, the rockpro/arm64, related to a deadlock built into the
sdmmc(4) stack interacting with swapping code, both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@
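
As a small illustration of the ordering this entry refers to, the sketch
below puts the storage-class specifier first. The function name
ex_page_index() and its parameters are hypothetical and are not taken from
uvm_page.c.

```c
#include <assert.h>
#include <limits.h>

/*
 * Preferred form: "static" comes first in the declaration specifiers.
 * Writing "inline static int ex_page_index(...)" would still compile,
 * but C99 6.11.5 marks that placement as obsolescent.
 */
static inline int
ex_page_index(unsigned long addr, unsigned int shift)
{
	/* Shifting by the full width of the type or more is undefined. */
	assert(shift < sizeof(addr) * CHAR_BIT);
	return (int)(addr >> shift);
}
```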


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon: it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping
out its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
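
The requirement in this entry amounts to rounding a byte count up to a page
boundary before passing it on. The sketch below shows that arithmetic only;
EX_PAGE_SIZE, ex_round_page() and ex_alloc_size() are hypothetical names with
an assumed 4096-byte page, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE ((size_t)4096)	/* assumed; must be a power of two */
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)

/* Round len up to the next multiple of EX_PAGE_SIZE. */
static inline size_t
ex_round_page(size_t len)
{
	return (len + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}

/* Size that a caller would hand to a page-granular allocator. */
size_t
ex_alloc_size(size_t len)
{
	size_t sz = ex_round_page(len);

	assert((sz & EX_PAGE_MASK) == 0);
	return sz;
}
```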


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@
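
The deferred-callback idea in this entry (run the reclamation callback only
after every CPU has passed a quiescent state) can be modelled with a small
single-threaded simulation. This is a sketch under stated assumptions and not
the interface from <sys/smr.h>: EX_NCPU, struct ex_smr_entry, ex_smr_defer(),
ex_quiescent(), ex_smr_poll() and ex_mark_freed() are hypothetical names, and
CPUs are represented by plain counters.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NCPU 2	/* hypothetical number of CPUs in the model */

/* Number of quiescent states each simulated CPU has passed so far. */
static unsigned long ex_qcount[EX_NCPU];

struct ex_smr_entry {
	void (*func)(void *);
	void *arg;
	unsigned long need[EX_NCPU];	/* count each CPU must reach */
	int pending;
};

/* Writer side: schedule func(arg) to run once reclamation is safe. */
void
ex_smr_defer(struct ex_smr_entry *e, void (*func)(void *), void *arg)
{
	int i;

	assert(func != NULL);
	e->func = func;
	e->arg = arg;
	for (i = 0; i < EX_NCPU; i++)
		e->need[i] = ex_qcount[i] + 1;
	e->pending = 1;
}

/* A CPU reports that it holds no read-side references. */
void
ex_quiescent(int cpu)
{
	assert(cpu >= 0 && cpu < EX_NCPU);
	ex_qcount[cpu]++;
}

/* Run the callback if every CPU has been quiescent since ex_smr_defer(). */
int
ex_smr_poll(struct ex_smr_entry *e)
{
	int i;

	if (!e->pending)
		return 0;
	for (i = 0; i < EX_NCPU; i++)
		if (ex_qcount[i] < e->need[i])
			return 0;
	e->pending = 0;
	e->func(e->arg);
	return 1;
}

/* Example callback: records that the object was reclaimed. */
void
ex_mark_freed(void *arg)
{
	*(int *)arg = 1;
}
```

In this model a reader never takes a lock; the cost is moved to the writer,
which has to wait (or defer work) until all CPUs have reported in.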


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
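
The accounting problem in this entry can be modelled as below. The sketch
assumes that the free path decrements a global wired counter whenever it sees
a nonzero per-page wire count; that assumption, and the names struct
ex_page, ex_wired_total, ex_pagefree() and ex_buf_free_page(), are
illustrative and not copied from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

struct ex_page {
	int wire_count;		/* per-page wiring count */
};

/* Global count of wired pages (model of a statistics counter). */
int ex_wired_total;

/* Free path: a page that still looks wired is unwired and uncounted. */
void
ex_pagefree(struct ex_page *pg)
{
	assert(pg != NULL);
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		ex_wired_total--;
	}
}

/*
 * Buffer cache style free: the page was wired without being added to
 * ex_wired_total, so reset wire_count first to keep the counter from
 * being decremented for a page it never counted.
 */
void
ex_buf_free_page(struct ex_page *pg)
{
	assert(pg != NULL);
	pg->wire_count = 0;
	ex_pagefree(pg);
}
```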


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
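
The change in this entry is the difference between several atomic
read-modify-write operations and a single one with a combined mask. The
sketch below uses C11 <stdatomic.h> instead of the kernel's
atomic_clearbits interface, and the flag names EX_PG_BUSY, EX_PG_WANTED and
EX_PG_FAKE with their values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define EX_PG_BUSY	0x01u
#define EX_PG_WANTED	0x02u
#define EX_PG_FAKE	0x04u

/* Three separate atomic read-modify-write operations. */
void
ex_clear_separately(atomic_uint *flags)
{
	assert(flags != NULL);
	atomic_fetch_and(flags, ~EX_PG_BUSY);
	atomic_fetch_and(flags, ~EX_PG_WANTED);
	atomic_fetch_and(flags, ~EX_PG_FAKE);
}

/* One atomic operation clearing all three bits at once. */
void
ex_clear_combined(atomic_uint *flags)
{
	assert(flags != NULL);
	atomic_fetch_and(flags, ~(EX_PG_BUSY | EX_PG_WANTED | EX_PG_FAKE));
}
```

Both functions leave the same final value; the combined form also never
exposes an intermediate state with only some of the bits cleared.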


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
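
One way to read this entry is that the reserve check has to look at the free
count after the whole request has been satisfied, not just at the current
free count. The sketch below shows such a size-aware comparison; struct
ex_counters, its fields and both function names are hypothetical, and the
exact comparison used in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_counters {
	int free;		/* pages currently free */
	int reserve_pagedaemon;	/* pages kept back for the pagedaemon */
};

/* Check that ignores the request size (shown only for contrast). */
bool
ex_may_alloc_ignoring_size(const struct ex_counters *c)
{
	return c->free > c->reserve_pagedaemon;
}

/* Size-aware check: the reserve must survive the whole allocation. */
bool
ex_may_alloc(const struct ex_counters *c, int npages)
{
	assert(npages > 0);
	return c->free - npages >= c->reserve_pagedaemon;
}
```

With 10 free pages and a reserve of 4, a 7-page request passes the first
check but fails the second, since it would leave only 3 pages.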


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
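
The "pretty much free" check relies on an insert operation that reports a
colliding element instead of silently replacing it, as RB_INSERT does. The
sketch below reproduces that contract with a plain unbalanced binary search
tree so that it stays self-contained; struct ex_page, ex_tree_insert() and
ex_pageinsert() are hypothetical names and the tree is not the kernel's
red-black tree.

```c
#include <assert.h>
#include <stddef.h>

struct ex_page {
	unsigned long offset;		/* key within the object */
	struct ex_page *left, *right;
};

/*
 * Insert pg keyed by offset. Returns NULL on success, or the node that
 * already holds the same offset (in which case nothing is inserted).
 */
struct ex_page *
ex_tree_insert(struct ex_page **root, struct ex_page *pg)
{
	struct ex_page **link = root;

	while (*link != NULL) {
		if (pg->offset < (*link)->offset)
			link = &(*link)->left;
		else if (pg->offset > (*link)->offset)
			link = &(*link)->right;
		else
			return *link;	/* duplicate found */
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}

/* Insert that treats a duplicate as a bug, at no extra lookup cost. */
void
ex_pageinsert(struct ex_page **root, struct ex_page *pg)
{
	struct ex_page *dupe = ex_tree_insert(root, pg);

	assert(dupe == NULL);	/* stands in for a kernel assertion */
}
```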


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Usable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

Ansify a few functions while I'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
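
A minimal sketch of the idea in this commit, assuming C11 <stdatomic.h>
as a stand-in for the kernel's own atomic bit primitives. The structure
name, flag values and helper names are hypothetical and exist only for
illustration. It shows both flag groups living in a single word that is
only changed with atomic read-modify-write operations, so an update to
one group cannot clobber a concurrent update to the other.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits, not the kernel's real values. */
#define PGX_BUSY	0x0001u	/* stands in for a "page" flag */
#define PQX_ACTIVE	0x0100u	/* stands in for a "page queue" flag */

struct page_sketch {
	atomic_uint pg_flags;	/* one word holds both flag groups */
};

/* Atomically set the given bits, leaving all other bits untouched. */
static inline void
flags_set(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits, leaving all other bits untouched. */
static inline void
flags_clear(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Read a snapshot of the whole flag word. */
static inline unsigned int
flags_get(struct page_sketch *pg)
{
	return atomic_load(&pg->pg_flags);
}
```

With plain (non-atomic) short fields sharing a word, two CPUs updating
different fields could lose one of the updates; the read-modify-write
operations above avoid that at the cost of an atomic instruction per
change.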


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and thus use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64, letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


Revision tags: OPENBSD_7_4_BASE
# 1.173 12-Aug-2023 mpi

Add sanity checks in uvm_pagelookup().

ok kettenis@


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

Previous "breakage" of the swap on arm64 has been found to be an issue
on one machine, the rockpro/arm64, related to a deadlock built into the
sdmmc(4) stack interacting with swapping code both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon as it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock, the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
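
A minimal sketch of the rounding this requirement implies, assuming a
power-of-two page size. SKETCH_PAGE_SIZE (4096 here) and
sketch_round_page() are hypothetical names used only for illustration;
the real page size is machine-dependent and the commit itself does not
show how the size is rounded.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; must be a power of two. */
#define SKETCH_PAGE_SIZE	((size_t)4096)
#define SKETCH_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)

/*
 * Round a byte count up to the next multiple of the page size.
 * A size that is already page-aligned is returned unchanged.
 */
static inline size_t
sketch_round_page(size_t sz)
{
	return (sz + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
}
```

A caller would pass sketch_round_page(nbytes) rather than nbytes to an
allocator that only accepts whole pages.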


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
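
One way to read "account for the size of the allocation" is sketched below in C. The function name, parameters and exact comparison are assumptions made for illustration; the real check in the tree may be shaped differently.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical reserve check. A request for npages is allowed only if
 * the pages left over afterwards still cover the reserve, unless the
 * caller (for example the pagedaemon) may dip into the reserve.
 */
static bool
toy_alloc_allowed(unsigned long free_pages, unsigned long reserve,
    unsigned long npages, bool may_use_reserve)
{
	/* Callers must ask for at least one page. */
	assert(npages > 0);

	if (npages > free_pages)
		return false;
	if (may_use_reserve)
		return true;
	/* Comparing free_pages alone against reserve would ignore npages. */
	return free_pages - npages >= reserve;
}
```

With 100 free pages and a reserve of 10, a 95-page request passes a check that looks only at the free count, yet it would leave just 5 pages behind; the size-aware comparison refuses it.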


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
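
The "check is pretty much free" argument in 1.106 rests on the insert operation itself reporting a collision. The C sketch below imitates that return convention with a plain, unbalanced binary tree; toy_page and toy_insert() are hypothetical and do not use the real RB_INSERT macro or struct vm_page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page keyed by its offset within an object. */
struct toy_page {
	unsigned long offset;
	struct toy_page *left, *right;
};

/*
 * Insert pg into the tree. Returns NULL on success, or the node that
 * already holds the same offset, mirroring how RB_INSERT reports a
 * colliding element.
 */
static struct toy_page *
toy_insert(struct toy_page **root, struct toy_page *pg)
{
	struct toy_page **pp = root;

	assert(pg != NULL);
	while (*pp != NULL) {
		if (pg->offset == (*pp)->offset)
			return *pp;	/* duplicate: tree left untouched */
		pp = (pg->offset < (*pp)->offset) ?
		    &(*pp)->left : &(*pp)->right;
	}
	pg->left = pg->right = NULL;
	*pp = pg;
	return NULL;
}
```

A caller can therefore assert on the return value of every insert instead of doing a separate lookup first.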


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range from which we allocate pages for the pool; this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed).
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, as it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@
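
The cache-locality argument in 1.93 comes down to reusing the most recently freed page first. A minimal C sketch using the TAILQ macros from <sys/queue.h> follows; toy_page, toy_pglist, toy_free() and toy_alloc() are hypothetical names, not the kernel's own queues or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Toy page with a queue linkage; not the real struct vm_page. */
struct toy_page {
	int id;
	TAILQ_ENTRY(toy_page) pageq;
};
TAILQ_HEAD(toy_pglist, toy_page);

/* Freed pages go to the head, so the newest free page is found first. */
static void
toy_free(struct toy_pglist *q, struct toy_page *pg)
{
	assert(pg != NULL);
	TAILQ_INSERT_HEAD(q, pg, pageq);
}

/* Allocation takes from the head; returns NULL if the queue is empty. */
static struct toy_page *
toy_alloc(struct toy_pglist *q)
{
	struct toy_page *pg = TAILQ_FIRST(q);

	if (pg != NULL)
		TAILQ_REMOVE(q, pg, pageq);
	return pg;
}
```

The page handed out is the one whose contents are most likely to still be in the CPU cache, which is the stated motivation for head insertion.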


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. Only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
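
The merged-flags scheme from 1.59 can be sketched with C11 atomics. Everything here is a stand-in: the toy_* names and bit values are hypothetical, and the kernel uses its own atomic_setbits/atomic_clearbits style primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit values; one word holds page and queue flags alike. */
#define TOY_PG_BUSY	0x001u
#define TOY_PG_WANTED	0x002u
#define TOY_PQ_ACTIVE	0x100u

struct toy_page {
	atomic_uint pg_flags;
};

/* Set bits with a single atomic read-modify-write. */
static void
toy_setbits(struct toy_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits with a single atomic read-modify-write. */
static void
toy_clearbits(struct toy_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Drop BUSY and WANTED together in one operation. */
static void
toy_unbusy(struct toy_page *pg)
{
	assert(atomic_load(&pg->pg_flags) & TOY_PG_BUSY);
	toy_clearbits(pg, TOY_PG_BUSY | TOY_PG_WANTED);
}
```

Because each update is one atomic operation on the whole word, a change to a queue flag cannot be lost to a concurrent change to a page flag, and clearing several bits with one mask costs a single operation.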


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well-formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.175 17-Apr-2024 mpi

Add per-CPU caches to the pmemrange allocator.

The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.

Each cache is composed of two magazines, design is borrowed from jeff bonwick
vmem's paper and the implementation is similar to the one of pool_cache from
dlg@. However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.

Tested by robert@, claudio@ and Laurence Tratt.

ok kettenis@


Revision tags: OPENBSD_7_5_BASE
# 1.174 13-Feb-2024 miod

Remove sanity checks from uvm_pagefree(). The first thing this function does
is invoke uvm_pageclean(), which performs the exact same sanity check, so
one set of checks is enough.

ok mpi@


Revision tags: OPENBSD_7_4_BASE
# 1.173 12-Aug-2023 mpi

Add sanity checks in uvm_pagelookup().

ok kettenis@


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

Previous "breakage" of the swap on arm64 has been found to be an issue
on one machine, the rockpro/arm64, related to a deadlock built into the
sdmmc(4) stack interacting with swapping code, both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@
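
For illustration, the specifier ordering in question looks like this in a
small standalone C sketch. The helper page_index() is hypothetical and is
not taken from uvm_page.c.

```c
#include <stddef.h>

/*
 * Preferred form: the storage-class specifier comes first, as
 * C99 6.11.5 recommends.  page_index() is a hypothetical helper
 * used only to show the declaration order.
 */
static inline size_t
page_index(size_t addr, size_t pagesz)
{
	return addr / pagesz;
}

/*
 * Obsolescent form, still accepted by compilers:
 *
 *	inline static size_t page_index(size_t, size_t);
 */
```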


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon, it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping-out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
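
A minimal userland sketch of rounding a size up to a page multiple before
handing it to an allocator with this requirement. The names
SKETCH_PAGE_SIZE and sketch_round_page() are hypothetical, and the 4096-byte
page size is an assumption; in the kernel the round_page() macro serves this
purpose, though whether this commit used it is not stated here.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)	/* assumed page size */

/*
 * Round sz up to the next multiple of the page size.  This relies on
 * the page size being a power of two.
 */
static size_t
sketch_round_page(size_t sz)
{
	return (sz + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```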


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for
the fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
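
The idea can be sketched in userland with C11 atomics. This is only an
illustration: the SK_PG_* flag values and both function names are
hypothetical, and atomic_fetch_and() stands in for the kernel's
atomic_clearbits operation.

```c
#include <stdatomic.h>

/* Hypothetical flag bits, chosen only for this sketch. */
#define SK_PG_BUSY	0x01u
#define SK_PG_WANTED	0x02u
#define SK_PG_FAKE	0x04u

/* Before: one atomic read-modify-write per flag. */
static void
clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~SK_PG_BUSY);
	atomic_fetch_and(flags, ~SK_PG_WANTED);
	atomic_fetch_and(flags, ~SK_PG_FAKE);
}

/* After: a single atomic operation clears all three bits at once. */
static void
clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(SK_PG_BUSY | SK_PG_WANTED | SK_PG_FAKE));
}
```

Both variants leave the same final value; the combined form simply issues
one atomic operation instead of three.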


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Usable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster by avoiding lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to reduced
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
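
A minimal user-space sketch of the idea in 1.59 above, assuming C11
<stdatomic.h> as a stand-in for the kernel's atomic bit operations. The
sk_ names and flag values are hypothetical and not taken from uvm; the
point is only that both flag groups live in one int-sized word and are
changed with atomic read-modify-write operations.

```c
#include <stdatomic.h>

#define SK_PG_BUSY	0x0001u	/* hypothetical "page" flag */
#define SK_PQ_ACTIVE	0x0100u	/* hypothetical "queue" flag */

struct sk_page {
	atomic_uint pg_flags;	/* one word holds both flag groups */
};

/* Atomically set bits without disturbing the other bits in the word. */
static inline void
sk_setbits(struct sk_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits without disturbing the other bits in the word. */
static inline void
sk_clearbits(struct sk_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

With two separate short fields and plain stores, a writer of one field
could overwrite a concurrent update to its neighbour on machines that
only store whole words; the atomic operations on a single word avoid that.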


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to byte addresses, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
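
A small sketch of the pitfall described in 1.52 above, assuming 4 KB
pages and a 64-bit stand-in for the wider paddr_t. The names are
hypothetical; the real code is not shown in this log.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t sk_paddr_t;		/* stand-in for a wider paddr_t */

/* The shift is done in 32 bits, so bits above 4 GB are lost. */
static inline uint32_t
sk_pages_to_bytes_narrow(uint32_t npages)
{
	return npages << SK_PAGE_SHIFT;
}

/* Widen first, then shift, so the high bits survive. */
static inline sk_paddr_t
sk_pages_to_bytes_wide(uint32_t npages)
{
	return (sk_paddr_t)npages << SK_PAGE_SHIFT;
}
```

The cast has to be applied to the operand before the shift; casting the
result of a 32-bit shift would be too late.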


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add its own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


Revision tags: OPENBSD_7_4_BASE
# 1.173 12-Aug-2023 mpi

Add sanity checks in uvm_pagelookup().

ok kettenis@


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

Previous "breakage" of the swap on arm64 has been found to be an issue
on one machine, the rockpro/arm64, related to a deadlock built into the
sdmmc(4) stack interacting with swapping code both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@
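
For illustration of the ordering in 1.170 above, a tiny sketch with a
hypothetical helper (sk_min is not a uvm function). Both spellings are
accepted by C99 compilers; only the first follows the recommendation
quoted from 6.11.5.

```c
/* Preferred: the storage-class specifier comes first. */
static inline int
sk_min(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Obsolescent placement, shown as a comment only:
 *
 *	inline static int sk_min(int a, int b);
 */
```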


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon; it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping-out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by AIsha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
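
A sketch of the transformation described in 1.123 above, again assuming
C11 <stdatomic.h> in place of the kernel's atomic_clearbits primitive.
The flag names are hypothetical.

```c
#include <stdatomic.h>

#define SK_FLAG_A	0x1u
#define SK_FLAG_B	0x2u
#define SK_FLAG_C	0x4u

/* Before: three separate atomic read-modify-write operations. */
static inline void
sk_clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~SK_FLAG_A);
	atomic_fetch_and(flags, ~SK_FLAG_B);
	atomic_fetch_and(flags, ~SK_FLAG_C);
}

/* After: one operation with the masks OR'd together. */
static inline void
sk_clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(SK_FLAG_A | SK_FLAG_B | SK_FLAG_C));
}
```

Both leave the same final value; the combined form issues a single
atomic operation, which matters most where atomics are emulated or
require a retry loop.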


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
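
A hypothetical sketch of what "accounting for the size of the
allocation" in 1.118 above can look like. The function and parameter
names are assumptions, not the actual uvm code: a request is refused if
granting it would leave fewer free pages than the reserve.

```c
#include <stdbool.h>

/*
 * May a caller without access to the reserve take `want` pages?
 * Comparing against reserve + want (rather than nfree - want against
 * reserve) avoids unsigned underflow when want > nfree.
 */
static inline bool
sk_alloc_allowed(unsigned long nfree, unsigned long reserve,
    unsigned long want)
{
	return nfree >= reserve + want;
}
```

A check that only compares nfree against reserve would let a single
large request consume the whole reserve in one go.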


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.
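
The "insert reports the duplicate" idea described above can be sketched in
plain C. This is only an illustration: it uses a hand-rolled, unbalanced
binary search tree instead of the kernel's RB tree macros, and the names
ex_page, ex_tree_insert and ex_page_insert are hypothetical, not taken from
uvm_page.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page keyed by its offset within an object. */
struct ex_page {
	unsigned long offset;
	struct ex_page *left, *right;
};

/*
 * Insert pg into the tree rooted at *root.
 * Returns NULL on success, or the page already present at the same
 * offset, in which case the tree is left unchanged.
 */
static struct ex_page *
ex_tree_insert(struct ex_page **root, struct ex_page *pg)
{
	struct ex_page **link = root;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;	/* duplicate: hand it back */
		link = (pg->offset < (*link)->offset) ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}

/* Paranoid wrapper: the insert already found any duplicate for us. */
static void
ex_page_insert(struct ex_page **root, struct ex_page *pg)
{
	struct ex_page *dup = ex_tree_insert(root, pg);

	assert(dup == NULL);
}
```

Because the insert walk has to find the slot anyway, checking its return
value costs nothing beyond one comparison, which is the point the commit
message makes.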

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.
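
A minimal sketch of the "one flags word, changed atomically" approach,
using C11 <stdatomic.h> rather than the kernel's own atomic primitives.
The struct, the helper names and the flag values below are assumptions
made for illustration and are not the definitions from uvm_page.h.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values sharing a single word. */
#define EX_PG_BUSY	0x00000001u	/* stands in for a "pg_flags" bit */
#define EX_PQ_ACTIVE	0x00010000u	/* stands in for a "pqflags" bit */

struct ex_vm_page {
	atomic_uint pg_flags;		/* merged flags word */
};

/* Set bits without disturbing the other bits in the word. */
static void
ex_setflags(struct ex_vm_page *pg, unsigned int flags)
{
	atomic_fetch_or(&pg->pg_flags, flags);
}

/* Clear bits without disturbing the other bits in the word. */
static void
ex_clearflags(struct ex_vm_page *pg, unsigned int flags)
{
	atomic_fetch_and(&pg->pg_flags, ~flags);
}
```

With two separate short fields, a plain read-modify-write of one field can
rewrite the neighbouring field on machines that only store whole words; an
atomic OR or AND on the merged word avoids that lost-update hazard.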

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
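
The hazard this commit guards against can be shown with a small sketch.
It assumes a 32-bit page count, a 64-bit physical address type and a page
shift of 12; the names ex_paddr_t, EX_PAGE_SHIFT and both helpers are
hypothetical and only illustrate widening before the shift.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t ex_paddr_t;		/* stands in for a larger paddr_t */

/* Widen first, then shift: the high bits survive. */
static ex_paddr_t
ex_pages_to_bytes(uint32_t npages)
{
	return (ex_paddr_t)npages << EX_PAGE_SHIFT;
}

/* Shift in 32 bits: wraps once npages reaches 2^20 (4 GB). */
static uint32_t
ex_pages_to_bytes_narrow(uint32_t npages)
{
	return npages << EX_PAGE_SHIFT;
}
```

For example, 0x200000 pages (8 GB) converts to 0x200000000 bytes with the
widened version, while the 32-bit version wraps around to 0.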


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well-formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.173 12-Aug-2023 mpi

Add sanity checks in uvm_pagelookup().

ok kettenis@


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

Previous "breakage" of the swap on arm64 has been found to be an issue
on one machine, the rockpro/arm64, related to a deadlock built into the
sdmmc(4) stack interacting with swapping code both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon; it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping-out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
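
The size requirement above can be met by rounding a byte count up to a page
boundary before the call. A minimal sketch, assuming a 4096-byte page; the
names EX_PAGE_SIZE, EX_PAGE_MASK and ex_round_page() are hypothetical and are
not taken from the kernel sources:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size, for illustration only; must be a power of two. */
#define EX_PAGE_SIZE	((size_t)4096)
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/*
 * Round sz up to the next multiple of EX_PAGE_SIZE.
 * A value that is already page-aligned is returned unchanged.
 */
static size_t
ex_round_page(size_t sz)
{
	/* The mask trick is only valid for power-of-two page sizes. */
	assert((EX_PAGE_SIZE & EX_PAGE_MASK) == 0);
	return (sz + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}
```

With this helper a request for 4097 bytes becomes a two-page request, while
a request for exactly one page is left alone.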


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
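
The idea of accounting for the request size when defending a reserve can be
sketched as follows. This is an illustration only, not the check used in
uvm_page.c; ex_would_eat_reserve() and its parameters are hypothetical names:

```c
#include <stdbool.h>

/*
 * Return true if taking npages out of nfree free pages would leave fewer
 * than reserve pages behind. Comparing against the post-allocation count,
 * rather than the current one, is what "accounting for the size of the
 * allocation" means here. The first test avoids unsigned underflow in
 * the subtraction.
 */
static bool
ex_would_eat_reserve(unsigned long nfree, unsigned long npages,
    unsigned long reserve)
{
	if (npages > nfree)
		return true;
	return nfree - npages < reserve;
}
```

A check that only compares nfree against reserve would let a single large
request pass and then consume the reserve; the version above rejects it.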


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
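
The reason the check is nearly free is that an insert which reports
collisions has already done the lookup. The sketch below shows that
contract with a plain (unbalanced) binary search tree standing in for the
red-black tree; struct ex_node and ex_insert() are hypothetical names, and
the NULL-on-success convention mirrors what tree(3) documents for RB_INSERT:

```c
#include <stddef.h>

struct ex_node {
	unsigned long	 offset;	/* key: offset of the page in its object */
	struct ex_node	*left;
	struct ex_node	*right;
};

/*
 * Insert n into the tree rooted at *rootp.
 * Returns NULL on success, or the node already holding the same key,
 * in which case the tree is left unchanged.
 */
static struct ex_node *
ex_insert(struct ex_node **rootp, struct ex_node *n)
{
	struct ex_node **link = rootp;

	while (*link != NULL) {
		if (n->offset < (*link)->offset)
			link = &(*link)->left;
		else if (n->offset > (*link)->offset)
			link = &(*link)->right;
		else
			return *link;	/* duplicate key: report it */
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}
```

A caller that wants the paranoia described above only has to test the
return value against NULL, with no second tree walk.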


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Usable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
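
The approach described above, one merged flags word that is only modified
with atomic read-modify-write operations, can be sketched in portable C11.
This is a userland stand-in using <stdatomic.h>, not the kernel's own atomic
primitives; struct ex_page, the EX_* flag values and the ex_* helpers are all
hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag values sharing a single word. */
#define EX_PG_BUSY	0x00000001u	/* "page" style flag */
#define EX_PG_WANTED	0x00000002u
#define EX_PQ_ACTIVE	0x00010000u	/* "queue" style flag */

struct ex_page {
	_Atomic uint32_t pg_flags;	/* merged flags word */
};

/* Atomically set the given bits without disturbing the others. */
static void
ex_setbits(struct ex_page *pg, uint32_t bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits; several bits can be cleared at once. */
static void
ex_clearbits(struct ex_page *pg, uint32_t bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Read the current value of the flags word. */
static uint32_t
ex_flags(struct ex_page *pg)
{
	return atomic_load(&pg->pg_flags);
}
```

Because each update is a single atomic operation on the whole word, two
CPUs changing different bits cannot overwrite each other's update, which is
the hazard the commit message attributes to adjacent short fields.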


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well-formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64, letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.172 13-May-2023 mpi

Put back in the simplification of the aiodone daemon.

Previous "breakage" of the swap on arm64 has been found to be an issue
on one machine the rockpro/arm64 related to a deadlock built into the
sdmmc(4) stack interacting with swapping code both running under
KERNEL_LOCK().

This issue is easily reproducible on -current and entering swap when
building LLVM on a rockpro crashes the machine by memory corruption.

Tested by mlarkin@ on octeon & i386, by myself on amd64 & arm64 and by
sthen@ on i386 port bulk.

ok beck@ some time ago.

Previous commit message:

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon; it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by AIsha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.171 11-Apr-2023 jsg

fix double words in comments
feedback and ok jmc@ miod, ok millert@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon, as it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up, but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.
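
The accounting problem described above can be illustrated with a small
userland model. This is only a sketch under stated assumptions: struct page,
wired_pages, page_free() and buf_release_page() are hypothetical stand-ins,
not the kernel's structures or functions.

```c
#include <assert.h>

/* Hypothetical stand-in for a page; only the wire count is modeled. */
struct page {
	int wire_count;
};

/* Hypothetical global counter of pages accounted as wired. */
static int wired_pages;

/* Model of a free routine that un-accounts a page that is still wired. */
static void
page_free(struct page *pg)
{
	if (pg->wire_count > 0) {
		pg->wire_count = 0;
		wired_pages--;
	}
}

/*
 * Model of releasing a buffer cache page: it was wired without being
 * counted, so drop the wire count first to keep the counter balanced.
 */
static void
buf_release_page(struct page *pg)
{
	assert(pg->wire_count == 1);
	pg->wire_count = 0;
	page_free(pg);
}
```

In this model, calling page_free() directly on a page that was wired but never
counted drives wired_pages below zero, while buf_release_page() leaves it
unchanged.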

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
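
A minimal sketch of the idea, using C11 atomics in userland rather than the
kernel's atomic_clearbits interface. The flag names and helper functions are
hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define F_BUSY   0x01u
#define F_WANTED 0x02u
#define F_CLEAN  0x04u

/* Clear bits with one atomic read-modify-write on the flags word. */
static void
clear_bits(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_and(flags, ~bits);
}

/* Two atomic operations: one per flag. */
static void
release_slow(_Atomic uint32_t *flags)
{
	clear_bits(flags, F_BUSY);
	clear_bits(flags, F_WANTED);
}

/* One atomic operation covering both flags. */
static void
release_fast(_Atomic uint32_t *flags)
{
	clear_bits(flags, F_BUSY | F_WANTED);
}
```

Both variants leave the same flag word behind; the combined form simply issues
one atomic operation instead of two.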


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
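
A hedged sketch of what accounting for the allocation size can look like. The
function and parameter names are hypothetical and the real check in the
kernel may be structured differently.

```c
#include <stdbool.h>

/*
 * Hypothetical reserve check. free_pages, npages and reserve are all
 * page counts; the names are illustrative, not the kernel's.
 */
static bool
alloc_keeps_reserve(unsigned long free_pages, unsigned long npages,
    unsigned long reserve)
{
	/* Reject oversized requests first so the subtraction cannot underflow. */
	if (npages > free_pages)
		return false;
	/* The reserve must still be intact after handing out npages. */
	return free_pages - npages >= reserve;
}
```

With 10 free pages and a reserve of 4, a check that only compares the free
count against the reserve would let a 7-page request through; the version
above refuses it.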


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.
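
The "insert returns any duplicate" behaviour can be sketched in userland as
follows. This is an illustration only: it uses a plain unbalanced binary
search tree instead of the RB macros, and struct page and tree_insert() are
hypothetical names.

```c
#include <stddef.h>

/* Hypothetical page keyed by its offset within an object. */
struct page {
	unsigned long offset;
	struct page *left, *right;
};

/*
 * Insert pg into an unbalanced binary search tree rooted at *root.
 * Like RB_INSERT, return NULL on success or the element already
 * stored under the same key, leaving the tree unchanged.
 */
static struct page *
tree_insert(struct page **root, struct page *pg)
{
	struct page **link = root;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;
		link = pg->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}
```

Because the walk that finds the insertion point also finds any colliding key,
a caller can assert that the return value is NULL without doing a separate
lookup.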

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)
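
Points 1 and 2 above can be sketched roughly as follows. This is a simplified
model under assumptions: the function and parameter names are hypothetical,
and only the 3% target and the idea of treating the buffer cache deficit as
not free come from the message itself.

```c
#include <stdbool.h>

/* Free target: 3% of physical pages, with no fixed cap. */
static unsigned long
free_target(unsigned long npages)
{
	return npages * 3 / 100;
}

/*
 * Hypothetical wakeup test. buf_max is how many pages the buffer cache
 * may use at most and buf_now is how many it holds; the difference is
 * the deficit, which is treated as if it were not free.
 */
static bool
should_wake_pagedaemon(unsigned long npages, unsigned long free_pages,
    unsigned long buf_max, unsigned long buf_now)
{
	unsigned long deficit = buf_max > buf_now ? buf_max - buf_now : 0;
	unsigned long usable = free_pages > deficit ? free_pages - deficit : 0;

	return usable < free_target(npages);
}
```

In this model a machine with plenty of nominally free pages can still wake the
pagedaemon once allocations start eating into memory the buffer cache is
entitled to grow into.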

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
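
The kind of truncation this guards against can be shown with a short userland
sketch. The function names are hypothetical, PAGE_SHIFT is assumed to be 12
(4 KB pages), and uint64_t stands in for a wider paddr_t.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed 4 KB pages */

/* Shift performed in 32 bits: bits above 4 GB are lost before widening. */
static uint64_t
pages_to_bytes_narrow(uint32_t npages)
{
	return (uint32_t)(npages << PAGE_SHIFT);
}

/* Widen to the larger type first, then shift. */
static uint64_t
pages_to_bytes_wide(uint32_t npages)
{
	return (uint64_t)npages << PAGE_SHIFT;
}
```

For a page count that corresponds to 6 GB, the narrow version yields 2 GB
because the upper bits fall off, while the wide version keeps the full value.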


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to the state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.
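
A minimal userland sketch of the conversion described above. The helper
names (clear_old, clear_new, copy_old, copy_new) are hypothetical and not
taken from the tree; they only show that bcopy() takes the source first
while memcpy() takes the destination first. Note that bcopy() tolerates
overlapping buffers and memcpy() does not, so this sketch assumes the
buffers never overlap.

```c
#include <string.h>
#include <strings.h>	/* bzero(), bcopy(): legacy BSD interfaces */

/* Old style: bzero(buf, len). */
void
clear_old(void *buf, size_t len)
{
	bzero(buf, len);
}

/* New style: memset(buf, 0, len). */
void
clear_new(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/* Old style: bcopy(src, dst, len), source first. */
void
copy_old(const void *src, void *dst, size_t len)
{
	bcopy(src, dst, len);
}

/* New style: memcpy(dst, src, len), destination first. */
void
copy_new(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}
```

Both pairs leave the destination buffer in the same state; only the
argument order differs at the call site.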


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.170 29-Aug-2022 jsg

static inline, not inline static

c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."

ok guenther@
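
A short illustration of the ordering this commit prefers. The function
page_is_even() is a made-up example, not something from uvm_page.c.

```c
/* Preferred: the storage-class specifier comes first. */
static inline int
page_is_even(unsigned long pfn)
{
	return (pfn & 1) == 0;
}

/*
 * Obsolescent per C99 6.11.5, though compilers still accept it:
 *
 *	inline static int page_is_even(unsigned long pfn);
 */
```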


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon; it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
makes it possible to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
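
A sketch of rounding a byte count up to a page multiple before handing it
to an allocator with this requirement. SKETCH_PAGE_SIZE and
sketch_round_page() are local stand-ins invented for this example, and the
4096-byte page size is an assumption; kernel code would use the real
PAGE_SIZE and the kernel's own rounding macro.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL		/* assumed page size */
#define SKETCH_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)

/* Round sz up to the next multiple of the (assumed) page size. */
static inline size_t
sketch_round_page(size_t sz)
{
	return (sz + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
}
```

The mask trick only works because the page size is a power of two.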


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for
the fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
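
A userland sketch of the idea, using C11 <stdatomic.h> rather than the
kernel's atomic_clearbits interface. The flag names and values below are
hypothetical and do not correspond to the real PG_* bits.

```c
#include <stdatomic.h>

#define F_BUSY		0x01u	/* hypothetical flag bits */
#define F_WANTED	0x02u
#define F_FAKE		0x04u

/* Before: one atomic read-modify-write per flag. */
void
clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~F_BUSY);
	atomic_fetch_and(flags, ~F_WANTED);
	atomic_fetch_and(flags, ~F_FAKE);
}

/* After: a single atomic operation clears all three bits. */
void
clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(F_BUSY | F_WANTED | F_FAKE));
}
```

The final value is the same either way. The combined form issues one
atomic operation instead of three and never exposes the intermediate
states to other CPUs.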


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
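
A rough sketch of what accounting for the allocation size can look like.
This is not the check from uvm_page.c; would_eat_reserve() and its
parameters are invented for illustration.

```c
#include <stdbool.h>

/*
 * Return true if taking `want' pages would leave fewer than `reserve'
 * pages free.  Written as an addition in a wider type so that nothing
 * underflows or wraps when `nfree' is small.
 */
bool
would_eat_reserve(unsigned int nfree, unsigned int reserve, unsigned int want)
{
	/* A check that ignores `want' would only compare nfree < reserve. */
	return (unsigned long long)want + reserve > nfree;
}
```

With 10 free pages and a reserve of 4, a request for 6 pages passes and a
request for 7 is refused. A check that only tests nfree < reserve would
let both through.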


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
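
A sketch of why the check costs almost nothing: an insert that reports
duplicates has already found the colliding node by the time it returns.
The tree below is a plain unbalanced binary search tree written for this
example, standing in for the red-black tree macros; struct node and
tree_insert() are hypothetical names. Only the return convention (NULL on
success, the existing element on a key collision) mirrors RB_INSERT.

```c
#include <stddef.h>

struct node {
	unsigned long	 offset;	/* key, like a page's offset in its object */
	struct node	*left, *right;
};

/*
 * Insert `n' into the tree rooted at *rootp.  Like RB_INSERT, return
 * NULL on success or the already-present node with the same key.
 */
struct node *
tree_insert(struct node **rootp, struct node *n)
{
	struct node **pp = rootp;

	while (*pp != NULL) {
		if (n->offset == (*pp)->offset)
			return *pp;		/* duplicate: nothing inserted */
		pp = (n->offset < (*pp)->offset) ?
		    &(*pp)->left : &(*pp)->right;
	}
	n->left = n->right = NULL;
	*pp = n;
	return NULL;
}
```

A caller can then assert that the return value is NULL, which catches an
insert on top of an existing page without a separate lookup.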


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster because it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@
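
A small sketch of head insertion with the <sys/queue.h> TAILQ macros. The
struct fpage type and the fpage_free()/fpage_alloc() helpers are made up
for this example and are not the kernel's free page queues. The point is
that the most recently freed page, which is the one most likely to still
be in cache, is the first one handed out again.

```c
#include <sys/queue.h>
#include <stddef.h>

struct fpage {
	int			id;
	TAILQ_ENTRY(fpage)	link;
};
TAILQ_HEAD(fpageq, fpage);

/* Free a page: put it at the head so it is the next one handed out. */
void
fpage_free(struct fpageq *q, struct fpage *pg)
{
	TAILQ_INSERT_HEAD(q, pg, link);
}

/* Allocate a page: take whatever sits at the head, or NULL if empty. */
struct fpage *
fpage_alloc(struct fpageq *q)
{
	struct fpage *pg = TAILQ_FIRST(q);

	if (pg != NULL)
		TAILQ_REMOVE(q, pg, link);
	return pg;
}
```

Freeing page 1 and then page 2 means the next allocation returns page 2,
so the queue behaves like a stack.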


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
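
As an illustration of the merged-flags approach described in 1.59, here is a minimal userland sketch using C11 atomics. It is not the kernel code: the `xpage` structure, the `XPG_*`/`XPQ_*` values and the helper names are hypothetical, and the kernel uses its own atomic bit primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real PG_* and PQ_* constants differ. */
#define XPG_BUSY	0x0001u
#define XPG_WANTED	0x0002u
#define XPQ_ACTIVE	0x0100u
#define XPQ_INACTIVE	0x0200u

/* Stand-in for struct vm_page: one word holds both flag groups. */
struct xpage {
	_Atomic uint32_t pg_flags;
};

/* Set bits with a single atomic read-modify-write on the shared word. */
static inline void
xpage_setbits(struct xpage *pg, uint32_t bits)
{
	assert(pg != NULL);
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits the same way; several flags can go in one call. */
static inline void
xpage_clearbits(struct xpage *pg, uint32_t bits)
{
	assert(pg != NULL);
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Because every update is one atomic operation on the whole word, a change to a page-queue bit cannot silently overwrite a concurrent change to a page bit, which is the hazard the commit message describes for adjacent short fields.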


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as free pages are being converted back to byte addresses make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
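
The truncation that 1.52 guards against can be sketched in userland C as follows. The names (`x_paddr_t`, `X_PAGE_SHIFT`, both helpers) and the 4 KB page size are assumptions made for illustration; the real code uses the kernel's own types and macros.

```c
#include <assert.h>
#include <stdint.h>

#define X_PAGE_SHIFT	12	/* assumed 4 KB pages */

typedef uint64_t x_paddr_t;	/* hypothetical wide physical address type */

static_assert(sizeof(x_paddr_t) > sizeof(uint32_t),
    "x_paddr_t must be wider than the page counter");

/* Shift done in 32 bits: any result at or above 4 GB loses its high bits. */
static inline x_paddr_t
pages_to_bytes_narrow(uint32_t npages)
{
	return (x_paddr_t)(uint32_t)(npages << X_PAGE_SHIFT);
}

/* Widen first, then shift: the high bits survive. */
static inline x_paddr_t
pages_to_bytes_wide(uint32_t npages)
{
	return (x_paddr_t)npages << X_PAGE_SHIFT;
}
```

With 0x200000 pages (8 GB), the narrow version yields 0 while the wide version yields 0x200000000. The only difference is where the cast sits relative to the shift.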


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and thus use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.169 01-Aug-2022 mpi

Introduce and use uvm_pagewait() where PG_WANTED is set.

No change in behavior.

ok kn@, semarie@, kettenis@


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon, it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
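
The size requirement noted in 1.159 amounts to rounding a byte count up to the next page boundary before allocating. A minimal sketch of that rounding, assuming a 4 KB power-of-two page size and hypothetical names (`X_PAGE_SIZE`, `x_round_page`) in place of the kernel's own macros:

```c
#include <assert.h>
#include <stddef.h>

#define X_PAGE_SIZE	((size_t)4096)	/* assumed; the real value is per-arch */
#define X_PAGE_MASK	(X_PAGE_SIZE - 1)

static_assert((X_PAGE_SIZE & X_PAGE_MASK) == 0,
    "page size must be a power of two for the mask trick");

/* Round sz up to the next multiple of the page size. */
static inline size_t
x_round_page(size_t sz)
{
	return (sz + X_PAGE_MASK) & ~X_PAGE_MASK;
}
```

A size that is already page-aligned is returned unchanged, so applying the rounding unconditionally before an allocation is harmless.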


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by AIsha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
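
A minimal sketch of what "accounting for the size of the allocation" can mean
for 1.118 follows. The helper names and the exact comparison are assumptions
made for illustration; the real check in uvm is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Size-unaware check: only looks at the current free page count. */
static bool
naive_reserve_check(size_t nfree, size_t reserve)
{
	return nfree > reserve;
}

/*
 * Size-aware check (hypothetical): would `reserve` pages still be free
 * after handing out `npages`? Written to avoid unsigned underflow.
 */
static bool
sized_reserve_check(size_t nfree, size_t npages, size_t reserve)
{
	if (npages > nfree)
		return false;
	return (nfree - npages) >= reserve;
}
```

With 100 free pages and a reserve of 64, the naive check lets a 50-page
request through even though it would eat into the reserve; the size-aware
check refuses it.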


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
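
The pitfall that 1.52 guards against can be sketched as follows. The type and
macro names are hypothetical stand-ins (a 64-bit physical address type and
4 KB pages are assumed); the point is only the order of the cast and the
shift.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t ex_paddr_t;		/* assumed wider-than-32-bit paddr */

/* Shift is done in 32 bits, so the high bits are lost before widening. */
static ex_paddr_t
pages_to_bytes_narrow(uint32_t npages)
{
	return (ex_paddr_t)(npages << EX_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte address survives. */
static ex_paddr_t
pages_to_bytes_wide(uint32_t npages)
{
	return (ex_paddr_t)npages << EX_PAGE_SHIFT;
}
```

For small page counts the two agree; once the byte value no longer fits in
32 bits, only the widened form keeps the upper bits.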


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.168 24-Jul-2022 mpi

Revert simplification of the aiodone daemon, as it breaks swap on arm64.

Found the hard way by mlarkin@ and deraadt@.


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows enabling an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
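
The size requirement above can be met by rounding a byte count up to the
next page boundary before calling the allocator. The following is a minimal
sketch, not the code from this commit: EX_PAGE_SIZE, ex_round_page() and
ex_size_ok() are hypothetical names chosen for illustration, and a 4 KB page
size is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size for illustration; the real PAGE_SIZE is MD. */
#define EX_PAGE_SIZE	4096UL
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/* The mask trick below only works for power-of-two page sizes. */
static_assert((EX_PAGE_SIZE & EX_PAGE_MASK) == 0,
    "page size must be a power of two");

/* Round sz up to the next multiple of the page size. */
static size_t
ex_round_page(size_t sz)
{
	return (sz + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}

/* Return non-zero if sz is already a multiple of the page size. */
static int
ex_size_ok(size_t sz)
{
	return (sz & EX_PAGE_MASK) == 0;
}
```

A caller would pass ex_round_page(len) rather than len, so that
ex_size_ok() holds for whatever reaches the allocator.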


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
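
The idea of accounting for the request size can be sketched as below. This is
an illustration under assumptions, not the actual kernel condition: struct
ex_counters and both helper functions are hypothetical, and the real counters
live in uvmexp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for illustration only. */
struct ex_counters {
	long free;			/* pages currently free */
	long reserve_pagedaemon;	/* pages kept for the pagedaemon */
};

/* Naive check: looks at the free count but ignores the request size. */
static bool
ex_may_alloc_naive(const struct ex_counters *c, long npages)
{
	(void)npages;
	return c->free > c->reserve_pagedaemon;
}

/* Size-aware check: the request itself must not dip into the reserve. */
static bool
ex_may_alloc_sized(const struct ex_counters *c, long npages)
{
	assert(npages > 0);
	return c->free - npages >= c->reserve_pagedaemon;
}
```

With 10 free pages and a reserve of 4, the naive check lets an 8 page request
through even though it would leave only 2 pages, while the size-aware check
refuses it.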


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No functional change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time these 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no functional change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to reduced lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
on most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
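
A minimal userland sketch of the idea behind this change, assuming C11 <stdatomic.h> as a stand-in for the kernel's own atomic primitives. The struct, the flag values and the demo_* helpers are hypothetical and are not taken from the tree. Both flag families live in one word and every update is a single atomic read-modify-write, so an update to one bit cannot clobber a concurrent update to another bit in the same word.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits; the real PG_* and PQ_* values differ. */
#define DEMO_PG_BUSY	0x00000001u
#define DEMO_PG_CLEAN	0x00000002u
#define DEMO_PQ_ACTIVE	0x00010000u

/* Stand-in for struct vm_page: one word holds both flag families. */
struct demo_page {
	_Atomic uint32_t pg_flags;
	int wire_count;		/* widened to int, as the commit describes */
};

/* Set bits with a single atomic read-modify-write. */
static inline void
demo_setbits(struct demo_page *pg, uint32_t bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits with a single atomic read-modify-write. */
static inline void
demo_clearbits(struct demo_page *pg, uint32_t bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Read the whole flag word atomically. */
static inline uint32_t
demo_getflags(struct demo_page *pg)
{
	return atomic_load(&pg->pg_flags);
}
```

With two separate short fields updated by plain stores, a CPU that can only write whole words may rewrite the neighbouring field as a side effect, which is the problem the commit message describes.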


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode hold with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.167 11-Jul-2022 mpi

Simplify the aiodone daemon which is only used for async writes.

- Remove unused support for asynchronous read, including error conditions

- Grab the proper lock for each page that has been written to swap. This
allows to enable an assertion in uvm_page_unbusy().

- Move the uvm_anon_release() call outside of uvm_page_unbusy() and
assert for the different anon cases.

ok beck@, kettenis@


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping-out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the previous simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
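
A minimal sketch of why the wire count has to be reset first. It assumes, as the commit message implies, that freeing a page with a nonzero wire count decrements a global wired-pages counter. All demo_* names are hypothetical stand-ins, not the kernel functions.

```c
/* Stand-in for struct vm_page. */
struct demo_page {
	int wire_count;
};

/* Stand-in for the global wired-pages counter. */
static int demo_wired_pages;

/*
 * Assumed free-path behaviour: a page that still looks wired is
 * unwired and the global counter is decremented to match.
 */
static void
demo_pagefree(struct demo_page *pg)
{
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		demo_wired_pages--;
	}
}

/*
 * Buffer cache pages were wired without bumping the global counter,
 * so clear the per-page count before freeing to keep the books even.
 */
static void
demo_buf_page_release(struct demo_page *pg)
{
	pg->wire_count = 0;
	demo_pagefree(pg);
}
```

Calling demo_pagefree() directly on such a page would drive the counter below zero, which is the symptom the commit describes.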


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
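
A minimal sketch of what accounting for the allocation size can look like. The function names and the exact comparison are assumptions for illustration, not the code from the tree. A check that only looks at the current free count lets a large request dip into the reserve, while the size-aware check refuses it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical size-aware check: may a caller that must not touch
 * the reserve take `npages' pages when `free' pages are available
 * and `reserve' pages have to stay untouched?
 */
static bool
demo_alloc_allowed(size_t free, size_t reserve, size_t npages)
{
	/* Guard first so the unsigned subtraction below cannot wrap. */
	if (npages > free)
		return false;
	return free - npages >= reserve;
}

/* Size-blind variant: only compares the current free count. */
static bool
demo_alloc_allowed_sizeblind(size_t free, size_t reserve)
{
	return free > reserve;
}
```

With 10 free pages and a reserve of 8, the size-blind check accepts a 4-page request that would leave only 6 pages, while the size-aware check rejects it.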


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
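
A minimal sketch of the "check on every insert" idea. It uses a plain unbalanced binary tree rather than the red-black tree macros, and every demo_* name is hypothetical. The only property borrowed from RB_INSERT is the one the commit message relies on: the insert returns the already-present element when the key collides and NULL otherwise, so the duplicate check costs nothing extra.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a page keyed by its offset within an object. */
struct demo_page {
	unsigned long offset;
	struct demo_page *left, *right;
};

/*
 * Insert pg into the tree rooted at *root. Returns NULL on success,
 * or the existing node with the same offset, leaving the tree as is.
 */
static struct demo_page *
demo_tree_insert(struct demo_page **root, struct demo_page *pg)
{
	struct demo_page **link = root;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;
		link = pg->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}

/* Inserting on top of an existing page is treated as a fatal bug. */
static void
demo_pageinsert(struct demo_page **root, struct demo_page *pg)
{
	struct demo_page *dupe = demo_tree_insert(root, pg);

	assert(dupe == NULL);
}
```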


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache
backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster because it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@
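
The head-insertion idea in r1.93 can be illustrated with a small sketch.
It assumes a toy page descriptor and the userland <sys/queue.h> macros; the
names toy_page, toy_pglist, toy_free and toy_alloc are hypothetical and are
not the kernel's own structures or functions. Freed pages go to the head of
the list, so the most recently freed (and likely still cache-warm) page is
the first one handed out again.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a page descriptor; not struct vm_page. */
struct toy_page {
	int id;
	TAILQ_ENTRY(toy_page) pageq;
};

TAILQ_HEAD(toy_pglist, toy_page);

/* Free a page: insert at the head so it is the next one reused (LIFO). */
static void
toy_free(struct toy_pglist *fl, struct toy_page *pg)
{
	assert(pg != NULL);
	TAILQ_INSERT_HEAD(fl, pg, pageq);
}

/* Allocate a page: take whatever sits at the head, or NULL if empty. */
static struct toy_page *
toy_alloc(struct toy_pglist *fl)
{
	struct toy_page *pg = TAILQ_FIRST(fl);

	if (pg != NULL)
		TAILQ_REMOVE(fl, pg, pageq);
	return pg;
}
```

With tail insertion the allocator would instead return the page that has
been free the longest, which is less likely to still be in cache.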


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is better known as the physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
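
A minimal sketch of the merged-flags idea from r1.59, assuming C11
<stdatomic.h> as a stand-in for the kernel's own atomic primitives. The
struct flag_page type, the TOY_* bit values and the helper names are all
hypothetical. With both groups of flags in one word, every update has to be
an atomic read-modify-write so that changing one group cannot clobber a
concurrent change to the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in uvm_page.h. */
#define TOY_PG_BUSY	0x0001u	/* "page" flag group */
#define TOY_PQ_ACTIVE	0x0100u	/* "queue" flag group, same word */

/* Hypothetical page descriptor with a single merged flags word. */
struct flag_page {
	_Atomic uint32_t pg_flags;
};

/* Atomically set bits without disturbing the other bits in the word. */
static void
toy_setbits(struct flag_page *pg, uint32_t bits)
{
	assert(bits != 0);	/* setting nothing is a caller bug here */
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits without disturbing the other bits in the word. */
static void
toy_clearbits(struct flag_page *pg, uint32_t bits)
{
	assert(bits != 0);
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Read the current flags word. */
static uint32_t
toy_getflags(struct flag_page *pg)
{
	return atomic_load(&pg->pg_flags);
}
```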


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
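
The kind of truncation r1.52 guards against can be sketched as follows. The
types and names here (toy_paddr_t, TOY_PAGE_SHIFT, toy_ptob_narrow,
toy_ptob_wide) are hypothetical, and a 32-bit page counter with a 64-bit
physical address type is assumed purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumed 4 KB pages */

/* Hypothetical physical address type, wider than the page counter. */
typedef uint64_t toy_paddr_t;

static_assert(sizeof(toy_paddr_t) > sizeof(uint32_t),
    "address type must be wider than the page counter");

/* Buggy: the shift happens in 32 bits, so the high bits are already gone. */
static toy_paddr_t
toy_ptob_narrow(uint32_t npages)
{
	return (toy_paddr_t)(uint32_t)(npages << TOY_PAGE_SHIFT);
}

/* Correct: widen first, then shift in the larger type. */
static toy_paddr_t
toy_ptob_wide(uint32_t npages)
{
	return (toy_paddr_t)npages << TOY_PAGE_SHIFT;
}
```

For a count of 0x200000 pages (8 GB at 4 KB per page) the narrow version
wraps to 0, while the widened version yields 0x200000000.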


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64, letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.166 12-May-2022 mpi

Introduce uvm_pagedequeue() to reduce code duplication.

ok kettenis@


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after
swapping out its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked, and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
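
The requirement in r1.159 amounts to rounding a byte count up to the next
page boundary before asking for memory. A minimal sketch, assuming a 4096
byte page and a hypothetical helper named toy_round_page (this is not a
claim about which macro or function the commit itself uses):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/* The mask trick below only works for power-of-two page sizes. */
static_assert((TOY_PAGE_SIZE & (TOY_PAGE_SIZE - 1)) == 0,
    "page size must be a power of two");

/* Round sz up to the next multiple of TOY_PAGE_SIZE. */
static size_t
toy_round_page(size_t sz)
{
	return (sz + TOY_PAGE_SIZE - 1) & ~((size_t)TOY_PAGE_SIZE - 1);
}
```

A size that is already a multiple of the page size is returned unchanged,
so the rounding can be applied unconditionally.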


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art,and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Usable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to lock
contention..

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.
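
As a minimal sketch of the conversion described above (the struct and function
names are hypothetical, chosen only for illustration): bzero(p, n) becomes
memset(p, 0, n), and bcopy(src, dst, n) becomes memcpy(dst, src, n). Note the
swapped argument order, and that memcpy() assumes the regions do not overlap.

```c
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct rec {
	int	id;
	char	name[16];
};

/* Formerly bzero(r, sizeof(*r)). */
void
rec_clear(struct rec *r)
{
	memset(r, 0, sizeof(*r));
}

/*
 * Formerly bcopy(src, dst, sizeof(*dst)).  bcopy() takes the source
 * first, memcpy() takes the destination first.  memcpy() also requires
 * that the two regions do not overlap.
 */
void
rec_copy(struct rec *dst, const struct rec *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```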


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.165 04-May-2022 mpi

Merge swap-backed and object-backed inactive page lists.

ok millert@, kettenis@


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked, and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
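
A minimal sketch of rounding a size up to a page multiple before handing it to
an allocator with this requirement. The TOY_PAGE_SIZE value and the helper name
are assumptions for illustration; the real page size is machine-dependent.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u			/* assumed; really MD */
#define TOY_PAGE_MASK	(TOY_PAGE_SIZE - 1)	/* valid for powers of two */

/* Round sz up to the next multiple of TOY_PAGE_SIZE. */
size_t
toy_round_page(size_t sz)
{
	return (sz + TOY_PAGE_MASK) & ~(size_t)TOY_PAGE_MASK;
}
```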


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate for the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside an SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
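
A toy model of the accounting problem described above. All names are
hypothetical, and the assumption that the free path decrements the global
wired counter whenever it sees a nonzero wire count is a simplification made
for this sketch, not a copy of the kernel code.

```c
/* Toy stand-ins for struct vm_page and the uvmexp counters. */
struct toy_page {
	int	wire_count;
};

struct toy_stats {
	int	wired;	/* pages counted as wired */
	int	free;	/* pages on the free list */
};

/* Assumed behaviour: freeing a wired page drops the wired counter. */
void
toy_pagefree(struct toy_stats *st, struct toy_page *pg)
{
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		st->wired--;
	}
	st->free++;
}

/*
 * Buffer cache style free: the page was wired without being counted,
 * so clear the wire count first to keep st->wired from going negative.
 */
void
toy_buf_pagefree(struct toy_stats *st, struct toy_page *pg)
{
	pg->wire_count = 0;
	toy_pagefree(st, pg);
}
```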


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
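
A sketch of the idea using C11 atomics rather than the kernel's
atomic_clearbits_int(). The flag names and values are hypothetical. The final
value is the same in both versions, but the combined form performs a single
atomic read-modify-write, so other CPUs never observe the intermediate states.

```c
#include <stdatomic.h>

/* Hypothetical flag values; not the real PG_* definitions. */
#define XPG_BUSY	0x01u
#define XPG_WANTED	0x02u
#define XPG_CLEAN	0x04u

/* Before: three separate atomic read-modify-write operations. */
void
clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~XPG_BUSY);
	atomic_fetch_and(flags, ~XPG_WANTED);
	atomic_fetch_and(flags, ~XPG_CLEAN);
}

/* After: one atomic operation with the masks or'ed together. */
void
clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(XPG_BUSY | XPG_WANTED | XPG_CLEAN));
}
```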


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
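
A toy illustration of why the allocation size matters when defending a
reserve. The function names and the exact comparison are assumptions for this
sketch and are not taken from the committed code.

```c
#include <stdbool.h>

/*
 * Size-blind check: passes as long as the free count is currently
 * above the reserve, even if the request itself would eat into it.
 */
bool
toy_may_alloc_old(long nfree, long reserve, long npages)
{
	(void)npages;
	return nfree > reserve;
}

/* Size-aware check: the reserve must survive the allocation. */
bool
toy_may_alloc_new(long nfree, long reserve, long npages)
{
	return nfree > reserve + npages;
}
```

With 10 free pages and a reserve of 4, a request for 8 pages passes the
size-blind check but is refused by the size-aware one.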


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
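
The "insert returns any duplicate" property can be sketched with a plain
binary search tree. This is a hypothetical stand-in for the RB_INSERT() macro
from <sys/tree.h>, which likewise returns NULL on success or the existing
element on a key collision; the names and the unbalanced tree are assumptions
made to keep the example short.

```c
#include <stddef.h>

/* Toy page keyed by its offset within an object. */
struct toy_page {
	unsigned long	 offset;
	struct toy_page	*left, *right;
};

/*
 * Insert pg into the tree rooted at *root.  Returns NULL on success,
 * or the page already stored at that offset, in which case the tree
 * is left unchanged.  The lookup is part of the insert, so the
 * duplicate check costs nothing extra.
 */
struct toy_page *
toy_insert(struct toy_page **root, struct toy_page *pg)
{
	struct toy_page **link = root;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;
		link = pg->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}
```

A caller can then assert that the return value is NULL, which is the kind of
check the commit message describes.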


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Usable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids the lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

Ansify a few functions while I'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

Shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
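
Editor's illustration (not part of the commit): the idea of keeping all page
flags in one word and changing them only with atomic read-modify-write
operations can be sketched in userland C11. The struct, the flag values and
the helper names below are hypothetical stand-ins, not the kernel's
definitions; the kernel uses its own atomic primitives rather than
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define XPG_BUSY	0x0001u
#define XPG_ACTIVE	0x0100u

/* Stand-in for a page structure with a single merged flags word. */
struct xpage {
	_Atomic uint32_t pg_flags;
};

/* Atomically set bits; other bits in the word are left untouched. */
static inline void
xpage_setbits(struct xpage *pg, uint32_t bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits; other bits in the word are left untouched. */
static inline void
xpage_clearbits(struct xpage *pg, uint32_t bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Because each update is a single atomic operation on the whole word, two CPUs
changing different bits cannot overwrite each other's update, which is the
hazard the commit describes for neighbouring short fields.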


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
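
Editor's illustration (not part of the commit): the overflow being guarded
against can be shown with a small userland sketch. The type x_paddr_t, the
shift value and both function names are assumptions made for the example;
the point is only that the widening cast must happen before the shift.

```c
#include <assert.h>
#include <stdint.h>

#define X_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t x_paddr_t;		/* stand-in for a wide paddr_t */

/* Shift performed in 32 bits: bits above bit 31 are silently lost. */
static uint64_t
pages_to_bytes_narrow(uint32_t npages)
{
	return (uint32_t)(npages << X_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte count is preserved. */
static x_paddr_t
pages_to_bytes_wide(uint32_t npages)
{
	return (x_paddr_t)npages << X_PAGE_SHIFT;
}
```

With 0x200000 pages (8 GB of 4 KB pages) the narrow version wraps around
while the wide version yields the full byte count.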


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.164 28-Apr-2022 mpi

Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.

Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.

ok kettenis@, kn@


Revision tags: OPENBSD_7_1_BASE
# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
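
Editor's illustration (not part of the commit): a caller that must pass a
multiple of the page size typically rounds the requested size up first. The
sketch below uses a hypothetical x_round_page() helper and an assumed 4 KB
page size; the kernel provides its own page-rounding macro, which is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define X_PAGE_SIZE	4096u			/* assumed page size */
#define X_PAGE_MASK	(X_PAGE_SIZE - 1)

/* Round sz up to the next multiple of X_PAGE_SIZE (a power of two). */
static size_t
x_round_page(size_t sz)
{
	return (sz + X_PAGE_MASK) & ~(size_t)X_PAGE_MASK;
}
```

A size that is already page-aligned is returned unchanged, so the rounding
can be applied unconditionally before the allocation call.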


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the previous simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by AIsha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
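
Editor's illustration (not part of the commit): "accounting for the size of
the allocation" means the reserve check compares the free count after the
request against the reserve, rather than the free count before it. The
function and parameter names below are hypothetical and the real check in
the kernel may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: may a non-privileged caller take npages pages
 * and still leave at least `reserve' pages free?
 */
static bool
x_alloc_allowed(size_t nfree, size_t reserve, size_t npages)
{
	/* Guard first so the unsigned subtraction below cannot underflow. */
	if (npages > nfree)
		return false;
	return nfree - npages >= reserve;
}
```

A check that only tested nfree against reserve would let one large request
consume the whole reserve in a single call.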


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed when this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.
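
The reason the check is nearly free is that the insert itself reports a
collision. A minimal sketch of that pattern follows. It uses a hand-rolled,
unbalanced binary tree rather than the kernel's RB_INSERT macro, and
`struct page_node`, `page_tree_insert` and `page_insert_checked` are
hypothetical names chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page keyed by its offset in an object. */
struct page_node {
	unsigned long offset;
	struct page_node *left, *right;
};

/*
 * Insert elm into the tree rooted at *root.  Like RB_INSERT, return
 * NULL on success or the already-present node with the same key,
 * leaving the tree untouched in that case.
 */
struct page_node *
page_tree_insert(struct page_node **root, struct page_node *elm)
{
	struct page_node **link = root;

	while (*link != NULL) {
		if (elm->offset == (*link)->offset)
			return *link;		/* duplicate: report it */
		link = elm->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	elm->left = elm->right = NULL;
	*link = elm;
	return NULL;
}

/* Returns 0 on success, -1 if a page already sits at that offset. */
int
page_insert_checked(struct page_node **root, struct page_node *pg)
{
	return page_tree_insert(root, pg) == NULL ? 0 : -1;
}
```

Because the duplicate is discovered during the descent the insert has to do
anyway, no separate lookup is needed before inserting.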

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids the lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
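
The kind of truncation this guards against can be sketched as below, assuming
4 KB pages and a 64-bit physical address type. `PAGE_SHIFT_X`, `paddr64_t`
and both function names are hypothetical and not taken from the tree.

```c
#include <stdint.h>

#define PAGE_SHIFT_X	12		/* assumed 4 KB pages */

typedef uint64_t paddr64_t;		/* hypothetical wide physical address */

/* Shift performed in 32 bits: anything at or above 4 GB is lost. */
uint32_t
pages_to_bytes_narrow(uint32_t npages)
{
	return npages << PAGE_SHIFT_X;
}

/* Widen first, then shift, so the high bits survive. */
paddr64_t
pages_to_bytes_wide(uint32_t npages)
{
	return (paddr64_t)npages << PAGE_SHIFT_X;
}
```

The cast has to be applied to the operand before the shift; casting the result
of a 32-bit shift would widen a value that has already lost its high bits.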


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.163 12-Mar-2022 mpi

Uncompress some one line comments to reduce the difference with NetBSD.

No functional change.


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping-out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.
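
A caller that computes an allocation size from a count of structures therefore
has to round it up first. A minimal sketch of that rounding, assuming a 4 KB
page size; `PAGE_SIZE_X`, `PAGE_MASK_X` and `round_to_page` are hypothetical
names used here instead of the kernel's own macros.

```c
#include <stddef.h>

#define PAGE_SIZE_X	4096u			/* assumed page size */
#define PAGE_MASK_X	(PAGE_SIZE_X - 1)

/* Round sz up to the next multiple of the page size (a power of two). */
size_t
round_to_page(size_t sz)
{
	return (sz + PAGE_MASK_X) & ~(size_t)PAGE_MASK_X;
}
```

For example, a request for 4097 bytes becomes 8192, while a request that is
already page-aligned is left unchanged.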

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)
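
The accounting problem described above can be illustrated with a small
userland sketch. Everything here is a simplified, hypothetical stand-in
(struct page, wired_counter, page_free() and free_uncounted_pages() are not
the kernel's names); it only shows why the wire count is zeroed before the
page is freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real struct vm_page or uvmexp. */
struct page {
	int wire_count;		/* number of wirings on this page */
	int freed;		/* set once the page has been handed back */
};

static int wired_counter;	/* global count of pages accounted as wired */

/* Toy free routine: un-accounts a page that is still wired, then frees it. */
static void
page_free(struct page *pg)
{
	if (pg->wire_count > 0) {
		pg->wire_count = 0;
		wired_counter--;
	}
	pg->freed = 1;
}

/* Free pages that were wired without ever being added to wired_counter. */
static void
free_uncounted_pages(struct page *pgs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(pgs[i].freed == 0);
		/* Zero first, so page_free() does not decrement the counter. */
		pgs[i].wire_count = 0;
		page_free(&pgs[i]);
	}
}
```

Without the explicit zeroing, each freed page would decrement a counter that
was never incremented for it, which is how the counter ends up negative.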


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
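
A minimal sketch of what accounting for the allocation size means, assuming
a simple free-page counter and a fixed reserve. The function alloc_allowed()
and its parameters are hypothetical and are not taken from the actual
uvm_pglistalloc() code.

```c
#include <stdbool.h>

/*
 * Hypothetical check: may a request for npages proceed?
 * Callers allowed to use the reserve only need enough free pages;
 * everyone else must leave at least `reserve' pages behind.
 */
static bool
alloc_allowed(long free_pages, long reserve, long npages, bool use_reserve)
{
	if (use_reserve)
		return npages <= free_pages;
	/* Testing free_pages alone against reserve ignores the request size. */
	return free_pages - npages >= reserve;
}
```

With 10 free pages and a reserve of 4, a 6-page request passes and a 7-page
request is refused, whereas a test that ignores npages would let both through.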


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to lock
contention..

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
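
As a userland illustration of the merged-flags approach, the sketch below
keeps page flags and queue flags in one word and changes them only with
atomic read-modify-write operations. It uses C11 <stdatomic.h> rather than
the kernel's own primitives, and the XPG_*/XPQ_* values and helper names are
made up for the example.

```c
#include <stdatomic.h>

/* Hypothetical flag values, chosen only for this example. */
#define XPG_BUSY	0x0001u
#define XPG_CLEAN	0x0002u
#define XPQ_ACTIVE	0x0100u

struct page {
	atomic_uint pg_flags;	/* page flags and queue flags share one word */
};

/* Atomically set bits; concurrent updates to other bits are not lost. */
static void
page_setbits(struct page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits; several flags can be cleared in a single call. */
static void
page_clearbits(struct page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Passing an OR of several flags to page_clearbits() clears them with one
atomic operation instead of one per flag.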


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in lesse kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are converted back to byte addresses, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode hold with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.
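
As an illustration of the conversion described above, here is a minimal userland sketch. The helper names ex_clear_and_copy_old() and ex_clear_and_copy_new() are hypothetical and do not come from the tree. The main trap in such a conversion is argument order: bcopy() takes the source first, memcpy() takes the destination first. Note also that bcopy() tolerates overlapping buffers while memcpy() does not, so the sketch assumes the buffers never overlap.

```c
#define _DEFAULT_SOURCE		/* expose bzero()/bcopy() on glibc */
#include <string.h>
#include <strings.h>

/* Hypothetical helper, old style: BSD bzero()/bcopy(). */
static void
ex_clear_and_copy_old(void *dst, const void *src, size_t len)
{
	bzero(dst, len);
	bcopy(src, dst, len);		/* bcopy: source first */
}

/* The same operation with the ISO C calls (buffers must not overlap). */
static void
ex_clear_and_copy_new(void *dst, const void *src, size_t len)
{
	memset(dst, 0, len);
	memcpy(dst, src, len);		/* memcpy: destination first */
}
```

Both helpers leave dst with the same contents for non-overlapping buffers; only the spelling and the argument order differ.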


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.162 10-Mar-2022 mpi

Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().

Should prevent a KASSERT() from triggering when freeing an anon after swapping out
its memory.

This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.

Found the hard way by sthen@

ok kettenis@, kn@


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
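
To make the size requirement above concrete, here is a minimal userland sketch of rounding a byte count up to a page multiple before handing it to a page-granular allocator. The names EX_PAGE_SIZE, EX_PAGE_MASK and ex_round_page() are hypothetical, and the 4096-byte page size is an assumption for illustration only, since the real PAGE_SIZE is machine dependent.

```c
#include <stddef.h>

/* Assumed page size for illustration; the kernel's PAGE_SIZE is MD. */
#define EX_PAGE_SIZE	((size_t)4096)
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/* Round sz up to the next multiple of the (power-of-two) page size. */
static size_t
ex_round_page(size_t sz)
{
	return (sz + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}
```

A caller would pass ex_round_page(sz) rather than sz itself, so that a request for, say, 4097 bytes becomes two full pages.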


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
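
The idea of folding several bit-clearing operations into one can be sketched in userland with C11 atomics. This is only an illustration: atomic_fetch_and() stands in for the kernel's atomic_clearbits primitive, and the flag names and helper functions below are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical flag bits, not the real PG_* or PQ_* values. */
#define EX_FLAG_A	0x01u
#define EX_FLAG_B	0x02u
#define EX_FLAG_C	0x04u

/* Two read-modify-write operations, one per flag. */
static void
ex_clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~EX_FLAG_A);
	atomic_fetch_and(flags, ~EX_FLAG_B);
}

/* One read-modify-write operation clearing both flags at once. */
static void
ex_clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(EX_FLAG_A | EX_FLAG_B));
}
```

Both functions leave the same final value, but the combined form performs a single atomic operation, which matters on machines where each atomic operation is expensive. The combined form also removes the window in which another CPU could observe one flag cleared and the other still set.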


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
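
A toy model of a size-aware reserve check, under stated assumptions: the function name and parameters below are hypothetical, and the real check in the tree is not reproduced here. The point is that comparing only the free count against the reserve ignores how many pages the request will take, and that the subtraction must be guarded when the counters are unsigned.

```c
#include <stdbool.h>

/*
 * Return true if taking npages out of nfree free pages still leaves at
 * least `reserve' pages behind. The first comparison guards the
 * subtraction so the unsigned arithmetic cannot wrap around.
 */
static bool
ex_alloc_allowed(unsigned int nfree, unsigned int reserve,
    unsigned int npages)
{
	return npages <= nfree && nfree - npages >= reserve;
}
```

With 10 free pages and a reserve of 4, a naive "nfree > reserve" test would admit a 7-page request and leave only 3 pages, while the size-aware test refuses it.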


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is better known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to reduced
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
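
The idea behind merging the two flag fields into one word that is only
changed atomically can be sketched in portable C11. This is an illustration
under assumptions, not the kernel code: `struct page_sketch`, the `SK_*` flag
values and the helper names are hypothetical, and the kernel uses its own
atomic bit primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values; "page" and "queue" flags share one word. */
#define SK_BUSY		0x00000001u	/* stands in for a page flag */
#define SK_CLEAN	0x00000002u	/* stands in for a page flag */
#define SK_ACTIVE	0x00010000u	/* stands in for a queue flag */

struct page_sketch {
	atomic_uint pg_flags;		/* single merged flag word */
};

/* Set bits with one atomic read-modify-write; other bits are untouched. */
void
sk_setbits(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits with one atomic read-modify-write. */
void
sk_clearbits(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Exercise the helpers and return the resulting flag word. */
unsigned int
sk_flags_demo(void)
{
	struct page_sketch pg;

	atomic_init(&pg.pg_flags, 0);
	sk_setbits(&pg, SK_BUSY | SK_ACTIVE);
	sk_setbits(&pg, SK_CLEAN);
	sk_clearbits(&pg, SK_BUSY);
	return atomic_load(&pg.pg_flags);
}
```

Because every update is a single atomic operation on the whole word, a
writer touching the "queue" bits cannot lose a concurrent update to the
"page" bits, which two adjacent short fields could not guarantee.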


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to byte addresses, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
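
The overflow this guards against can be sketched as follows. The names
`SK_PAGE_SHIFT`, `sk_paddr_t` and both helpers are hypothetical stand-ins
chosen for the example, and a 4 KB page size is assumed; the point is only
that the widening cast has to happen before the shift.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t sk_paddr_t;		/* stand-in for a wider paddr_t */

/* Shift done in 32 bits: bits above 2^32 are lost before any widening. */
uint32_t
sk_pages_to_bytes_narrow(uint32_t npages)
{
	return npages << SK_PAGE_SHIFT;
}

/* Widen first, then shift: the full byte address is preserved. */
sk_paddr_t
sk_pages_to_bytes_wide(uint32_t npages)
{
	return (sk_paddr_t)npages << SK_PAGE_SHIFT;
}
```

With 0x100000 pages (4 GB), the narrow version wraps to 0 while the wide
version yields 0x100000000.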


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.161 19-Jan-2022 mpi

Comment out an incorrect lock assertion.

The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
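
A caller holding an arbitrary byte count has to round it up to a whole
number of pages before passing it on. A minimal standalone sketch of that
rounding, assuming a power-of-two page size of 4096 bytes; `SK_PAGE_SIZE`
and `sk_round_page()` are hypothetical names for the example, and kernel
code would use the kernel's own page-rounding macro instead:

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE	((size_t)4096)		/* assumed page size */
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)

/* Round sz up to the next multiple of the (power-of-two) page size. */
size_t
sk_round_page(size_t sz)
{
	return (sz + SK_PAGE_MASK) & ~SK_PAGE_MASK;
}
```

Sizes that are already page-aligned are returned unchanged, so applying
the rounding unconditionally is safe.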


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
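
The difference between a size-blind and a size-aware reserve check can be
sketched as below. This is a simplified model under assumptions, not the
code from the commit: the function names and the plain page counters are
hypothetical, and the real check involves more state than three integers.

```c
#include <assert.h>

/*
 * Size-blind check: only asks whether we are above the reserve right now,
 * so a large request can still dip into the reserved pages.
 */
int
sk_alloc_ok_naive(unsigned int nfree, unsigned int reserve,
    unsigned int count)
{
	(void)count;			/* request size is ignored */
	return nfree > reserve;
}

/*
 * Size-aware check: the request is allowed only if the reserve is still
 * intact after `count` pages have been taken.
 */
int
sk_alloc_ok_sized(unsigned int nfree, unsigned int reserve,
    unsigned int count)
{
	if (count > nfree)		/* also avoids unsigned underflow */
		return 0;
	return (nfree - count) >= reserve;
}
```

With 10 free pages and a reserve of 4, a request for 7 pages passes the
size-blind check but is refused by the size-aware one.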


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
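
The check described above relies on the insert operation reporting a collision by itself. A minimal sketch, assuming a plain unbalanced binary tree as a stand-in for the RB tree from sys/tree.h (whose RB_INSERT returns NULL on success and the colliding element otherwise); `ex_page`, `ex_tree_insert` and `ex_pageinsert` are hypothetical names, and the real code may report the error differently than with an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_page {
	uint64_t	 offset;	/* key: offset within the object */
	struct ex_page	*left, *right;
};

/*
 * Insert pg into the tree rooted at *rootp.  Returns NULL on success,
 * or the page already present at the same offset (tree unchanged).
 */
static struct ex_page *
ex_tree_insert(struct ex_page **rootp, struct ex_page *pg)
{
	struct ex_page **pp = rootp;

	while (*pp != NULL) {
		if (pg->offset == (*pp)->offset)
			return *pp;
		pp = (pg->offset < (*pp)->offset) ?
		    &(*pp)->left : &(*pp)->right;
	}
	pg->left = pg->right = NULL;
	*pp = pg;
	return NULL;
}

/* The duplicate check costs nothing beyond looking at the return value. */
static void
ex_pageinsert(struct ex_page **rootp, struct ex_page *pg)
{
	struct ex_page *dupe = ex_tree_insert(rootp, pg);

	/* A duplicate would mean leaking the existing page's references. */
	assert(dupe == NULL);
	(void)dupe;
}
```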


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to reduced lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlaid over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
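
A minimal sketch of the merged-flags approach, assuming C11 atomics in place of the kernel's own atomic primitives. The flag values and the `ex_` names are illustrative only and do not match the real PG_ and PQ_ constants. Because both flag families share one word and every update is an atomic read-modify-write, concurrent changes to different bits cannot overwrite each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative values; the real flag constants differ. */
#define EX_PG_BUSY	0x00000001u	/* former "flags" family */
#define EX_PG_CLEAN	0x00000002u
#define EX_PQ_ACTIVE	0x00010000u	/* former "pqflags" family */
#define EX_PQ_INACTIVE	0x00020000u

struct ex_page {
	_Atomic uint32_t pg_flags;	/* page and queue flags in one word */
};

/* Atomically set the given bits, leaving all others untouched. */
static void
ex_setflags(struct ex_page *pg, uint32_t bits)
{
	assert(bits != 0);
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits, leaving all others untouched. */
static void
ex_clearflags(struct ex_page *pg, uint32_t bits)
{
	assert(bits != 0);
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```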


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to byte addresses, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
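
A minimal sketch of the pitfall, assuming 4 KB pages, a 32-bit page number and a 64-bit stand-in for the wider paddr_t; the `ex_` names are hypothetical. The widening cast has to happen before the shift, otherwise the shift is done in 32 bits and the high bits are gone by the time the value is widened.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t ex_paddr_t;		/* stand-in for a wider paddr_t */

static_assert(sizeof(ex_paddr_t) > sizeof(uint32_t),
    "ex_paddr_t must be wider than the page number type");

/* Widen first, then shift: bits above 4 GB survive. */
static ex_paddr_t
ex_ptoa(uint32_t pageno)
{
	return (ex_paddr_t)pageno << EX_PAGE_SHIFT;
}

/* Shift in 32 bits, then widen: the high bits are already lost. */
static ex_paddr_t
ex_ptoa_truncating(uint32_t pageno)
{
	return (ex_paddr_t)(uint32_t)(pageno << EX_PAGE_SHIFT);
}
```

For page number 0x100000, the first page above 4 GB, `ex_ptoa()` yields 0x100000000 while `ex_ptoa_truncating()` yields 0.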


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver and thus use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.160 15-Dec-2021 mpi

Use a per-UVM object lock to serialize the lower part of the fault handler.

Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked and some operations, like *_get() return it
unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the previous simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
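
A minimal sketch of the idea, using C11 <stdatomic.h> rather than the
kernel's own atomic_clearbits primitives. The flag values and function
names are hypothetical and do not correspond to the real PG_ or PQ_ bits.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits, chosen only for illustration. */
#define EX_FLAG_A 0x01u
#define EX_FLAG_B 0x02u
#define EX_FLAG_C 0x04u

/* Three separate atomic read-modify-write operations. */
static void
ex_clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~EX_FLAG_A);
	atomic_fetch_and(flags, ~EX_FLAG_B);
	atomic_fetch_and(flags, ~EX_FLAG_C);
}

/* One atomic read-modify-write operation clearing the same bits. */
static void
ex_clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(EX_FLAG_A | EX_FLAG_B | EX_FLAG_C));
}
```

Both functions leave the word in the same final state. The combined form
issues a single atomic operation and never exposes an intermediate state
with only some of the bits cleared.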


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are converted back to byte addresses, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
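
A minimal sketch of the truncation this guards against, assuming 4 KB pages
and a 64-bit physical address type. ex_paddr_t, EX_PAGE_SHIFT and both
helper functions are hypothetical names and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12		/* hypothetical: 4 KB pages */

typedef uint64_t ex_paddr_t;		/* hypothetical wide physical address */

/* Shift performed in 32 bits: bits above bit 31 are lost before widening. */
static ex_paddr_t
ex_ptoa_narrow(uint32_t pgno)
{
	return (uint32_t)(pgno << EX_PAGE_SHIFT);
}

/* Widen first, then shift: the high bits survive. */
static ex_paddr_t
ex_ptoa_wide(uint32_t pgno)
{
	return (ex_paddr_t)pgno << EX_PAGE_SHIFT;
}
```

For page number 0x100000 (the first page at 4 GB) the narrow version yields
0, while the wide version yields 0x100000000.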


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which hunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.159 17-Oct-2021 patrick

km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.

ok mpi@
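
The size requirement in the 1.159 message can be illustrated with a small rounding helper. This is only a sketch under stated assumptions, not the code from that commit: EX_PAGE_SIZE and ex_round_page() are hypothetical names, the 4096-byte page size is assumed for illustration, and the kernel's own PAGE_SIZE is machine-dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size, assumed for illustration only. */
#define EX_PAGE_SIZE ((size_t)4096)

/*
 * Round sz up to the next multiple of EX_PAGE_SIZE.
 * The mask trick is only valid when the page size is a power of two.
 */
static size_t
ex_round_page(size_t sz)
{
	assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);
	return (sz + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1);
}
```

A caller would round its request with such a helper before handing the size to the allocator, so that a 1-byte request becomes one full page.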


Revision tags: OPENBSD_7_0_BASE
# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the previous simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod
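
The idea behind 1.123 can be sketched with C11 atomics. This is an illustration under assumptions, not the kernel code: the EX_PG_* flag values and both function names are hypothetical, and atomic_fetch_and() from <stdatomic.h> stands in for the kernel's atomic_clearbits calls mentioned in the message.

```c
#include <stdatomic.h>

/* Hypothetical flag bits, chosen for illustration only. */
#define EX_PG_BUSY	0x01u
#define EX_PG_WANTED	0x02u
#define EX_PG_FAKE	0x04u

/* Three separate atomic read-modify-write operations, one per flag. */
static void
ex_clear_separately(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~EX_PG_BUSY);
	atomic_fetch_and(flags, ~EX_PG_WANTED);
	atomic_fetch_and(flags, ~EX_PG_FAKE);
}

/* One atomic read-modify-write operation clearing the same three flags. */
static void
ex_clear_combined(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~(EX_PG_BUSY | EX_PG_WANTED | EX_PG_FAKE));
}
```

Both functions leave the word in the same final state; the combined form simply issues one atomic operation instead of three, which matters most where atomic operations are expensive.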


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
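
The point of 1.118 can be sketched as a check that looks at how many pages would remain after the request, rather than only at the current free count. This is a hedged illustration, not the kernel's implementation: ex_alloc_allowed() and its parameters are hypothetical, and the exact comparison used in the kernel is an assumption here.

```c
#include <stdbool.h>

/*
 * Return true if taking npages out of free_pages still leaves at least
 * `reserve' pages behind.  The first test keeps the unsigned subtraction
 * from wrapping around when the request exceeds the free count.
 */
static bool
ex_alloc_allowed(unsigned long free_pages, unsigned long reserve,
    unsigned long npages)
{
	if (npages > free_pages)
		return false;
	return (free_pages - npages) >= reserve;
}
```

A check that ignores npages would admit a large request whenever the free count is currently above the reserve, even if that single request consumes the reserve entirely.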


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to lock
contention..

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. Only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While I'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

Ansify a few functions while I'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

Shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
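
A minimal sketch of the idea, not the kernel's code: with one merged flags
word, bits are set and cleared with atomic read-modify-write operations, so
concurrent updates to different bits cannot overwrite each other. The sketch
uses C11 <stdatomic.h> rather than the kernel's own atomic primitives, and
struct page_sketch, the PGS_* values and both helper names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the real PG_* and PQ_* values are not shown here. */
#define PGS_BUSY   0x00001u
#define PGS_CLEAN  0x00002u
#define PGS_ACTIVE 0x10000u	/* stands in for a former queue flag */

struct page_sketch {
	atomic_uint pg_flags;	/* single merged word for all flag bits */
};

/* Atomically set bits; every other bit in the word is left untouched. */
static void
page_setbits(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits; every other bit in the word is left untouched. */
static void
page_clearbits(struct page_sketch *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Because each helper is a single atomic operation on the whole word, no
separate lock is needed just to keep neighbouring flag bits intact.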


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
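
The truncation this guards against can be sketched as follows. This is an
illustration only: the 12-bit PAGE_SHIFT_SKETCH, the 64-bit paddr_sketch_t
and both function names are assumptions, not the types or code from the
commit.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12		/* assumed 4 KB pages */
typedef uint64_t paddr_sketch_t;	/* stands in for a wider paddr_t */

/* Shift performed in 32 bits: anything above bit 31 is lost. */
static uint32_t
pages_to_bytes_narrow(uint32_t npages)
{
	return npages << PAGE_SHIFT_SKETCH;
}

/* Widen first, then shift: the high bits survive. */
static paddr_sketch_t
pages_to_bytes_wide(uint32_t npages)
{
	return (paddr_sketch_t)npages << PAGE_SHIFT_SKETCH;
}
```

With 0x100000 pages (4 GB of 4 KB pages) the narrow version wraps to 0,
while the wide version yields 0x100000000.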


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and thus use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.158 05-Sep-2021 mpi

Introduce dummy pagers for 'special' subsystems using UVM objects.

Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
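
A sketch of what accounting for the allocation size can look like, under
stated assumptions: the function name, its parameters and the exact
comparison are hypothetical and are not taken from the commit. The point is
that the free count is checked after subtracting the request, and that the
subtraction is guarded so the unsigned arithmetic cannot underflow.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if handing out `count` pages would still leave at least
 * `reserve` pages free. All names here are hypothetical.
 */
static bool
alloc_respects_reserve(size_t free_pages, size_t reserve, size_t count)
{
	/* Guard first so free_pages - count cannot wrap around. */
	return count <= free_pages && free_pages - count >= reserve;
}
```

A check that only compares the free count against the reserve would accept
a 7-page request with 10 pages free and a reserve of 4, leaving 3 pages;
the version above rejects it.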


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
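
The contract relied on here, that RB_INSERT returns NULL on success and the
already-present element on a collision, can be illustrated with a
hand-rolled tree. The sketch below is a plain unbalanced binary search
tree, not the <sys/tree.h> red-black tree, and every name in it is
hypothetical; it only mirrors the return-value convention that makes the
duplicate check essentially free.

```c
#include <stddef.h>

struct pnode {
	unsigned long offset;		/* key: offset within the object */
	struct pnode *left, *right;
};

/*
 * Insert `elm` into the tree rooted at *root.
 * Returns NULL on success, or the existing node with the same key,
 * in which case the tree is left unchanged.
 */
static struct pnode *
tree_insert(struct pnode **root, struct pnode *elm)
{
	struct pnode **link = root;

	while (*link != NULL) {
		if (elm->offset == (*link)->offset)
			return *link;	/* duplicate: caller decides what to do */
		link = elm->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	elm->left = elm->right = NULL;
	*link = elm;
	return NULL;
}
```

A caller can treat a non-NULL return as a bug without paying for a separate
lookup before the insert.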


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache
backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
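
As a rough illustration of the idea in r1.59, and not the actual kernel code:
once both flag families live in a single word, each update is an atomic
read-modify-write, so concurrent changes to different bits cannot overwrite
one another. The sketch uses C11 <stdatomic.h>; the struct, the flag values
and the ex_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define EX_PG_BUSY	0x0001u
#define EX_PQ_ACTIVE	0x0100u

struct ex_page {
	atomic_uint pg_flags;	/* one word holds both flag families */
};

/* Atomically set the given bits, leaving all other bits untouched. */
static void
ex_setbits(struct ex_page *pg, unsigned int bits)
{
	assert(pg != NULL);
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits, leaving all other bits untouched. */
static void
ex_clearbits(struct ex_page *pg, unsigned int bits)
{
	assert(pg != NULL);
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

A plain `pg_flags |= bits` compiles to a separate load and store, which is
exactly the window in which another CPU's update to a neighbouring bit can be
lost.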


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as free pages are converted back to byte addresses, make sure to
perform the calculations in the (upcoming) larger paddr_t to avoid losing
the higher bits.
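
The pitfall r1.52 guards against can be sketched as follows, under assumed
names and a 4 KB page size; none of this is the committed code. If the page
count is shifted while still in a 32-bit type, the high bits are gone before
the result is widened; casting first keeps them.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	12	/* assumed 4 KB pages */

typedef uint64_t ex_paddr_t;	/* hypothetical wider physical address type */

/* Wrong: the shift happens in 32 bits, so bits above 2^32 are lost. */
static ex_paddr_t
ex_ptob_narrow(uint32_t npages)
{
	return (ex_paddr_t)(uint32_t)(npages << EX_PAGE_SHIFT);
}

/* Right: widen first, then shift in the 64-bit type. */
static ex_paddr_t
ex_ptob_wide(uint32_t npages)
{
	return (ex_paddr_t)npages << EX_PAGE_SHIFT;
}
```

With 2^20 pages (4 GB), the narrow variant wraps to 0 while the wide variant
yields 0x100000000.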


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.157 21-Apr-2021 mpi

Convert remaining uvm_km_zalloc(9) to km_alloc(9).

Tested by bluhm@, jj@, kettenis@ and Scott Bennett.

ok kettenis@


Revision tags: OPENBSD_6_9_BASE
# 1.156 26-Mar-2021 mpi

Remove parentheses around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
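
The accounting problem described in r1.127 can be illustrated with a toy
model. This is only a sketch, not the kernel code: toy_page,
toy_wired_total and the toy_* functions are hypothetical stand-ins, and
the free path is assumed to decrement the global counter for any page
that is still wired.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct vm_page and the global wired counter. */
struct toy_page {
	int wire_count;
};

static int toy_wired_total;	/* counts only pages wired via toy_wire() */

/* Wire a page the accounted way. */
static void
toy_wire(struct toy_page *pg)
{
	if (pg->wire_count++ == 0)
		toy_wired_total++;
}

/* Free a page: a still-wired page is taken out of the global counter. */
static void
toy_pagefree(struct toy_page *pg)
{
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		toy_wired_total--;
	}
}

/* Free a buffer cache page: wired, but never added to the counter. */
static void
toy_buf_pagefree(struct toy_page *pg)
{
	assert(pg->wire_count == 1);	/* sanity check in the KASSERT spirit */
	pg->wire_count = 0;		/* keep toy_pagefree() from decrementing */
	toy_pagefree(pg);
}
```

Calling toy_pagefree() directly on an uncounted wired page drives the
counter below zero, while toy_buf_pagefree() leaves it untouched.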


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
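
A minimal sketch of what "accounting for the size of the allocation"
could look like, assuming the reserve is expressed as a page count. The
function names and parameters are hypothetical and do not reflect the
actual uvm code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: may an ordinary caller take npages pages without
 * dipping into the pagedaemon reserve?
 */
static bool
toy_alloc_allowed(unsigned long free_pages, unsigned long reserve,
    unsigned long npages)
{
	assert(npages > 0);
	/* Compare what would be left afterwards, avoiding underflow. */
	if (npages > free_pages)
		return false;
	return free_pages - npages >= reserve;
}

/* The size-blind variant only looks at the current free count. */
static bool
toy_alloc_allowed_sizeblind(unsigned long free_pages, unsigned long reserve)
{
	return free_pages > reserve;
}
```

With 10 free pages and a reserve of 4, the size-blind check accepts an
8-page request that would leave only 2 pages, while the size-aware check
rejects it.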


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
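
The merged-flags approach can be sketched in portable C. This is an
illustration only: C11 <stdatomic.h> stands in for the kernel's own
atomic bit operations, and the struct, flag values and function names
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_PG_BUSY	0x00001u	/* hypothetical flag values */
#define TOY_PG_CLEAN	0x00002u
#define TOY_PQ_ACTIVE	0x10000u

struct toy_flag_page {
	atomic_uint pg_flags;	/* page flags and queue flags in one word */
};

/* Set bits without disturbing flags changed concurrently by others. */
static void
toy_setbits(struct toy_flag_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits with a single atomic read-modify-write. */
static void
toy_clearbits(struct toy_flag_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

static void
toy_flags_demo(void)
{
	struct toy_flag_page pg;

	atomic_init(&pg.pg_flags, 0);
	toy_setbits(&pg, TOY_PG_BUSY | TOY_PQ_ACTIVE);
	/* One atomic operation can clear several bits at once. */
	toy_clearbits(&pg, TOY_PG_BUSY | TOY_PG_CLEAN);
	assert(atomic_load(&pg.pg_flags) == TOY_PQ_ACTIVE);
}
```

Because every update is a single atomic read-modify-write on the whole
word, two CPUs changing different flags of the same page cannot
overwrite each other's bits.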


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as free pages are being converted back to byte addresses make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
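
A small sketch of the truncation this guards against, assuming 4 KB
pages and a 64-bit physical address type. TOY_PAGE_SHIFT, toy_paddr_t
and the toy_ptoa_* helpers are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumes 4 KB pages */

typedef uint64_t toy_paddr_t;	/* stand-in for a widened paddr_t */

/* Widen first, then shift: the high bits survive. */
static toy_paddr_t
toy_ptoa_wide(uint32_t npages)
{
	return (toy_paddr_t)npages << TOY_PAGE_SHIFT;
}

/* Shift in 32 bits, then widen: bits above 4 GB are already lost. */
static toy_paddr_t
toy_ptoa_narrow(uint32_t npages)
{
	return (toy_paddr_t)(uint32_t)(npages << TOY_PAGE_SHIFT);
}

static void
toy_ptoa_demo(void)
{
	/* 0x200000 pages of 4 KB are 8 GB, which needs more than 32 bits. */
	assert(toy_ptoa_wide(0x200000u) == 0x200000000ULL);
	assert(toy_ptoa_narrow(0x200000u) == 0);
}
```

The only difference between the two helpers is where the widening cast
happens relative to the shift.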


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.156 26-Mar-2021 mpi

Remove parenthesis around return value to reduce the diff with NetBSD.

No functional change.

ok mlarkin@


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the previous simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No functional change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range from which we allocate pages for the pool; this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is better known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. Only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
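
The merged-flags approach can be illustrated with a minimal userland sketch.
It assumes C11 atomics as a stand-in for the kernel's atomic bit operations;
the struct, the flag values and the helper names below are hypothetical and
are not taken from the tree.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values, for illustration only. */
#define XPG_BUSY	0x0001u
#define XPG_CLEAN	0x0002u
#define XPQ_ACTIVE	0x0100u

struct xpage {
	atomic_uint pg_flags;	/* page flags and queue flags in one word */
	int wire_count;		/* widened to int, as in the commit */
};

/* Set bits with one atomic read-modify-write on the shared word. */
static void
xpage_setbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits the same way; other bits in the word are left untouched. */
static void
xpage_clearbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Because every update is a single atomic operation on the whole word, two
writers changing different bits cannot overwrite each other's change, which
is the problem separate short fields could not avoid.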


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
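
A small sketch of the kind of truncation this guards against. It assumes
4 KB pages and a 64-bit stand-in for the larger paddr_t; the type and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define X_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t x_paddr_t;		/* stand-in for a PAE-sized paddr_t */

/* Shift happens in 32 bits, so high bits are lost before widening. */
static x_paddr_t
pages_to_bytes_narrow(uint32_t npages)
{
	return (uint32_t)(npages << X_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte address is preserved. */
static x_paddr_t
pages_to_bytes_wide(uint32_t npages)
{
	return (x_paddr_t)npages << X_PAGE_SHIFT;
}
```

With 0x200000 pages (8 GB) the narrow version yields 0, while the wide
version yields 0x200000000.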


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.155 19-Jan-2021 mpi

(re)Introduce locking for amaps & anons.

A rwlock is attached to every amap and is shared with all its anon. The
same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.

This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
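
Why the size matters can be shown with a short sketch. The counters are
hypothetical and only loosely modeled on the uvmexp fields; the size-blind
function is an illustration of the failure mode, not the code that was
replaced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters, for illustration only. */
struct x_counters {
	int free;		/* pages currently free */
	int reserve_pagedaemon;	/* pages held back for the pagedaemon */
};

/* Size-blind check: only asks whether the reserve is intact right now. */
static bool
x_may_alloc_ignoring_size(const struct x_counters *c, int npages)
{
	(void)npages;
	return c->free > c->reserve_pagedaemon;
}

/* Size-aware check: the reserve must still be intact after taking npages. */
static bool
x_may_alloc(const struct x_counters *c, int npages)
{
	return c->free - npages >= c->reserve_pagedaemon;
}
```

With 10 free pages and a reserve of 4, a request for 8 pages passes the
size-blind check even though it would leave only 2 pages, while the
size-aware check refuses it.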


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
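
The "free" duplicate check can be sketched as follows. This uses a plain
unbalanced binary tree instead of the RB tree macros, and all names are
hypothetical; the only property carried over is that the insert returns the
colliding element instead of replacing it.

```c
#include <assert.h>
#include <stddef.h>

struct x_page {
	unsigned long offset;		/* key: offset within the object */
	struct x_page *left, *right;
};

/*
 * Insert pg. Returns NULL on success, or the node already present with
 * the same offset, in which case the tree is left unchanged.
 */
static struct x_page *
x_tree_insert(struct x_page **root, struct x_page *pg)
{
	struct x_page **link = root;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;
		link = pg->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}

/* Returns 0 on success, -1 where a kernel would assert or panic. */
static int
x_pageinsert(struct x_page **root, struct x_page *pg)
{
	return x_tree_insert(root, pg) == NULL ? 0 : -1;
}
```

The check costs nothing extra because the insert already has to walk to the
position where a duplicate would sit.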


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@
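
Head insertion makes the free queue behave as a LIFO, so the page freed most
recently (and most likely still warm in cache) is handed out first. A minimal
sketch using the <sys/queue.h> TAILQ macros; the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct x_page {
	int id;
	TAILQ_ENTRY(x_page) pageq;
};

TAILQ_HEAD(x_pglist, x_page);

/* Free a page: put it at the head so it is the next one handed out. */
static void
x_page_free(struct x_pglist *freeq, struct x_page *pg)
{
	TAILQ_INSERT_HEAD(freeq, pg, pageq);
}

/* Allocate: take from the head, i.e. the most recently freed page. */
static struct x_page *
x_page_alloc(struct x_pglist *freeq)
{
	struct x_page *pg = TAILQ_FIRST(freeq);

	if (pg != NULL)
		TAILQ_REMOVE(freeq, pg, pageq);
	return pg;
}
```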


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
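
The word-sharing problem described above can be sketched in portable C11.
This is only an illustration, not the kernel code: the kernel uses its own
atomic primitives, and the demo_* names and flag values below are
hypothetical. With both flag groups kept in one word, every update is an
atomic read-modify-write, so two CPUs changing different bits cannot lose
each other's update.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the real PG_* and PQ_* values are not shown here. */
#define DEMO_PG_BUSY	0x01u
#define DEMO_PG_CLEAN	0x02u
#define DEMO_PQ_ACTIVE	0x10u
#define DEMO_PQ_INACTIVE	0x20u

/* One word holds both groups of flags, so every update must be atomic. */
struct demo_page {
	atomic_uint pg_flags;
};

/* Atomically set the given bits, leaving all other bits untouched. */
static void
demo_setbits(struct demo_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits, leaving all other bits untouched. */
static void
demo_clearbits(struct demo_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Several bits can be passed in one call, which keeps the number of atomic
operations per page state change small.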


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as free pages are converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.154 02-Dec-2020 mpi

Document that the page queue must only be locked if the page is managed.

ok kettenis@


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
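
The accounting issue described above can be illustrated with a small
stand-alone sketch. It is not the kernel code: the demo_* names are
hypothetical, and the sketch assumes that the generic free routine
decrements a global wired-pages counter whenever it sees a non-zero wire
count. Under that assumption, a page that is wired but was never counted
must have its wire count reset before it is freed.

```c
#include <assert.h>

/* Hypothetical stand-in for a page with a wire count. */
struct demo_page {
	int wire_count;
};

/* Stand-in for a global count of wired pages. */
static int demo_wired_total;

/* Assumed behaviour: freeing a wired page drops the global counter. */
static void
demo_pagefree(struct demo_page *pg)
{
	if (pg->wire_count > 0) {
		pg->wire_count = 0;
		demo_wired_total--;
	}
}

/*
 * Free a buffer cache page. Such pages are wired exactly once but were
 * never added to demo_wired_total, so clear the wire count first.
 */
static void
demo_buf_page_free(struct demo_page *pg)
{
	assert(pg->wire_count == 1);
	pg->wire_count = 0;
	demo_pagefree(pg);
}
```

Calling demo_pagefree() directly on such a page would take the counter
from 0 to -1, which is the symptom the commit message describes.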


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
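
A minimal sketch of what accounting for the allocation size can look like,
under stated assumptions: the helper name and parameters are hypothetical,
and the real check in the kernel also depends on who is asking (for
example the pagedaemon itself). The point is that the request size is part
of the comparison, rather than only comparing the current free count with
the reserve.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may a caller that is not allowed to touch the
 * reserve take `count` pages when `nfree` pages are free and `reserve`
 * pages are set aside?
 */
static bool
demo_alloc_allowed(unsigned long nfree, unsigned long reserve,
    unsigned long count)
{
	/* Check count <= nfree first so the subtraction cannot underflow. */
	return count <= nfree && nfree - count >= reserve;
}
```

A check of the form `nfree >= reserve` alone would let one large request
consume the whole reserve in a single call.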


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
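
For readers unfamiliar with the idiom in 1.106, here is a rough userland
sketch of an insert that reports duplicates, so the collision check costs
nothing extra. It is not the kernel code: it uses a plain unbalanced binary
tree instead of the RB macros, and every name (sk_node, sk_insert,
sk_insert_demo) is made up for illustration. The only thing borrowed is the
RB_INSERT contract: NULL on success, otherwise the element already in the
tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page keyed by its offset in an object. */
struct sk_node {
	long offset;
	struct sk_node *left, *right;
};

/*
 * Insert n into the tree rooted at *root.  Returns NULL on success, or
 * the node already holding that offset (the tree is left unchanged),
 * which is the same contract RB_INSERT has.
 */
static struct sk_node *
sk_insert(struct sk_node **root, struct sk_node *n)
{
	struct sk_node **link = root;

	while (*link != NULL) {
		if (n->offset == (*link)->offset)
			return *link;	/* duplicate found during descent */
		link = n->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}

/* A caller can assert on the return value instead of a separate lookup. */
static void
sk_insert_demo(void)
{
	struct sk_node *root = NULL;
	struct sk_node a = { 4096, NULL, NULL }, b = { 4096, NULL, NULL };

	assert(sk_insert(&root, &a) == NULL);	/* first insert succeeds */
	assert(sk_insert(&root, &b) == &a);	/* second one collides */
}
```

The descent that finds the insertion point is the same walk a lookup would
do, which is why checking the return value is essentially free.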


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
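
To illustrate the approach described in 1.59, a minimal userland sketch
follows. It assumes C11 <stdatomic.h> rather than the kernel's own atomic
primitives, and the flag values and names (SK_PG_BUSY, SK_PQ_ACTIVE,
sk_page, sk_setbits, sk_clearbits) are invented for the example. The point
is only that a single flags word updated with atomic read-modify-write
operations cannot lose a concurrent update to a neighbouring bit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits, not the kernel's real PG_* / PQ_* values. */
#define SK_PG_BUSY	0x0001u
#define SK_PQ_ACTIVE	0x0100u

struct sk_page {
	_Atomic unsigned int pg_flags;	/* page and queue flags in one word */
};

/* Atomically OR bits into the flags word. */
static void
sk_setbits(struct sk_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear bits in the flags word. */
static void
sk_clearbits(struct sk_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

static void
sk_flags_demo(void)
{
	struct sk_page pg = { 0 };

	sk_setbits(&pg, SK_PG_BUSY);
	sk_setbits(&pg, SK_PQ_ACTIVE);
	sk_clearbits(&pg, SK_PG_BUSY);
	assert(atomic_load(&pg.pg_flags) == SK_PQ_ACTIVE);
}
```

With two separate short fields, a plain store to one of them may be
compiled as a read-modify-write of the enclosing word on some machines,
which is the hazard the commit message describes.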


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
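
The kind of truncation 1.52 guards against can be shown with a small
sketch. It assumes 4 KB pages and uses uint64_t as a stand-in for the
larger paddr_t; the function names and SKETCH_PAGE_SHIFT are hypothetical
and do not come from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KB pages */

/* Shift performed in 32 bits: bits above bit 31 are lost. */
static uint64_t
pfn_to_pa_narrow(uint32_t pfn)
{
	return (uint32_t)(pfn << SKETCH_PAGE_SHIFT);
}

/* Widen to the larger type first, then shift. */
static uint64_t
pfn_to_pa_wide(uint32_t pfn)
{
	return (uint64_t)pfn << SKETCH_PAGE_SHIFT;
}

/* Page frame 0x100000 sits exactly at 4 GB, which a 32-bit shift loses. */
static void
pfn_demo(void)
{
	assert(pfn_to_pa_narrow(0x100000) == 0);
	assert(pfn_to_pa_wide(0x100000) == 0x100000000ULL);
}
```

The cast has to be applied to the operand before the shift; casting the
result afterwards would widen a value that has already been truncated.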


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.153 01-Dec-2020 mpi

Turn uvm_pagealloc() mp-safe by checking uvmexp global with pageqlock held.

Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.

Merge duplicated checks waking the pagedaemon to uvm_pmr_getpages().

Add two more pages to the amount reserved for the kernel to compensate the
fact that the pagedaemon may now consume an additional page.

Document locking of some uvmexp fields.

ok kettenis@


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by Aisha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
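
The accounting problem described above can be illustrated with a minimal
sketch. It assumes, as the message implies, that the free path decrements a
global wired-pages counter whenever it sees a non-zero wire count. All names
below (xpage, xwired_total, xpage_free, xbuf_free_page) are hypothetical
stand-ins and are not the kernel's own identifiers.

```c
#include <assert.h>

/* Hypothetical page with only the field relevant to this sketch. */
struct xpage {
	int wire_count;
};

/* Stand-in for a global "number of wired pages" statistic. */
static int xwired_total;

/*
 * Assumed free path: a page that still looks wired is uncounted from
 * the global total before it is released.
 */
static void
xpage_free(struct xpage *pg)
{
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		xwired_total--;
	}
}

/*
 * Caller for pages that were wired without ever being added to
 * xwired_total: clear the wire count first so the free path does not
 * subtract something that was never added.
 */
static void
xbuf_free_page(struct xpage *pg)
{
	pg->wire_count = 0;
	xpage_free(pg);
}
```

Calling xpage_free() directly on such a page would drive xwired_total below
zero, which is the symptom the commit message describes.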


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
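
A minimal sketch of a size-aware reserve check, under the assumption that
the allocator knows the current free page count and a fixed reserve. The
function and parameter names are hypothetical and not taken from
uvm_page.c; the point is only that the request size has to take part in
the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true if handing out npages would still
 * leave at least `reserve` pages free.  The request is checked against
 * free_pages first so the unsigned subtraction cannot underflow.
 */
static bool
reserve_allows(size_t free_pages, size_t reserve, size_t npages)
{
	if (npages > free_pages)
		return false;
	return free_pages - npages >= reserve;
}
```

A check that only compares free_pages against reserve would accept a large
request that dips into the reserve; including npages closes that gap.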


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc,
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art,and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
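
The "insert reports the duplicate" idea can be sketched with a plain binary
search tree. This is an illustration only: the kernel uses the RB tree
macros, while the node type and function below are hypothetical and
unbalanced for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node keyed by an offset within an object. */
struct xnode {
	unsigned long	 offset;
	struct xnode	*left;
	struct xnode	*right;
};

/*
 * Insert n into the tree rooted at *root.  Returns NULL on success, or
 * the node already stored at the same offset, in which case the tree is
 * left unchanged.  The caller can assert on the return value at no
 * extra lookup cost.
 */
static struct xnode *
xtree_insert(struct xnode **root, struct xnode *n)
{
	while (*root != NULL) {
		if (n->offset == (*root)->offset)
			return *root;
		root = (n->offset < (*root)->offset) ?
		    &(*root)->left : &(*root)->right;
	}
	n->left = n->right = NULL;
	*root = n;
	return NULL;
}
```

Because the collision is detected during the descent the insert performs
anyway, checking for it every time adds no separate lookup.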


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to reduced
lock contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without commiting this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts of
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
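
A minimal sketch of the merged-flags approach, written with C11 atomics
rather than the kernel's own atomic primitives. The structure, bit values
and function names are hypothetical; the sketch only shows page flags and
queue flags sharing one word that is modified with atomic read-modify-write
operations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define XPG_BUSY	0x0001u
#define XPG_CLEAN	0x0002u
#define XPQ_ACTIVE	0x0100u

/* Page flags and queue flags live in a single atomically updated word. */
struct xpage {
	atomic_uint pg_flags;
};

/* Atomically set the given bits, leaving all others untouched. */
static void
xpage_setbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits, leaving all others untouched. */
static void
xpage_clearbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Several bits can be passed in one call, so a group of flags is changed with
a single atomic operation instead of one operation per flag.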


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging
pgfree check into pglist; no functional change for normal kernels; make
histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist
one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us to avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode hold with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
fix found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.
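
The flag-based test described above can be sketched as a small userland
model. Only the name page_init_done comes from the commit message; the
enum, the helper functions and the two code paths are hypothetical
stand-ins, not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the uvm.page_init_done boolean. */
static bool page_init_done = false;

/* Hypothetical labels for the two code paths in pmap_growkernel(). */
enum grow_path { GROW_BOOTSTRAP, GROW_NORMAL };

/*
 * Model of the decision: until page initialisation has completed,
 * the caller must be on the bootstrap path.
 */
static enum grow_path
growkernel_path(void)
{
	return page_init_done ? GROW_NORMAL : GROW_BOOTSTRAP;
}

/* Model of uvm_page_init() finishing: the flag is set as the last step. */
static void
model_page_init(void)
{
	/* ... the page array would be set up here ... */
	page_init_done = true;
}
```

The point of a dedicated flag is that it tracks exactly one event, so a
caller running before pmap_initialized is set still picks the right path.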


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.152 27-Nov-2020 mpi

Set the correct IPL for `pageqlock' now that it is grabbed from interrupt.

Reported by AIsha Tammy.

ok kettenis@


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
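
The accounting problem described above can be illustrated with a minimal
userland model. It assumes, as the commit message implies, that freeing a
page with a non-zero wire count decrements a global wired-pages counter.
All names here (wired_page, wired_pages, model_pagefree,
model_buf_free_page) are hypothetical.

```c
#include <assert.h>

struct wired_page {
	int wire_count;		/* per-page wire count */
};

/* Global count of pages that were accounted as wired. */
static int wired_pages;

/* Model of freeing a page: a wired page is un-accounted when freed. */
static void
model_pagefree(struct wired_page *pg)
{
	if (pg->wire_count > 0) {
		pg->wire_count = 0;
		wired_pages--;
	}
}

/*
 * Model of releasing a buffer cache page, which was wired without
 * being counted. Clearing wire_count first avoids the bogus decrement.
 */
static void
model_buf_free_page(struct wired_page *pg)
{
	assert(pg->wire_count == 1);	/* in the spirit of the KASSERTs */
	pg->wire_count = 0;
	model_pagefree(pg);
}
```

Calling model_pagefree() directly on such a page would drive wired_pages
below zero, which is the symptom the commit fixes.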


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
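
A size-aware reserve check of the kind described above might look like the
following sketch. The function name and parameters are hypothetical, and
the comparison is one plausible way to write it, not the code from the
commit.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if taking npages out of nfree free pages would leave
 * fewer than `reserve` pages behind. The test is written as an
 * addition so the unsigned arithmetic cannot underflow when npages
 * is larger than nfree.
 */
static bool
would_eat_reserve(size_t nfree, size_t npages, size_t reserve)
{
	return nfree < reserve + npages;
}
```

A check that only compares nfree against reserve would let a single large
request dip far into the reserve; including npages closes that gap.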


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art,and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
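
The reason the check is nearly free is that RB_INSERT returns NULL on
success and the colliding element otherwise. The sketch below models that
return convention with a plain unbalanced binary tree, so it runs without
<sys/tree.h>. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct tree_page {
	unsigned long offset;		/* key: offset within the object */
	struct tree_page *left, *right;
};

/*
 * Insert pg into the tree rooted at *root. Returns NULL if pg was
 * inserted, or the existing node with the same offset if there is a
 * collision (the same convention RB_INSERT uses).
 */
static struct tree_page *
model_insert(struct tree_page **root, struct tree_page *pg)
{
	while (*root != NULL) {
		if (pg->offset == (*root)->offset)
			return *root;	/* duplicate: nothing inserted */
		root = pg->offset < (*root)->offset ?
		    &(*root)->left : &(*root)->right;
	}
	pg->left = pg->right = NULL;
	*root = pg;
	return NULL;
}
```

A caller can then assert that the return value is NULL, catching an insert
on top of an existing page without a separate lookup.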


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
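
The merged-flags approach can be sketched in userland with C11 atomics.
The helpers below are hypothetical stand-ins loosely modelled on the idea
of atomic set/clear operations on a single flags word; the flag values and
names are made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical flag bits sharing one word. */
#define MODEL_PG_BUSY	0x01u
#define MODEL_PG_CLEAN	0x02u
#define MODEL_PQ_ACTIVE	0x10u

struct flag_page {
	atomic_uint pg_flags;	/* page flags and queue flags, merged */
};

/* Atomically set the given bits without disturbing the others. */
static void
model_setbits(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

/* Atomically clear the given bits without disturbing the others. */
static void
model_clearbits(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

Because each update is a single atomic read-modify-write on the whole
word, two contexts changing different bits cannot overwrite each other's
changes, which is the hazard with adjacent short fields.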


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as free pages are being converted back to a byte address, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
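
A small sketch of the class of bug this guards against, under stated assumptions: a 4 KB page size, a 32-bit page count and a 64-bit physical address type. The x_ names are hypothetical and are not the kernel's macros.

```c
#include <stdint.h>

#define X_PAGE_SHIFT	12		/* assumed: 4 KB pages */

typedef uint64_t x_paddr_t;		/* assumed: wider than 32 bits, as with PAE */

/*
 * Narrow form: the shift is evaluated in 32 bits, so any bits above
 * bit 31 are gone before the result is widened.
 */
static inline x_paddr_t
x_ptob_narrow(uint32_t npages)
{
	return (x_paddr_t)(uint32_t)(npages << X_PAGE_SHIFT);
}

/* Wide form: convert to the larger type first, then shift. */
static inline x_paddr_t
x_ptob_wide(uint32_t npages)
{
	return (x_paddr_t)npages << X_PAGE_SHIFT;
}
```

With 0x100000 pages (4 GB worth), the narrow form wraps to zero while the wide form keeps the full byte count.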


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.151 24-Nov-2020 mpi

Grab the `pageqlock' before calling uvm_pageclean() as intended.

Document which global data structures require this lock and add some
asserts where the lock should be held.

Some code paths are still incorrect and should be revisited.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
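
As a rough illustration of what accounting for the size means, the check below compares the free count after the request, rather than the current free count, against the reserve. This is a sketch under assumptions: the struct, its fields and the function name are hypothetical and only loosely modelled on the uvmexp counters.

```c
#include <stdbool.h>

/* Hypothetical counters, loosely modelled on uvmexp. */
struct x_counters {
	unsigned int free;			/* pages currently free */
	unsigned int reserve_pagedaemon;	/* pages held back for the pagedaemon */
};

/*
 * Decide whether an allocation of npages may proceed. Ordinary callers
 * must leave the reserve intact after the allocation; the pagedaemon
 * itself may dip into it.
 */
static bool
x_alloc_allowed(const struct x_counters *c, unsigned int npages,
    bool is_pagedaemon)
{
	if (npages > c->free)
		return false;		/* also avoids unsigned underflow below */
	if (is_pagedaemon)
		return true;
	return c->free - npages >= c->reserve_pagedaemon;
}
```

A check that ignored npages would let a single large request consume the whole reserve as long as the free count was above it beforehand.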


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
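
The idea can be sketched in portable C. The tree here is a plain unbalanced binary search tree rather than the red-black tree from <sys/tree.h>, and all x_ names are hypothetical. It only mirrors the RB_INSERT return convention: NULL on success, or the element that already holds the key.

```c
#include <assert.h>
#include <stddef.h>

struct x_page {
	unsigned long offset;		/* key: offset within the object */
	struct x_page *left, *right;
};

/*
 * Insert pg into the tree rooted at *rootp. Returns NULL on success,
 * or the existing node with the same key, in which case nothing is
 * inserted.
 */
static struct x_page *
x_tree_insert(struct x_page **rootp, struct x_page *pg)
{
	struct x_page **link = rootp;

	while (*link != NULL) {
		if (pg->offset == (*link)->offset)
			return *link;		/* duplicate key */
		link = pg->offset < (*link)->offset ?
		    &(*link)->left : &(*link)->right;
	}
	pg->left = pg->right = NULL;
	*link = pg;
	return NULL;
}

/* The duplicate check costs nothing extra: it is the insert's result. */
static void
x_pageinsert(struct x_page **rootp, struct x_page *pg)
{
	struct x_page *dupe = x_tree_insert(rootp, pg);

	assert(dupe == NULL);	/* a kernel would assert or panic here */
}
```

Because the insert already walks to the position where the key belongs, no separate lookup is needed to detect a page being inserted on top of another.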


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.
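
Points 1 and 2 above can be sketched roughly as follows. This is only an
illustration of the arithmetic the message describes, not the code from
this commit: every x_* name is hypothetical, and the 3% figure is taken
from the message rather than from the source.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical: free target as a flat 3% of physical pages, no 512kB cap. */
static unsigned long
x_free_target(unsigned long physpages)
{
	assert(physpages <= ULONG_MAX / 3);	/* keep physpages * 3 in range */
	return physpages * 3 / 100;
}

/* Hypothetical: pages the buffer cache may still claim (max minus current). */
static unsigned long
x_bufcache_deficit(unsigned long bufmax, unsigned long bufnow)
{
	return bufmax > bufnow ? bufmax - bufnow : 0;
}

/*
 * Hypothetical wakeup test: the deficit is treated as "not free", so the
 * pagedaemon is woken earlier than the raw free count alone would suggest.
 */
static bool
x_should_wake_pagedaemon(unsigned long nfree, unsigned long deficit,
    unsigned long target)
{
	unsigned long effective = nfree > deficit ? nfree - deficit : 0;

	return effective < target;
}
```

With 4000 free pages and a target of 3000, this sketch would not wake the
pagedaemon on its own, but a buffer cache deficit of 2000 pages pushes the
effective free count below the target.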


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
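
A minimal sketch of the idea, assuming C11 atomics in place of the kernel's
own atomic primitives. The struct, the flag values and the xpage_* helpers
are hypothetical and only illustrate keeping both flag sets in one word
that is always updated atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define XPG_BUSY	0x0001u
#define XPG_WANTED	0x0002u
#define XPQ_ACTIVE	0x0100u	/* would have lived in the separate pqflags */

struct xpage {
	_Atomic uint32_t pg_flags;	/* one word holds both flag sets */
};

/* Set bits with a single atomic read-modify-write on the whole word. */
static inline void
xpage_setbits(struct xpage *pg, uint32_t bits)
{
	assert(pg != NULL);
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear bits the same way; other bits in the word are left untouched. */
static inline void
xpage_clearbits(struct xpage *pg, uint32_t bits)
{
	assert(pg != NULL);
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Because every update is a read-modify-write on the full word, two CPUs
changing different flags cannot overwrite each other's bits, which is the
problem two adjacent short fields could not rule out.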


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.
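
The kind of truncation this guards against can be sketched as below. The
x_paddr_t type, the 4 KB page shift and both helper names are assumptions
made for the example; the actual expression changed by the commit is not
shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define X_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t x_paddr_t;		/* stand-in for a wider paddr_t */

static_assert(sizeof(x_paddr_t) > sizeof(uint32_t),
    "x_paddr_t must be wider than 32 bits");

/* Shift done in 32 bits: the high bits are gone before the widening. */
static x_paddr_t
x_pages_to_bytes_narrow(uint32_t npages)
{
	return (uint32_t)(npages << X_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte count survives. */
static x_paddr_t
x_pages_to_bytes_wide(uint32_t npages)
{
	return (x_paddr_t)npages << X_PAGE_SHIFT;
}
```

For 0x100000 pages (4 GB worth), the narrow version wraps to 0 while the
wide version yields 0x100000000.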


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and though use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.150 22-Sep-2020 mpi

Spell inline correctly.

Reduce differences with NetBSD.

ok mvs@, kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
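
A rough sketch of what accounting for the allocation size means here. The
function and its parameters are hypothetical and do not reproduce the
kernel's actual check; the point is only that the request size has to enter
the comparison, and that it should do so without underflowing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical reserve check: may an allocation of npages proceed?
 * Callers allowed to dip into the reserve only need enough free pages;
 * everyone else must leave at least `reserve` pages free afterwards.
 */
static bool
x_alloc_allowed(unsigned long nfree, unsigned long reserve,
    unsigned long npages, bool may_use_reserve)
{
	assert(npages > 0);

	if (npages > nfree)
		return false;		/* also keeps nfree - npages from wrapping */
	if (may_use_reserve)
		return true;
	return nfree - npages >= reserve;
}
```

A check of the form "nfree > reserve" would let a 7-page request through
with 10 pages free and a reserve of 4, leaving only 3 pages; the version
above refuses it.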


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@
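
A minimal userland sketch of the head-insertion idea, assuming a plain
<sys/queue.h> TAILQ as the free list. The demo_* names are hypothetical and
are not the kernel's structures; the point is only that inserting at the head
makes the most recently freed page the first one handed out again, which is
where the cache-locality benefit would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a page; not the kernel's struct vm_page. */
struct demo_page {
	int id;
	TAILQ_ENTRY(demo_page) pageq;
};
TAILQ_HEAD(demo_pglist, demo_page);

/* Free a page: put it at the head so it is reused first (LIFO). */
static void
demo_free(struct demo_pglist *fl, struct demo_page *pg)
{
	assert(pg != NULL);
	TAILQ_INSERT_HEAD(fl, pg, pageq);
}

/* Allocate a page: take whatever sits at the head, or NULL if empty. */
static struct demo_page *
demo_alloc(struct demo_pglist *fl)
{
	struct demo_page *pg = TAILQ_FIRST(fl);

	if (pg != NULL)
		TAILQ_REMOVE(fl, pg, pageq);
	return pg;
}
```

With tail insertion the same two calls would instead return the page that has
been free the longest.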


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency due to lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well-formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.149 29-Nov-2019 kettenis

Split out the code that removes a page from uvm objects and clears the flags
into a separate uvm_pageclean() function and call it from uvm_pagefree().

ok mpi@, guenther@, beck@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
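
A rough sketch of what "accounting for the size of the allocation" can look
like. The function and parameter names are hypothetical and this is not the
code from the commit; it only illustrates comparing the request against the
headroom above the reserve, in a way that cannot underflow:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: may an ordinary caller take `npages` pages
 * without dipping into the pagedaemon reserve?
 */
bool
reserve_allows(size_t free_pages, size_t reserve_pages, size_t npages)
{
	/* Nothing beyond the reserve is left to hand out. */
	if (free_pages <= reserve_pages)
		return false;

	/*
	 * Compare the request against the headroom above the reserve.
	 * The subtraction is safe because free_pages > reserve_pages here.
	 */
	return npages <= free_pages - reserve_pages;
}
```

Checking only "free > reserve" would let a single large request eat the whole
reserve; comparing against the headroom is one way to avoid that.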


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
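
The "insert returns any duplicates" idea can be sketched as follows. This is
an illustration only: it uses a plain unbalanced binary search tree as a
stand-in for the red-black tree, and the struct and function names are
hypothetical. What it shares with RB_INSERT is the return convention, NULL on
success or the element already stored under the same key:

```c
#include <stddef.h>

struct xnode {
	unsigned long	 offset;	/* key: page offset within the object */
	struct xnode	*left;
	struct xnode	*right;
};

/*
 * Insert `n` into the tree rooted at *rootp.  Returns NULL on success,
 * or the node already stored under the same key, in which case the
 * tree is left unchanged.
 */
struct xnode *
xtree_insert(struct xnode **rootp, struct xnode *n)
{
	struct xnode **pp = rootp;

	while (*pp != NULL) {
		if (n->offset == (*pp)->offset)
			return *pp;	/* duplicate: caller can assert on this */
		pp = (n->offset < (*pp)->offset) ? &(*pp)->left : &(*pp)->right;
	}
	n->left = n->right = NULL;
	*pp = n;
	return NULL;
}
```

Because the lookup is part of the insert itself, a caller can check the
return value on every insert without paying for a second tree walk.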


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to reduced lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
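
A minimal sketch of the merged-flags approach, written with C11 atomics as a
portable stand-in for the kernel's own atomic_setbits/atomic_clearbits style
primitives. The struct, the helper names and the bit values are hypothetical;
only the idea (one flags word, changed with atomic read-modify-write
operations) comes from the commit message:

```c
#include <stdatomic.h>

/* Hypothetical bit values; the real PG_ and PQ_ flags are not listed here. */
#define XPG_BUSY	0x01u
#define XPG_WANTED	0x02u
#define XPQ_ACTIVE	0x10u

struct xpage {
	atomic_uint	pg_flags;	/* page and queue flags share one word */
};

/* Atomically set every bit in `bits`. */
void
xpage_setbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear every bit in `bits` with a single operation. */
void
xpage_clearbits(struct xpage *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Passing an OR of several bits to one clear call costs a single atomic
operation, which is the kind of saving r1.123 describes.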


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is being converted back to a byte address, make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.148 26-Feb-2019 visa

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@


Revision tags: OPENBSD_6_4_BASE
# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster due to lock contention on
uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - an IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into an IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages being converted back to byte address make sure to
perform calculations in (upcoming) larger paddr_t to avoid losing
higher bits in calculation.


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked cadaver and so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very well when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks of intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which hunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


# 1.147 12-May-2018 krw

Re-apply inadvertently misplaced r1.127 from kettenis@:

"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."

ok beck@ (again)


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@
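
The accounting problem described above can be illustrated with a toy model. This
is only a sketch under stated assumptions: toy_page, toy_wired, toy_pagefree()
and toy_buf_release() are hypothetical stand-ins, not the kernel's structures or
functions. The model assumes the free routine decrements a global wired counter
whenever it sees a nonzero wire count, so a page that was wired without being
counted must have its wire count reset first.

```c
/* Toy model only: none of these names are the real kernel API. */
struct toy_page {
	int wire_count;		/* nonzero while the page is wired */
};

static int toy_wired;		/* stands in for a global wired-pages counter */

/* Free a page; a wired page is assumed to have been counted earlier. */
static void
toy_pagefree(struct toy_page *pg)
{
	if (pg->wire_count != 0) {
		pg->wire_count = 0;
		toy_wired--;
	}
}

/*
 * Release a page that was wired but never added to toy_wired.
 * Resetting wire_count first keeps the counter from going negative.
 */
static void
toy_buf_release(struct toy_page *pg)
{
	pg->wire_count = 0;
	toy_pagefree(pg);
}
```

In this model, calling toy_pagefree() directly on such a page drives toy_wired
below zero, while toy_buf_release() leaves it unchanged.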


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@
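
A minimal sketch of the idea, assuming a simple page-count model. The function
names, the parameters and the exact boundary condition are illustrative
assumptions and are not taken from the kernel source. The point is only that the
check has to look at what would remain after the whole request is satisfied,
not at the current free count alone.

```c
#include <stdbool.h>

/*
 * Hypothetical check that ignores the request size: it only asks
 * whether we are above the reserve right now.
 */
static bool
toy_alloc_ok_naive(long free_pages, long reserve)
{
	return free_pages > reserve;
}

/*
 * Hypothetical size-aware check: the allocation is allowed only if
 * the reserve is still intact after npages have been handed out.
 */
static bool
toy_alloc_ok_sized(long free_pages, long reserve, long npages)
{
	return free_pages - npages >= reserve;
}
```

With 10 free pages and a reserve of 8, a 4-page request passes the naive check
but is refused by the size-aware one.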


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@
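
The check relies on RB_INSERT returning the already-present element when the key
collides, and NULL otherwise. The sketch below mimics that return convention
with a plain unbalanced binary search tree. toy_node and toy_insert() are
hypothetical and are not the macros from sys/tree.h; only the return convention
is the point.

```c
#include <stddef.h>

/* Hypothetical node keyed by offset; not the kernel's vm_page. */
struct toy_node {
	unsigned long off;
	struct toy_node *left, *right;
};

/*
 * Insert n into the tree rooted at *root.  Returns NULL on success,
 * or the existing node with the same key, in which case the tree is
 * left untouched and the caller can treat it as a bug.
 */
static struct toy_node *
toy_insert(struct toy_node **root, struct toy_node *n)
{
	struct toy_node **link = root;

	while (*link != NULL) {
		if (n->off == (*link)->off)
			return *link;		/* duplicate key */
		link = (n->off < (*link)->off) ?
		    &(*link)->left : &(*link)->right;
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}
```

Because the lookup happens as part of the insert itself, asserting on the return
value adds no extra tree walk.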


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architectures'
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large
bufcachepercent, and still notice swapping without the buffer cache backing
off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hash function and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help, de-inline them.

shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how much it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. the whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
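
A minimal sketch of the approach, using C11 atomics rather than the kernel's own
primitives. toy_page, the flag names and the toy_setbits()/toy_clearbits()
helpers are hypothetical and exist only for illustration. Because both kinds of
flags live in one word, each update is a single atomic read-modify-write, so two
contexts changing different bits cannot overwrite each other's stores.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the names and values are illustrative only. */
#define TOY_PG_BUSY	0x01u
#define TOY_PG_CLEAN	0x02u
#define TOY_PQ_ACTIVE	0x10u

struct toy_page {
	atomic_uint pg_flags;	/* page and queue flags share one word */
};

/* Set the given bits with one atomic OR. */
static void
toy_setbits(struct toy_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Clear the given bits with one atomic AND of the inverted mask. */
static void
toy_clearbits(struct toy_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}
```

Several bits can be passed in one call by OR-ing them together, which costs one
atomic operation instead of one per bit.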


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages are being converted back to a byte address, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits in the calculation.
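
The truncation this commit guards against can be illustrated with a minimal userland sketch. It is not the kernel code: PAGE_SHIFT_SKETCH, the 32-bit page count and both helper names are assumptions made for illustration, with uint64_t standing in for the wider paddr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page shift (4 KB pages); the real value is MD. */
#define PAGE_SHIFT_SKETCH 12

/* Shift performed in 32 bits: bits above bit 31 are lost. */
static inline uint64_t
pages_to_bytes_narrow(uint32_t npages)
{
	uint32_t bytes = npages << PAGE_SHIFT_SKETCH;

	return bytes;
}

/* Widen first, then shift: the full byte address survives. */
static inline uint64_t
pages_to_bytes_wide(uint32_t npages)
{
	return (uint64_t)npages << PAGE_SHIFT_SKETCH;
}

/* Small self-check: 2^21 pages of 4 KB is 8 GB, which needs 34 bits. */
static inline void
check_conversion(void)
{
	assert(pages_to_bytes_wide(UINT32_C(0x200000)) ==
	    UINT64_C(0x200000000));
	assert(pages_to_bytes_narrow(UINT32_C(0x200000)) == 0);
}
```

The only difference between the two helpers is where the widening cast happens; doing it before the shift is what keeps the high bits.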


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver, so use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that are interrupt safe and never allow faults
in them. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.146 07-Nov-2016 guenther

Split PID from TID, giving processes a PID unrelated to the TID of their
initial thread

ok jsing@ kettenis@


# 1.145 16-Sep-2016 dlg

move the vm_page struct from being stored in RB macro trees to RBT functions

vm_page structs go into three trees, uvm_objtree, uvm_pmr_addr, and
uvm_pmr_size. all these have been moved to RBT code.

this should give us a decent chunk of code space back.


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.144 30-Oct-2015 miod

Fix two (verified to be harmless) off-by-ones in bounds checks in
uvm_page_init() (causing uvmexp.npages to be slightly wrong if
pmap_steal_memory() has been used) and uvm_page_physload().

ok guenther@ kettenis@ visa@ beck@


# 1.143 08-Oct-2015 kettenis

Lock the page queues by turning uvm_lock_pageq() and uvm_unlock_pageq() into
mtx_enter() and mtx_leave() operations. Not 100% sure this won't blow up but
there is only one way to find out, and we need this to make progress on
further unlocking uvm.

prodded by deraadt@


# 1.142 21-Sep-2015 visa

Drop a misleading XXX about PQ_AOBJ. Clear PQ_ANON unconditionally for
consistency with PQ_AOBJ.

Input kettenis@, ok beck@


# 1.141 21-Aug-2015 visa

Remove the unused loan_count field and the related uvm logic. Most of
the page loaning code is already in the Attic.

ok kettenis@, beck@


Revision tags: OPENBSD_5_8_BASE
# 1.140 19-Jul-2015 beck

Fix backward test that broke the cache


# 1.139 19-Jul-2015 beck

Change uvm_page[re]alloc_multi to actually use the flags passed in, and return
a value so that they may be called with UVM_PLA_NOWAIT
ok kettenis@


# 1.138 23-Apr-2015 dlg

tedu remnants of the previous attempt to implement page zeroing in
the idle thread.

ok deraadt@


# 1.137 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.136 28-Feb-2015 mlarkin

Typo in comment 'reseve' -> 'reserve'


# 1.135 08-Feb-2015 deraadt

Something is subtly wrong with this. On ramdisks, processes run out of
mappable memory (direct or via execve), perhaps because of the address
allocator behind maps and the way wiring counts work?


# 1.134 07-Feb-2015 kettenis

Tedu the old idle page zeroing code.

ok tedu@, guenther@, miod@


# 1.133 06-Feb-2015 deraadt

Clear PQ_AOBJ before calling uvm_pagefree(), clearing up one false XXX
comment (one is fixed, one is deleted).
ok kettenis beck


# 1.132 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.131 11-Jul-2014 jsg

Chuck Cranor rescinded clauses in his license
on the 2nd of February 2011 in NetBSD.

http://marc.info/?l=netbsd-source-changes&m=129658899212732&w=2
http://marc.info/?l=netbsd-source-changes&m=129659095515558&w=2
http://marc.info/?l=netbsd-source-changes&m=129659157916514&w=2
http://marc.info/?l=netbsd-source-changes&m=129665962324372&w=2
http://marc.info/?l=netbsd-source-changes&m=129666033625342&w=2
http://marc.info/?l=netbsd-source-changes&m=129666052825545&w=2
http://marc.info/?l=netbsd-source-changes&m=129666922906480&w=2
http://marc.info/?l=netbsd-source-changes&m=129667725518082&w=2


# 1.130 13-Apr-2014 tedu

compress code by turning four line comments into one line comments.
emphatic ok usual suspects, grudging ok miod


Revision tags: OPENBSD_5_5_BASE
# 1.129 23-Jan-2014 miod

unifdef -D__HAVE_VM_PAGE_MD - no functional change.


Revision tags: OPENBSD_5_4_BASE
# 1.128 09-Jul-2013 beck

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


# 1.127 21-Jun-2013 kettenis

Buffer cache pages are wired but not counted as such. Therefore we have to
set the wire count on the pages to 0 before we call uvm_pagefree() on them,
just like we do in buf_free_pages(). Otherwise the wired pages counter goes
negative. While there, also sprinkle some KASSERTs in there that
buf_free_pages() has as well.

ok beck@


# 1.126 11-Jun-2013 beck

High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from a high mem buffer. The total
amount of buffers allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.


# 1.125 30-May-2013 tedu

remove lots of comments about locking per beck's request


# 1.124 30-May-2013 tedu

remove simple_locks from uvm code. ok beck deraadt


# 1.123 27-Mar-2013 tedu

combine several atomic_clearbits calls into one. slightly faster on
machines where atomic ops aren't so simple.
ok beck deraadt miod


# 1.122 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 3 - re-merge 1.116 to 1.118


Revision tags: OPENBSD_5_3_BASE
# 1.121 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 2 - re-merge 1.119 (the WAITOK diff)


# 1.120 12-Mar-2013 deraadt

preserving main-branch topology for a perverse reason:
step 1 - backout 1.116 to 1.119


# 1.119 12-Mar-2013 beck

Fix horrible typo of mine checking for WAITOK flags, found by sthen.
This fix actually by mikeb@, this needs thorough testing to verify
it doesn't bring up other issues in what it hid.
ok deraadt@


# 1.118 06-Mar-2013 beck

Account for the size of the allocation when defending the pagedaemon reserve.
Spotted by oga@nicotinebsd.org, with help from dhill@. Fix by me.
ok miod@


# 1.117 03-Mar-2013 miod

Use local vm_physseg pointers instead of computing vm_physmem[index] gazillions
of times. No function change but makes the code a bit smaller.

ok mpi@


# 1.116 02-Mar-2013 miod

Simplify uvm_pagealloc() to only need one atomic operation on the page flags
instead of two, building upon the knowledge of the state uvm_pagealloc_pg()
leaves the uvm_page in.
ok mpi@


# 1.115 07-Feb-2013 beck

Bring back reserve enforcement and page daemon wakeup into uvm_pglistalloc.
It was removed as this function was redone to use pmemrange in mid 2010
with the result that kernel malloc and other users of this function can
consume the page daemon reserve and run us out of memory.
ok kettenis@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.114 08-Jul-2011 tedu

some machines don't boot with the previous uvm reserve enforcement diff.
back it out.


# 1.113 07-Jul-2011 oga

Move the uvm reserve enforcement from uvm_pagealloc to pmemrange.

More and more things are allocating outside of uvm_pagealloc these days making
it easy for something like the buffer cache to eat your last page with no
repercussions (other than a hung machine, of course).

ok ariane@ also ok ariane@ again after I spotted and fixed a possible underflow
problem in the calculation.


# 1.112 06-Jul-2011 beck

uvm changes for buffer cache improvements.
1) Make the pagedaemon aware of the memory ranges and size of allocations
where memory is being requested, and pass this information on to
bufbackoff(), which will later (not yet) be used to ensure that the
buffer cache gets out of the way in the right area of memory.

Note that this commit does not yet make it *do* that - as currently
the buffer cache is all in dma-able memory and it will simply back
off.

2) Add uvm_pagerealloc_multi - to be used by the buffer cache code
for reallocating pages to particular regions.

much of this work by ariane, with smatterings of me, art, and oga

ok oga@, thib@, ariane@, deraadt@


# 1.111 03-Jul-2011 oga

Rip out and burn support for UVM_HIST.

The vm hackers don't use it, don't maintain it and have to look at it all the
time. About time this 800 lines of code hit /dev/null.

``never liked it'' tedu@. ariane@ was very happy when i told her i wrote
this diff.


# 1.110 23-Jun-2011 oga

Check for the correct flag when checking to see if the page is part of an aobj.

This is no function change since aobjs never actually hit this path. (also it is
my bug from a while ago).

ok ariane@


# 1.109 23-Jun-2011 oga

Move uvm_pglistalloc and uvm_pglistfree to uvm_page.c and garbage
collect uvm_pglist.c

uvm_pglistalloc and free are just thin wrappers around pmemrange these
days and don't really need their own file.

ok ariane@


# 1.108 30-May-2011 oga

Remove the freelist member from vm_physseg

The new world order of pmemrange makes this data completely redundant
(being dealt with by the pmemrange constraints instead). Remove all code
that messes with the freelist.

While touching every caller of uvm_page_physload() anyway, add the flags
argument to all callers (all but one is 0 and that one already used
PHYSLOAD_DEVICE) and remove the macro magic to allow callers to continue
without it.

Should shrink the code a bit, as well.

matthew@ pointed out some mistakes i'd made.
``freelist death, I like. Ok.'' ariane@
``I agree with the general direction, go ahead and i'll fix any fallout
shortly'' miod@ (68k 88k and vax i could not check would build)


# 1.107 10-May-2011 oga

Kill vm_page_lookup_freelist.

it belongs to a world order that isn't here anymore. More importantly it
has been unused for a fair while now.

ok thib@


# 1.106 15-Apr-2011 oga

Add a bit of paranoia to uvm_pageinsert.

At various times diffs have had debugging that checked that we don't
insert a page into the tree on top of an existing page, leaking that
page's references. Until the recent hackathon (and introduction of
uvm_pagealloc_multi) the bufcache for example did a rb tree look up on
insert to check (under #ifdef DEBUG || 1) so instead just check it on
pageinsert every time, since RB_INSERT returns any duplicates so this
check is pretty much free.

``emphatically yes'' beck@


# 1.105 03-Apr-2011 beck

knf - trailing whitespace flense.
ok henning@


# 1.104 02-Apr-2011 beck

Constrain the buffer cache to use only the dma reachable region of memory.
With this change bufcachepercent will be the percentage of dma reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@


# 1.103 02-Apr-2011 ariane

Count the number of physical pages within a memory range.
Bob needs this.

ok art@ bob@ thib@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.102 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ marco@


# 1.101 27-Jun-2010 thib

uvm constraints. Add two mandatory MD symbols, uvm_md_constraints
which contains the constraints for DMA/memory allocation for each
architecture, and dma_constraints which contains the range of addresses
that are dma accessible by the system.

This is based on ariane@'s physcontig diff, with lots of bugfixes and
the following additions by myself:

Introduce a new function pool_set_constraints() which sets the address
range for which we allocate pages for the pool from, this is now used
for the mbuf/mbuf cluster pools to keep them dma accessible.

The !direct archs no longer stuff pages into the kernel object in
uvm_km_getpage_pla but rather do a pmap_extract() in uvm_km_putpages.

Tested heavily by myself on i386, amd64 and sparc64. Some tests on
alpha and SGI.

"commit it" beck, art, oga, deraadt
"i like the diff" deraadt


# 1.100 22-Apr-2010 oga

Committing on behalf of ariane@.

recommit pmemrange:
physmem allocator: change the view of free memory from single
free pages to free ranges. Classify memory based on region with
associated use-counter (which is used to construct a priority
list of where to allocate memory).

Based on code from tedu@, help from many.

Useable now that bugs have been found and fixed in most architecture's
pmap.c

ok by everyone who has done a pmap or uvm commit in the last year.


# 1.99 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.98 24-Mar-2010 oga

Bring back PHYSLOAD_DEVICE for uvm_page_physload.

ok kettenis@ beck@ (tentatively) and ariane@. deraadt asked for it to be
committed now.

original commit message:

extend uvm_page_physload to have the ability to add "device" pages to
the system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@ kettenis@, beck@


Revision tags: OPENBSD_4_7_BASE
# 1.97 14-Oct-2009 beck

Fix buffer cache backoff in the page daemon - deal with inactive pages to
more correctly reflect the new state of the world - that is - how many pages
can be cheaply reclaimed - which now includes clean buffer cache pages.

This change fixes situations where people would be running with a large bufcachepercent, and still notice swapping without the buffer cache backing off.

ok oga@, testing by many on tech@ and others. Thanks.


# 1.96 13-Aug-2009 deraadt

PAGEFASTRECYCLE is an option we have been using for a while to encourage
the kernel to reuse freed pages as quickly as possible, and it has been
finding bugs (some of which we have already fixed)
ok kettenis


# 1.95 06-Aug-2009 oga

reintroduce the uvm_tree commit.

Now instead of the global object hashtable, we have a per object tree.

Testing shows no performance difference and a slight code shrink. OTOH when
locking is more fine grained this should be faster, since it avoids lock
contention on uvm.hashlock.

ok thib@, art@.


# 1.94 26-Jul-2009 deraadt

stop trying to fast-recycle pages for now. a few bugs have been found and
fixed, but now it is time for a little break from the chaos.
ok kettenis


# 1.93 23-Jul-2009 kettenis

Insert free pages at the head of the page queues. Should provide better
cache locality and will pave the way for the new pmemrange allocator.
Based on hints from art@ and ariane@.

ok ariane@, deraadt@, oga@


# 1.92 22-Jul-2009 oga

Put the PG_RELEASED changes diff back in.

This has been tested very very thoroughly on all archs we have
excepting 88k and 68k. Please see cvs log for the individual commit
messages.

ok beck@, thib@


Revision tags: OPENBSD_4_6_BASE
# 1.91 17-Jun-2009 oga

date based reversion of uvm to the 4th May.

More backouts in line with previous ones, this appears to bring us back to a
stable condition.

A machine forced to 64mb of ram cycled 10GB through swap with this diff
and is still running as I type this. Other tests by ariane@ and thib@
also seem to show that it's alright.

ok deraadt@, thib@, ariane@


# 1.90 16-Jun-2009 ariane

Backout pmemrange (which to most people is more well known as physmem
allocator).

"i can't see any obvious problems" oga


# 1.89 16-Jun-2009 oga

Backout all changes to uvm after pmemrange (which will be backed out
separately).

a change at or just before the hackathon has either exposed or added a
very very nasty memory corruption bug that is giving us hell right now.
So in the interest of kernel stability these diffs are being backed out
until such a time as that corruption bug has been found and squashed,
then the ones that are proven good may slowly return.

a quick hitlist of the main commits this backs out:

mine:
uvm_objwire
the lock change in uvm_swap.c
using trees for uvm objects instead of the hash
removing the pgo_releasepg callback.

art@'s:
putting pmap_page_protect(VM_PROT_NONE) in uvm_pagedeactivate() since
all callers called that just prior anyway.

ok beck@, ariane@.

prompted by deraadt@.


# 1.88 14-Jun-2009 deraadt

backout:
> extend uvm_page_physload to have the ability to add "device" pages to the
> system.
since it was overlayed over a system that we warned would go "in to be
tested, but may be pulled out". oga, you just made me spend 20 minutes
of time I should not have had to spend doing this.


# 1.87 07-Jun-2009 oga

extend uvm_page_physload to have the ability to add "device" pages to the
system.

This is needed in the case where you need managed pages so you can
handle faulting and pmap_page_protect() on said pages when you manage
memory in such regions (i'm looking at you, graphics cards).

these pages are flagged PG_DEV, and shall never be on the freelists,
assert this. behaviour remains unchanged in the non-device case,
specifically for all archs currently in the tree we panic if called
after bootstrap.

ok art@, kettenis@, ariane@, beck@.


# 1.86 06-Jun-2009 art

Since all callers of uvm_pagedeactivate did pmap_page_protect(.., VM_PROT_NONE)
just move that into uvm_pagedeactivate.

oga@ ok


# 1.85 03-Jun-2009 ariane

phys allocator fix: zeroed pages are not clean.


# 1.84 02-Jun-2009 oga

Instead of the global hash table with the terrible hashfunction and a
global lock, switch the uvm object pages to being kept in a per-object
RB_TREE. Right now this is approximately the same speed, but cleaner.
When biglock usage is reduced this will improve concurrency by reducing lock
contention.

ok beck@ art@. Thanks to jasper for the speed testing.


# 1.83 02-Jun-2009 ariane

Clear PQ_ENCRYPT flag on uvm_pagefree, because free pages are by definition
not encrypted.


# 1.82 01-Jun-2009 oga

Since we've now cleared up a lot of the PG_RELEASED setting, remove the
pgo_releasepg() hook and just free the page the "normal" way in the one
place we'll ever see PG_RELEASED and should care (uvm_page_unbusy,
called in aiodoned).

ok art@, beck@, thib@


# 1.81 01-Jun-2009 ariane

physmem allocator: change the view of free memory from single free pages
to free ranges.
Classify memory based on region with associated use-counter (which is used
to construct a priority list of where to allocate memory).

Based on code from tedu@, help from many.
Ok art@


# 1.80 08-May-2009 ariane

Clear PQ_AOBJ at pageremove: when a page is no longer part of a uvm_object,
it is also not part of an aobj.
Clear anon flags at pagefree: page is no longer part of an anon.

ok oga


# 1.79 08-May-2009 ariane

Remove static qualifier of functions that are not inline.
Makes trace in ddb useful.

ok oga


# 1.78 04-May-2009 oga

Instead of keeping two ints in the uvm structure specifically just to
sleep on them (and otherwise ignore them) sleep on the pointer to the
{aiodoned,pagedaemon}_proc members, and nuke the two extra words.

"no objections" art@, ok beck@.


# 1.77 01-May-2009 oga

uvm_page_alloc() + memset -> uvm_page_zalloc()

nothing uses this code yet, but might as well do it the right way.

"if you can't live without committing this." miod@


# 1.76 28-Apr-2009 miod

Revert pageqlock back from a mutex to a simple_lock, as it needs to be
recursive in some cases (mostly involving swapping). A proper fix is in
the works, but this will unbreak kernels for now.


# 1.75 14-Apr-2009 oga

The use of uvm.pagedaemon_lock is incredibly inconsistent. only a
fraction of the wakeups and sleeps involved here actually grab that
lock. The remainder, on the other hand, always have the fpageq_lock
locked.

So, make this locking correct by switching the other users over to
fpageq_lock, too.

This would probably be better off being a semaphore, but for now at
least it's correct.

"ok, unless you want to implement semaphores" art@


# 1.74 13-Apr-2009 oga

Convert the page queue lock to a mutex instead of a simplelock.

Fix up the one case of lock recursion (which blatantly ignored the
comment right above it saying that we don't need to lock). The rest of
the lock usage has been checked and appears to be correct.

ok ariane@.


# 1.73 06-Apr-2009 oga

In the case where VM_PHYSSEG_MAX == 1 make vm_physseg_find and
PHYS_TO_VM_PAGE inline again. This should stop function call overhead
killing the vax and other slow archs while keeping the benefit for the
faster platforms.

suggested by miod. ok miod@, toby@.


# 1.72 06-Apr-2009 oga

Instead of doing splbio(); simple_lock(&uvm.aiodoned_lock); just replace
the simple lock with a real lock - a IPL_BIO mutex. While i'm here, make
the sleeping condition one hell of a lot simpler in the aio daemon.

some ideas from and ok art@.


# 1.71 26-Mar-2009 oga

Convert splvm() + simplelock(&uvm.hashlock); around the page hash table
into a IPL_VM blocking mutex, also slightly extend the locked area so
that it actually protects access to the page array (as the comment on
the lock declaration says it should).

ansify a few functions while i'm in the file.

"ok, even though you're sneaking in ansification in a diff. You dirty
you." art@


# 1.70 25-Mar-2009 oga

Move all of the pseudo-inline functions in uvm into C files.

By pseudo-inline, I mean that if a certain macro was defined, they would
be inlined. However, no architecture defines that, and none has for a
very very long time. Therefore mainly this just makes the code a damned
sight easier to read. Some k&r -> ansi declarations while I'm in there.

"just commit it" art@. ok weingart@.


# 1.69 24-Mar-2009 oga

vm_physseg_find and VM_PAGE_TO_PHYS are both called many times in your
average arch port. They are also inline. This does not help; de-inline them.

Shaves about 1k on i386 and amd64 bsd.mp. Probably similar amounts on
most architectures.

"no issue" beck@ "Nuke nuke nuke... make them functions" weingart@ "this
is good" art@


# 1.68 23-Mar-2009 art

Processor affinity for processes.
- Split up run queues so that every cpu has one.
- Make setrunqueue choose the cpu where we want to make this process
runnable (this should be refined and less brutal in the future).
- When choosing the cpu where we want to run, make some kind of educated
guess where it will be best to run (very naive right now).
Other:
- Set operations for sets of cpus.
- load average calculations per cpu.
- sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.67 02-Jul-2008 art

Make the pagedaemon a bit happier.
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat max vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit of how much memory should be our
free target. That maybe made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)

Testing by many, prodded by theo.


# 1.66 12-Apr-2008 miod

Prune the in-use swap encryption keys in uvm_shutdown(), per deraadt@'s idea.


# 1.65 09-Apr-2008 deraadt

Add new stub uvm_shutdown() and call it from the right place in MD boot()


Revision tags: OPENBSD_4_3_BASE
# 1.64 04-Jan-2008 miod

Only compile in uvm_page_physdump() if option DDB as it's not directly callable
and supposed to be only used from within ddb.


# 1.63 18-Dec-2007 thib

Turn the uvm_{lock/unlock}_fpageq() inlines into
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.

ok miod@,art@


# 1.62 29-Nov-2007 tedu

use a working mutex for the freepage list. ok art deraadt


Revision tags: OPENBSD_4_2_BASE
# 1.61 18-Jun-2007 pedro

Bring back Mickey's UVM anon change. Testing by thib@, beck@ and
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.


# 1.60 18-May-2007 art

Instead of checking whichqs directly, add a "sched_is_idle()" macro to
sys/sched.h and use that to check if there's something to do.

kettenis@ thib@ ok


# 1.59 13-Apr-2007 art

While splitting flags and pqflags might have been a good idea in theory
to separate locking, on most modern machines this is not enough
since operations on short types touch other short types that share the
same word in memory.

Merge pg_flags and pqflags again and now use atomic operations to change
the flags. Also bump wire_count to an int and pg_version might go
int as well, just for alignment.

tested by many, many. ok miod@
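
As an illustration of the approach described in 1.59, here is a minimal
userland sketch of one merged flags word that is only modified with atomic
read-modify-write operations. It uses C11 <stdatomic.h> as a stand-in for the
kernel's own atomic primitives, and every name in it (demo_page, demo_setbits,
DEMO_PG_BUSY and so on) is hypothetical, not taken from uvm_page.c.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; illustrative only, not the kernel's values. */
#define DEMO_PG_BUSY	0x01u
#define DEMO_PQ_ACTIVE	0x02u

/* Hypothetical page structure with a single merged, int-sized flags word. */
struct demo_page {
	atomic_uint pg_flags;
};

/* Atomically set the given bits without disturbing the others. */
static void
demo_setbits(struct demo_page *pg, unsigned int bits)
{
	atomic_fetch_or(&pg->pg_flags, bits);
}

/* Atomically clear the given bits without disturbing the others. */
static void
demo_clearbits(struct demo_page *pg, unsigned int bits)
{
	atomic_fetch_and(&pg->pg_flags, ~bits);
}

/* Read the current flags word. */
static unsigned int
demo_getflags(struct demo_page *pg)
{
	return atomic_load(&pg->pg_flags);
}
```

Because each update is a single atomic operation on the whole word, two
writers holding different locks cannot lose each other's bits, which is the
problem that separate short fields sharing a word could not avoid.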


# 1.58 11-Apr-2007 art

Instead of managing pages for intrsafe maps in special objects (aka.
kmem_object) just so that we can remove them, just use pmap_extract
to get the pages to free and simplify a lot of code to not deal with
the list of intrsafe maps, intrsafe objects, etc.

miod@ ok


# 1.57 04-Apr-2007 art

Mechanically rename the "flags" and "version" fields in struct vm_page
to "pg_flags" and "pg_version", so that they are a bit easier to work with.
Whoever uses generic names like this for a popular struct obviously doesn't
read much code.

Most architectures compile and there are no functionality changes.

deraadt@ ok ("if something fails to compile, we fix that by hand")


Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.56 31-Jul-2006 mickey

fix uvmhist #2: args are always u_long so fix missing %d and %x and no %ll; no change for normal code


# 1.55 26-Jul-2006 mickey

fix fmts for UVMHIST_LOG() entries making it more useful on 64bit archs; miod@ ok


# 1.54 13-Jul-2006 deraadt

Back out the anon change. Apparently it was tested by a few, but most of
us did not see it or get a chance to test it before it was committed. It
broke cvs, in the ami driver, making it not succeed at seeing its devices.


# 1.53 21-Jun-2006 mickey

from netbsd: make anons dynamically allocated from pool.
this results in less kva waste due to static preallocation of those
for every phys page and also every swap page.
tested by beck krw miod


# 1.52 27-Apr-2006 mickey

from PAE work:
as freepages is converted back to a byte address, make sure to
perform calculations in the (upcoming) larger paddr_t to avoid losing
higher bits.
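
The pitfall that 1.52 guards against can be sketched in plain C. This is an
illustration under stated assumptions, not the code from the commit:
demo_paddr_t, DEMO_PAGE_SHIFT and both function names are hypothetical, with a
64-bit type standing in for the larger paddr_t and a 4 KB page size assumed.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12		/* assumed 4 KB pages */

typedef uint64_t demo_paddr_t;		/* stand-in for a wide paddr_t */

/*
 * Shift first, widen afterwards: the shift is done in 32 bits, so the
 * high bits are already gone by the time the value is widened.
 */
static demo_paddr_t
demo_ptob_narrow(uint32_t npages)
{
	return (demo_paddr_t)(uint32_t)(npages << DEMO_PAGE_SHIFT);
}

/* Widen first, then shift: the whole calculation happens in 64 bits. */
static demo_paddr_t
demo_ptob_wide(uint32_t npages)
{
	return (demo_paddr_t)npages << DEMO_PAGE_SHIFT;
}
```

With 0x200000 pages (8 GB at this page size), the narrow version wraps to 0
while the wide version yields 0x200000000.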


Revision tags: OPENBSD_3_9_BASE
# 1.51 16-Jan-2006 mickey

add another uvm history for physpage alloc/free and propagate a debugging pgfree check into pglist; no functional change for normal kernels; make histories uncommon


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.50 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE SMP_SYNC_A SMP_SYNC_B
# 1.49 23-Feb-2004 drahn

sync of pmap_update() calls with NetBSD. pmap_update is defined away on
all architectures but arm, where it is needed.


Revision tags: OPENBSD_3_4_BASE
# 1.48 01-Jun-2003 miod

Typo in panic message.


Revision tags: UBC_SYNC_A
# 1.47 29-Mar-2003 mickey

ubchist is not a fully cooked kadaver and thus use the other well formed pdhist one until ubc gets back. art@ ok


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_B
# 1.46 12-Oct-2002 krw

Remove more '\n's from panic() statements. Both trailing and leading.

Diff generated by Chris Kuethe.


Revision tags: OPENBSD_3_2_BASE
# 1.45 12-Sep-2002 art

Change the PMAP_PAGEIDLEZERO api to take the struct vm_page instead of the pa.


# 1.44 10-Sep-2002 art

Change the pmap_zero_page and pmap_copy_page API to take the struct vm_page *
instead of the pa. Most callers already had it handy and those who didn't
only called it for managed pages and were outside time-critical code.

This will allow us to make those functions clean and fast on sparc and
sparc64 letting us avoid unnecessary cache flushes.

deraadt@ miod@ drahn@ ok.


# 1.43 11-Jun-2002 art

Allow MD code to define __HAVE_VM_PAGE_MD to add own members into struct vm_page.
From NetBSD.


Revision tags: OPENBSD_3_1_BASE
# 1.42 14-Mar-2002 millert

First round of __P removal in sys


# 1.41 28-Jan-2002 art

allocate vm pages with uvm_km_alloc (this code is ifdefed out anyway).


# 1.40 02-Jan-2002 miod

Back out a few more uvm changes, especially wrt swap usage.
This unbreaks m68k m88k sparc and perhaps others, which eventually froze
when hitting swap.
Tested by various people on various platforms.
ok art@


# 1.39 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.38 06-Dec-2001 art

branches: 1.38.2;
Keep track of how many pages a vnode holds with vhold and vholdrele
so that we can get back the old behavior where a vnode with cached data
is less likely to be recycled than a vnode without cached data.

XXX - This is a brute-force solution - we do it where uvmexp.vnodepages
are changed, I am not really sure it is correct but people have been
very happy with the diff so far and want this in the tree.


# 1.37 04-Dec-2001 art

Yet another sync to NetBSD uvm.
Today we add a pmap argument to pmap_update() and allocate map entries for
kernel_map from kmem_map instead of using the static entries. This should
get rid of MAX_KMAPENT panics. Also some uvm_loan problems are fixed.


# 1.36 30-Nov-2001 art

Kill uvm_pagealloc_contig. The two drivers that still used it should have
been converted to bus_dma ages ago, but since no one has bothered to do that
I haven't bothered to do more than to test that the kernel still builds
with those changes.


# 1.35 28-Nov-2001 art

Sync in more uvm from NetBSD. Mostly just cosmetic stuff.
Contains also support for page coloring.


# 1.34 28-Nov-2001 art

more sync to netbsd. some bugfixes in uvm_km_kmemalloc, lots of fixes in uvm_loan.


# 1.33 28-Nov-2001 art

Sync in more uvm changes from NetBSD.
This time we're getting rid of KERN_* and VM_PAGER_* error codes and
use errnos instead.


# 1.32 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.31 12-Nov-2001 art

Bring in more changes from NetBSD. Mostly pagedaemon improvements.


# 1.30 10-Nov-2001 art

Merge in some parts of the ubc work that has been done in NetBSD that are not
UBC, but prerequisites for it.

- Create a daemon that processes async I/O (swap and paging in the future)
requests that need processing in process context and that were processed
in the pagedaemon before.
- Convert some ugly ifdef DIAGNOSTIC code to less intrusive KASSERTs.
- misc other cleanups.


# 1.29 07-Nov-2001 art

Another sync of uvm to NetBSD. Just minor fiddling, no major changes.


# 1.28 07-Nov-2001 art

Add an alignment argument to uvm_map that specifies an alignment hint
for the virtual address.


# 1.27 06-Nov-2001 art

More sync to NetBSD.
- Use malloc/free instead of MALLOC/FREE for variable sized allocations.
- Move the memory inheritance code to sys/mman.h and rename from VM_* to MAP_*
- various cleanups and simplifications.


# 1.26 06-Nov-2001 art

Move the last content from vm/ to uvm/
The only thing left in vm/ are just dumb wrappers.
vm/vm.h includes uvm/uvm_extern.h
vm/pmap.h includes uvm/uvm_pmap.h
vm/vm_page.h includes uvm/uvm_page.h


# 1.25 05-Nov-2001 art

Minor sync to NetBSD.


Revision tags: OPENBSD_3_0_BASE
# 1.24 19-Sep-2001 mickey

merge vm/vm_kern.h into uvm/uvm_extern.h; art@ ok


# 1.23 25-Aug-2001 art

Default to disabled zeroing of pages in the idle loop.


# 1.22 11-Aug-2001 art

Various random fixes from NetBSD.
Including support for zeroing pages in the idle loop (not enabled yet).


# 1.21 06-Aug-2001 art

Add a new type voff_t (right now it's typedefed as off_t) used for offsets
into objects.

Gives the possibility to mmap beyond the size of vaddr_t.

From NetBSD.


# 1.20 31-Jul-2001 art

Allocate page buckets from kernel_map. This should save a good
amount of kmem_map on machines with lots of physical memory.


# 1.19 25-Jul-2001 art

Some updates to UVM from NetBSD. Nothing really critical, just a sync.


# 1.18 19-Jul-2001 art

Missed one in PMAP_NEW fix.


# 1.17 18-Jul-2001 art

Get rid of the PMAP_NEW option by making it mandatory for all archs.
The archs that didn't have a proper PMAP_NEW now have a dummy implementation
with wrappers around the old functions.


Revision tags: OPENBSD_2_9_BASE
# 1.16 10-Apr-2001 niklas

Fix for machines which need to enlarge the kernel address space, at least
1GB i386 machines need this. The fix is heavily based on Jason Thorpe's
found in NetBSD. Here is his original commit message:

Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.


# 1.15 22-Mar-2001 niklas

pastos in diagnostic strings


# 1.14 22-Mar-2001 smart

Sync style, typo, and comments a little closer to NetBSD. art@ ok


# 1.13 08-Mar-2001 smart

Replace thread_wakeup() with wakeup(). It is defined in vm_extern.h as a
wrapper, so this removes a dependence on the old VM system. From NetBSD.
art@ ok


# 1.12 03-Mar-2001 art

Allow the syncer to get pages from the pagedaemon reserve.
Otherwise we can end up in a situation where the syncer waits for pages
and the pagedaemon waits for buffers.


# 1.11 02-Mar-2001 art

Reserve more pages for the pagedaemon and the kernel.
With soft updates, writing out pages to disk can cause a bunch of allocations.


# 1.10 29-Jan-2001 niklas

$OpenBSD$


Revision tags: OPENBSD_2_8_BASE
# 1.9 07-Sep-2000 art

Convert bzero to memset(X, 0..) and bcopy to memcpy.
This is to match (make diffs smaller) the code in NetBSD.
new gcc inlines those functions, so this could also be a performance win.


Revision tags: OPENBSD_2_7_BASE
# 1.8 25-Apr-2000 niklas

A fix to the dreaded isadmaattach panic which haunts people playing with
large memory machines. This time I really hope we can continue quite a bit
away over the Gig.


# 1.7 16-Mar-2000 art

Bring in some new UVM code from NetBSD (not current).

- Introduce a new type of map that is interrupt safe and never allows faults
in it. mb_map and kmem_map are made intrsafe.
- Add "access protection" to uvm_vslock (to be passed down to uvm_fault and
later to pmap_enter).
- madvise(2) now works.
- various cleanups.


Revision tags: OPENBSD_2_6_BASE SMP_BASE kame_19991208
# 1.6 10-Sep-1999 mickey

branches: 1.6.4;
fixup the uvm_map() call in the uvm_pagealloc_contig() w/
right uvm_map flags values, also fix the error condition check.
couple of spaces vs tabs in the same code spot.
art@ ok


# 1.5 03-Sep-1999 art

Change the pmap_enter api to pass down an argument that indicates
the access type that caused this mapping. This is to simplify pmaps
with mod/ref emulation (none for the moment) and in some cases speed
up pmap_is_{referenced,modified}.
At the same time, clean up some mappings that had too high protection.

XXX - the access type is incorrect in old vm, it's only used by uvm and MD code.
The actual use of this in pmap_enter implementations is not in this commit.


# 1.4 23-Aug-1999 art

sync with NetBSD from 1999.05.24 (there is a reason for this date)
Mostly cleanups, but also a few improvements to pagedaemon for better
handling of low memory and/or low swap conditions.


# 1.3 23-Jul-1999 ho

Add uvm_pagealloc_contig


Revision tags: OPENBSD_2_5_BASE
# 1.2 26-Feb-1999 art

add OpenBSD tags


# 1.1 26-Feb-1999 art

Import of uvm from NetBSD. Some local changes, some code disabled