History log of /netbsd-current/sys/uvm/uvm_km.c
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.165 09-Apr-2023 riastradh

uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)


# 1.164 26-Feb-2023 skrll

nkmempages should be size_t


# 1.163 12-Feb-2023 andvar

s/strucure/structure/ and s/structues/structures/ in comments.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.162 06-Aug-2022 chs

allow KMSAN to work again by restoring the limiting of kva even with
NKMEMPAGES_MAX_UNLIMITED. we used to limit kva to 1/8 of physmem
but limiting to 1/4 should be enough, and 1/4 still gives the kernel
enough kva to map all of the RAM that KMSAN has not stolen.

Reported-by: syzbot+ca3710b4c40cdd61aa72@syzkaller.appspotmail.com


# 1.161 03-Aug-2022 chs

for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


# 1.159 09-Jul-2020 skrll

branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.164 26-Feb-2023 skrll

nkmempages should be size_t


# 1.163 12-Feb-2023 andvar

s/strucure/structure/ and s/structues/structures/ in comments.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.162 06-Aug-2022 chs

allow KMSAN to work again by restoring the limiting of kva even with
NKMEMPAGES_MAX_UNLIMITED. we used to limit kva to 1/8 of physmem
but limiting to 1/4 should be enough, and 1/4 still gives the kernel
enough kva to map all of the RAM that KMSAN has not stolen.

Reported-by: syzbot+ca3710b4c40cdd61aa72@syzkaller.appspotmail.com


# 1.161 03-Aug-2022 chs

for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


# 1.159 09-Jul-2020 skrll

branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.163 12-Feb-2023 andvar

s/strucure/structure/ and s/structues/structures/ in comments.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.162 06-Aug-2022 chs

allow KMSAN to work again by restoring the limiting of kva even with
NKMEMPAGES_MAX_UNLIMITED. we used to limit kva to 1/8 of physmem
but limiting to 1/4 should be enough, and 1/4 still gives the kernel
enough kva to map all of the RAM that KMSAN has not stolen.

Reported-by: syzbot+ca3710b4c40cdd61aa72@syzkaller.appspotmail.com


# 1.161 03-Aug-2022 chs

for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


# 1.159 09-Jul-2020 skrll

branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.162 06-Aug-2022 chs

allow KMSAN to work again by restoring the limiting of kva even with
NKMEMPAGES_MAX_UNLIMITED. we used to limit kva to 1/8 of physmem
but limiting to 1/4 should be enough, and 1/4 still gives the kernel
enough kva to map all of the RAM that KMSAN has not stolen.

Reported-by: syzbot+ca3710b4c40cdd61aa72@syzkaller.appspotmail.com


# 1.161 03-Aug-2022 chs

for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


# 1.159 09-Jul-2020 skrll

branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.161 03-Aug-2022 chs

for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


# 1.159 09-Jul-2020 skrll

branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.160 13-Mar-2021 skrll

Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats


Revision tags: thorpej-futex-base
# 1.159 09-Jul-2020 skrll

Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.159 09-Jul-2020 skrll

Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS


# 1.158 08-Jul-2020 skrll

Trailing whitespace


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.157 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: ad-namecache-base3
# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.156 24-Feb-2020 rin

0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.155 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.154 08-Feb-2020 maxv

Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.


Revision tags: ad-namecache-base2
# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.153 20-Jan-2020 skrll

Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.152 14-Dec-2019 ad

Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.152 14-Dec-2019 ad

Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.151 13-Dec-2019 ad

Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.150 01-Dec-2019 uwe

Add missing #include <sys/atomic.h>


# 1.149 01-Dec-2019 ad

Minor correction to previous.


# 1.148 01-Dec-2019 ad

- Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.


Revision tags: phil-wifi-20191119
# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.147 14-Nov-2019 maxv

Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.146 02-Dec-2018 maxv

Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).


Revision tags: pgoyette-compat-1126
# 1.145 04-Nov-2018 mlelstv

PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.144 28-Oct-2017 pgoyette

branches: 1.144.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.144 28-Oct-2017 pgoyette

Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.143 01-Jun-2017 chs

branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


# 1.143 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.142 19-Mar-2017 riastradh

__diagused police


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806
# 1.141 27-Jul-2016 maxv

Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.


Revision tags: pgoyette-localcount-20160726
# 1.140 20-Jul-2016 maxv

Introduce uvm_km_protect.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.139 06-Feb-2015 maxv

branches: 1.139.2;
Kill kmeminit().


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base rmind-smpnet-base agc-symver-base tls-maxphys-base
# 1.138 29-Jan-2013 para

branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.


# 1.137 26-Jan-2013 para

revert previous commit not yet fully functional, sorry


# 1.136 26-Jan-2013 para

make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta wich is fully operational at this point.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.135 07-Sep-2012 para

branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover it's address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved


# 1.134 04-Sep-2012 matt

Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.


# 1.133 03-Sep-2012 matt

Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.


# 1.132 03-Sep-2012 matt

Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.


# 1.131 03-Sep-2012 matt

Shut up gcc printf warning.


# 1.130 03-Sep-2012 matt

Don't try grow the entire kmem space but just do as needed in uvm_km_kmem_alloc


# 1.129 03-Sep-2012 matt

Fix a bug where the kernel was never grown to accomodate the kmem VA space
since that happens before the kernel_map is set.


# 1.128 09-Jul-2012 matt

Convert a KASSERT to a KASSERTMSG. Expand one KASSERTSG a little bit.


# 1.127 03-Jun-2012 rmind

Improve the wording slightly.


Revision tags: jmcneill-usbmp-base10
# 1.126 02-Jun-2012 para

add some description about the vmem arenas, how they stack up and their purpose


Revision tags: yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.125 13-Apr-2012 yamt

uvm_km_kmem_alloc: don't hardcode kmem_va_arena


Revision tags: jmcneill-usbmp-base8
# 1.124 12-Mar-2012 bouyer

uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not trucate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4
# 1.123 25-Feb-2012 rmind

uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.


Revision tags: jmcneill-usbmp-base3
# 1.122 20-Feb-2012 bouyer

When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).


# 1.121 19-Feb-2012 rmind

Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.120 10-Feb-2012 para

branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system


# 1.119 04-Feb-2012 para

improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@


# 1.118 03-Feb-2012 matt

Always allocate the kmem region. Add UVMHIST support. Approved by releng.


# 1.117 02-Feb-2012 para

- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)


# 1.116 01-Feb-2012 para

allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space


# 1.115 01-Feb-2012 matt

Use right UVM_xxx_COLORMATCH flag (even both use the same value).


# 1.114 31-Jan-2012 matt

Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.


# 1.113 29-Jan-2012 para

size kmem_arena more sanely for small memory machines


# 1.112 27-Jan-2012 para

extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.111 01-Sep-2011 matt

branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contain the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.


# 1.110 05-Jul-2011 yamt

- fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.


# 1.109 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base rmind-uvmplock-base
# 1.108 02-Feb-2011 chuck

branches: 1.108.2;
udpate license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.


Revision tags: jruoho-x86intr-base
# 1.107 04-Jan-2011 matt

branches: 1.107.2; 1.107.4;
Add better color matching selecting free pages. KM pages will now allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.106 14-May-2010 cegger

Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.105 08-Feb-2010 joerg

branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.104 07-Nov-2009 cegger

branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 jym-xensuspend-base nick-hppapmap-base mjf-devfs2-base
# 1.103 13-Dec-2008 ad

It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.102 01-Dec-2008 ad

PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.101 04-Aug-2008 pooka

branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment


Revision tags: simonb-wapbl-nbase simonb-wapbl-base
# 1.100 16-Jul-2008 matt

Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.99 24-Mar-2008 yamt

branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.98 23-Feb-2008 chris

Add some more missing pmap_update()s following pmap_kremove()s.


Revision tags: nick-net80211-sync-base bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.97 02-Jan-2008 ad

branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base matt-mips64-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.96 21-Jul-2007 ad

branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.


# 1.95 21-Jul-2007 ad

Merge unobtrusive locking changes from the vmlocking branch.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 thorpej-atomic-base mjf-ufs-trans-base
# 1.94 12-Mar-2007 ad

branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


Revision tags: ad-audiomp-base
# 1.93 21-Feb-2007 thorpej

branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.92 01-Nov-2006 yamt

branches: 1.92.4;
remove some __unused from function parameters.


Revision tags: yamt-splraiseipl-base2
# 1.91 12-Oct-2006 uwe

More __unused (in cpp conditionals not touched by i386).


# 1.90 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.89 05-Jul-2006 drochner

branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base simonb-timecounters-base
# 1.88 25-May-2006 yamt

branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base
# 1.87 03-May-2006 yamt

branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).


Revision tags: yamt-pdpolicy-base4
# 1.86 05-Apr-2006 yamt

uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.


Revision tags: yamt-pdpolicy-base3
# 1.85 17-Mar-2006 yamt

uvm_km_check_empty: fix an assertion.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.84 11-Dec-2005 christos

branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base ktrace-lwp-base
# 1.83 27-Jun-2005 thorpej

branches: 1.83.2;
Use ANSI function decls.


# 1.82 29-May-2005 christos

avoid shadow variables.
remove unneeded casts.


Revision tags: kent-audio2-base
# 1.81 20-Apr-2005 simonb

Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.


# 1.80 12-Apr-2005 yamt

fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.


# 1.79 01-Apr-2005 yamt

unwrap short lines.


# 1.78 01-Apr-2005 yamt

merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.


Revision tags: netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.77 26-Feb-2005 perry

branches: 1.77.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base
# 1.76 13-Jan-2005 yamt

branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.


# 1.75 12-Jan-2005 yamt

don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.


Revision tags: kent-audio1-beforemerge
# 1.74 05-Jan-2005 yamt

km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)


# 1.73 03-Jan-2005 yamt

km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.


# 1.72 01-Jan-2005 yamt

in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.


# 1.71 01-Jan-2005 yamt

introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.


# 1.70 01-Jan-2005 yamt

for in-kernel maps,
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base kent-audio1-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.69 24-Mar-2004 junyoung

Drop trailing spaces.


# 1.68 10-Feb-2004 matt

Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorpate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.


# 1.67 29-Jan-2004 yamt

- split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.


# 1.66 18-Dec-2003 pk

* Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.


# 1.65 18-Dec-2003 pk

Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.


# 1.64 28-Aug-2003 pk

When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.


# 1.63 11-Aug-2003 pk

Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.


# 1.62 10-May-2003 thorpej

branches: 1.62.2;
Back out the following chagne:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, do just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.


# 1.61 08-May-2003 thorpej

Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.60 30-Nov-2002 bouyer

Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.


Revision tags: kqueue-aftermerge kqueue-beforemerge
# 1.59 05-Oct-2002 oster

Garbage collect some leftover (and unneeded) code. OK'ed by chs.


Revision tags: kqueue-base
# 1.58 15-Sep-2002 chs

add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.


Revision tags: gehenna-devsw-base
# 1.57 14-Aug-2002 thorpej

Don't pass VM_PROT_EXEC to pmap_kenter_pa().


Revision tags: netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base eeh-devprop-base newlock-base
# 1.56 07-Mar-2002 thorpej

branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.


Revision tags: ifpoll-base thorpej-mips-cache-base
# 1.55 10-Nov-2001 lukem

add RCSIDs, and in some cases, slightly cleanup #include order


# 1.54 07-Nov-2001 chs

only acquire the lock for swpgonly if we actually need to adjust it.


# 1.53 06-Nov-2001 chs

several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf
# 1.52 15-Sep-2001 chs

branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.


Revision tags: pre-chs-ubcperf
# 1.51 10-Sep-2001 chris

Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.


Revision tags: thorpej-devvp-base
# 1.50 26-Jun-2001 thorpej

branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.


# 1.49 02-Jun-2001 chs

replace vm_map{,_entry}_t with struct vm_map{,_entry} *.


# 1.48 26-May-2001 chs

replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.


# 1.47 25-May-2001 chs

remove trailing whitespace.


Revision tags: thorpej_scsipi_beforemerge
# 1.46 24-Apr-2001 thorpej

Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.45 12-Apr-2001 thorpej

Add a __predict_true() to an extremely common case.


# 1.44 12-Apr-2001 thorpej

In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.


# 1.43 15-Mar-2001 chs

eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>


# 1.42 14-Jan-2001 thorpej

branches: 1.42.2;
splimp() -> splvm()


# 1.41 27-Nov-2000 nisimura

Introduce uvm_km_valloc_align() and use it to glab process's USPACE
aligned on USPACE boundary in kernel virutal address. It's benefitial
for MIPS R4000's paired TLB entry design.


# 1.40 24-Nov-2000 chs

cleanup: use queue.h macros and KASSERT().


# 1.39 13-Sep-2000 thorpej

Add an align argument to uvm_map() and some callers of that
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.


# 1.38 24-Jul-2000 jeffs

Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the propper
virtual alignment of the allocated space.


# 1.37 27-Jun-2000 mrg

remove include of <vm/vm.h>


# 1.36 26-Jun-2000 mrg

remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.35 08-May-2000 thorpej

branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.


Revision tags: chs-ubc2-newbase
# 1.34 11-Jan-2000 chs

add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base fvdl-softdep-base
# 1.33 13-Nov-1999 thorpej

Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.


Revision tags: comdex-fall-1999-base
# 1.32 12-Sep-1999 chs

branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.


Revision tags: chs-ubc2-base
# 1.31 22-Jul-1999 thorpej

Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.


# 1.30 22-Jul-1999 thorpej

0 -> FALSE in a few places.


# 1.29 18-Jul-1999 chs

allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.


# 1.28 17-Jul-1999 thorpej

Garbage-collect uvm_km_get(); nothing actually uses it.


# 1.27 04-Jun-1999 thorpej

Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.


# 1.26 26-May-1999 thorpej

Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.


# 1.25 26-May-1999 thorpej

Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.


# 1.24 25-May-1999 thorpej

Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.


# 1.23 11-Apr-1999 chs

add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.


Revision tags: netbsd-1-4-base
# 1.22 26-Mar-1999 mycroft

branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().


# 1.21 26-Mar-1999 chs

add uvmexp.swpgonly and use it to detect out-of-swap conditions.


# 1.20 25-Mar-1999 mrg

remove now >1 year old pre-release message.


# 1.19 24-Mar-1999 cgd

after discussion with chuck, nuke pgo_attach from uvm_pagerops


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.18 18-Oct-1998 chs

branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.


# 1.17 11-Oct-1998 chuck

remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks


# 1.16 28-Aug-1998 thorpej

Add a couple of comments about how the pool page allocator functions
can be called with a map that doens't require spl protection.


# 1.15 28-Aug-1998 thorpej

Add a waitok boolean argument to the VM system's pool page allocator backend.


# 1.14 13-Aug-1998 eeh

Merge paddr_t changes into the main branch.


# 1.13 09-Aug-1998 perry

bzero->memset, bcopy->memcpy, bcmp->memcmp


# 1.12 01-Aug-1998 thorpej

We need to be able to specify a uvm_object to the pool page allocator, too.


# 1.11 31-Jul-1998 thorpej

Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.


Revision tags: eeh-paddr_t-base
# 1.10 24-Jul-1998 thorpej

branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.


# 1.9 09-Jun-1998 chs

correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.


# 1.8 09-Mar-1998 mrg

KNF.


# 1.7 24-Feb-1998 chuck

be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail


# 1.6 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.5 08-Feb-1998 thorpej

Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.


# 1.4 07-Feb-1998 mrg

restore rcsids


# 1.3 07-Feb-1998 chs

convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.


# 1.2 06-Feb-1998 thorpej

RCS ID police.


# 1.1 05-Feb-1998 mrg

branches: 1.1.1;
Initial revision