Cross Reference: /freebsd-9.3-release/sys/kern/kern

History log of /freebsd-9.3-release/sys/kern/kern_malloc.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 267654	19-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-9.3-release
# 262863	06-Mar-2014	dumbbell	MFC r226824: contigmalloc(9) and contigfree(9) are now implemented in terms of other more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.
# 254083	08-Aug-2013	kib	MFC r253859: Remove unused malloc type.
# 248085	09-Mar-2013	marius	MFC: r227309 (partial) Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 239874	29-Aug-2012	jhb	MFC 238000,239584: Honor db_pager_quit in 'show malloc', 'show uma', and 'show witness'.
# 239565	22-Aug-2012	mdf	MFC r238502: Fix a bug with memguard(9) on 32-bit architectures without a VM_KMEM_MAX_SIZE. The code was not taking into account the size of the kernel_map, which the kmem_map is allocated from, so it could produce a sub-map size too large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely and base the memguard map's size off the kernel_map's size, since this is always relevant and always smaller. Found by: Justin Hibbits
# 230418	21-Jan-2012	alc	MFC r226163, r228317, and r228324 Fix the handling of an empty kmem map by sysctl_kmem_map_free(). Eliminate the possibility of 32-bit arithmetic overflow in the calculation of vm_kmem_size that may occur if the system administrator has specified a vm.vm_kmem_size tunable value that exceeds the hard cap. Eliminate stale numbers from a comment. PR: 162741
# 225736	22-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
# 219920	23-Mar-2011	alc	Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used.
# 217916	26-Jan-2011	mdf	Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue. Inspired by: rwatson
# 213651	09-Oct-2010	avg	add kmem_map_free sysctl: query largest contiguous free range in kmem_map Suggested by: alc Reviewed by: alc MFC after: 1 week
# 213527	07-Oct-2010	avg	vm.kmem_map_size: a sysctl to query current kmem_map->size Based on a patch from Sandvine Incorporated via emaste. Reviewed by: emaste MFC after: 1 week
# 213303	30-Sep-2010	avg	kmem_size* sysctls: hint that these are also tunables MFC after: 1 week
# 212750	16-Sep-2010	mdf	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)
# 212572	13-Sep-2010	mdf	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn
# 212370	09-Sep-2010	mdf	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk
# 212058	31-Aug-2010	mdf	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks
# 211194	11-Aug-2010	mdf	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month
# 210564	28-Jul-2010	mdf	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)
# 209390	21-Jun-2010	ed	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.
# 193490	05-Jun-2009	brian	If we're passed garbage in malloc_init(), panic() rather than expecting a KASSERT to handle it. People are likely to turn off INVARIANTS RSN and loading an old module can cause garbage-in here. I saw the issue with an older nvidia driver (x11/nvidia-driver) loading into a new kernel - a crash wasn't seen 'till sysctl_kern_malloc_stats(). I was lucky that mtp->ks_shortdesc was NULL and not something horrible. While I'm here, KASSERT that malloc_uninit() isn't passed something that's not in kmemstatistics. MFC after: 3 weeks
# 191946	09-May-2009	imp	Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so it can go.
# 191268	19-Apr-2009	rwatson	struct malloc_type has had a 'magic' field statically initialized to M_MAGIC by MALLOC_DEFINE() for a long time; add assertions that malloc_type's passed to malloc(), free(), etc have that magic set. MFC after: 2 weeks
# 189074	26-Feb-2009	ed	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
# 187681	25-Jan-2009	jeff	- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new api for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines. Sponsored by: Nokia
# 180308	05-Jul-2008	alc	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout
# 180262	04-Jul-2008	alc	Correct an error in the comments for init_param3(). Discussed with: silby
# 179222	22-May-2008	jb	Add support for the DTrace malloc provider which can enable probes on a per-malloc type basis.
# 178933	10-May-2008	alc	Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc().
# 177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# 171064	27-Jun-2007	rwatson	Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding unnecessary casts, and making it possible to compile kern_malloc.c with strict aliasing. Submitted by: rdivacky Approved by: re (kensmith)
# 170691	14-Jun-2007	rwatson	Spell statistics more correctly in comments.
# 170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 170016	27-May-2007	rwatson	Remove #if 0'd check for 0-size allocations, which if enabled, called kdb_enter().
# 169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 168920	20-Apr-2007	sepotvin	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc
# 163698	26-Oct-2006	rwatson	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl
# 160599	23-Jul-2006	rwatson	Remove old kern.malloc sysctl, which generated a text representation of the kernel malloc(9) state for vmstat -m. libmemstat is now used to generate a machine-readable version which is converged by vmstat -m into a human-readable version. Not for MFC.
# 160598	23-Jul-2006	rwatson	Expand comments for malloc(9) to better describe the design and statistics / memory types model.
# 156263	03-Mar-2006	ps	Fix bug in malloc_uninit(): Releasing items from the mt_zone can not be done by a simple uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC flag. Use uma_zfree_arg instead and supply the slab. This bug caused panics in low memory situations on unloading kernel modules containing MALLOC_DEFINE(..) statements. Submitted by: ups
# 155086	31-Jan-2006	pjd	Add buffer corruption protection (RedZone) for kernel's malloc(9). It detects both: buffer underflows and buffer overflows bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed. Tested by: kris
# 153880	30-Dec-2005	pjd	Improve memguard a bit: - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow to use memguard for kernel modules by providing sysctl vm.memguard.desc, which can be changed to short description of memory type before module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctl there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory types comparsion and make safer/slower the default.
# 153769	27-Dec-2005	pjd	In realloc(9), determine size of the original block based on UMA_SLAB_MALLOC flag. In some circumstances (I observed it when I was doing a lot of reallocs) UMA_SLAB_MALLOC can be set even if us_keg != NULL. If this is the case we have wonderful, silent data corruption, because less data is copied to the newly allocated region than should be. I'm not sure when this bug was introduced, it could be there undetected for years now, as we don't have a lot of realloc(9) consumers and it was hard to reproduce it... ...but what I know for sure, is that I don't want to know who introduce the bug:) It took me two/three days to track it down (of course most of the time I was looking for the bug in my own code).
# 152017	03-Nov-2005	pjd	Detect memory leaks when memory type is being destroyed. This is very helpful for detecting kernel modules memory leaks on unload. Discussed and reviewed by: rwatson
# 151526	20-Oct-2005	rwatson	Change format string for u_int64_t to %ju from %llu, in order to use the correct format string on 64-bit systems. Pointed out by: pjd
# 151519	20-Oct-2005	rwatson	Add a "show malloc" command to DDB, which prints out the current stats for available kernel malloc types. Quite useful for post-mortem debugging of memory leaks without a dump device configured on a panicked box. MFC after: 2 weeks
# 148644	02-Aug-2005	ru	Long overdue, keep up with mbuf.h,v 1.148.
# 148461	27-Jul-2005	pjd	Fix the way how "InUse" column in 'vmstat -m' output works: - increase number of allocations count only on successfull malloc(9), so it doesn't confuse people; - because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;' under the check as well, as for size=0 it is of course a no-op; - avoid critical_enter()/critical_exit() in case of failure in malloc_type_allocated() as there will be nothing to do. OK'ed by: rwatson MFC after: 2 days
# 147990	14-Jul-2005	rwatson	Correct build on 64-bit: cast u_int64_t to (unsigned long long) before printfing as (unsigned long long). 32-bit build on i386 didn't notice this. Whoops. Reported by: arved Tested by: sledge
# 147984	14-Jul-2005	rwatson	Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc statistics via a binary structure stream: - Add structure 'malloc_type_stream_header', which defines a stream version, definition of MAXCPUS used in the stream, and a number of malloc_type records in the stream. - Add structure 'malloc_type_header', which defines the name of the malloc type being reported on. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUS malloc_type_stats structures holding per-CPU allocation information. Typical values of MAXCPUS will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here: - Bump statistics width to uint64_t, and hard code using fixed-width type in order to be more sure about structure layout in the stream. We allocate and free a lot of memory. - Add kmemcount, a counter of the number of registered malloc types, in order to avoid excessive manual counting of types. Export via a new sysctl to allow user-space code to better size buffers. - De-XXX comment on no longer maintaining the high watermark in old sysctl monitoring code. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. Likewise, similar changes to UMA.
# 147421	16-Jun-2005	kensmith	Remove a variable that became unused as a result of changes made in v1.139. This was only exposed if MALLOC_PROFILE was defined. Submitted by: Gary Jennejohn Pointy hat: rwatson Approved by: re (scottl)
# 147265	10-Jun-2005	jkoshy	Fix typo. Reviewed by: rwatson, sam
# 146747	29-May-2005	rwatson	Kernel malloc layers malloc_type allocation over one of two underlying allocators: a set of power-of-two UMA zones for small allocations, and the VM page allocator for large allocations. In order to maintain unified statistics for specific malloc types, kernel malloc maintains a separate per-type statistics pool, which can be monitored using vmstat -m. Prior to this commit, each pool of per-type statistics was protected using a per-type mutex associated with the malloc type. This change modifies kernel malloc to maintain per-CPU statistics pools for each malloc type, and protects writing those statistics using critical sections. It also moves to unsynchronized reads of per-CPU statistics when generating coalesced statistics. To do this, several changes are implemented: - In the previous world order, the statistics memory was allocated by the owner of the malloc type structure, allocated statically using MALLOC_DEFINE(). This embedded the definition of the malloc_type structure into all kernel modules. Move to a model in which a pointer within struct malloc_type points at a UMA-allocated malloc_type_internal data structure owned and maintained by kern_malloc.c, and not part of the exported ABI/API to the rest of the kernel. For the purposes of easing a possible MFC, re-use an existing pointer in 'struct malloc_type', and maintain the current malloc_type structure size, as well as layout with respect to the fields reused outside of the malloc subsystem (such as ks_shortdesc). There are several unused fields as a result of no longer requiring the mutex in malloc_type. - Struct malloc_type_internal contains an array of malloc_type_stats, of size MAXCPU. The structure defined above avoids hard-coding a kernel compile-time value of MAXCPU into kernel modules that interact with malloc. - When accessing per-cpu statistics for a malloc type, surround read - modify - update requests with critical_enter()/critical_exit() in order to avoid races during write. The per-CPU fields are written only from the CPU that owns them. - Per-CPU stats now maintained "allocated" and "freed" counters for number of allocations/frees and bytes allocated/freed, since there is no longer a coherent global notion of the totals. When coalescing malloc stats, accept a slight race between reading stats across CPUs, and avoid showing the user a negative allocation count for the type in the event of a race. The global high watermark is no longer maintained for a malloc type, as there is no global notion of the number of allocations. - While tearing up the sysctl() path, also switch to using sbufs. The current "export as text" sysctl format is retained with the same syntax. We may want to change this in the future to export more per-CPU information, such as how allocations and frees are balanced across CPUs. This change results in a substantial speedup of kernel malloc and free paths on SMP, as critical sections (where usable) out-perform mutexes due to avoiding atomic/bus-locked operations. There is also a minor improvement on UP due to the slightly lower cost of critical sections there. The cost of the change to this approach is the loss of a continuous notion of total allocations that can be exploited to track per-type high watermarks, as well as increased complexity when monitoring statistics. Due to carefully avoiding changing the ABI, as well as hardening the ABI against future changes, it is not necessary to recompile kernel modules for this change. However, MFC'ing this change to RELENG_5 will require also MFC'ing optimizations for soft critical sections, which may modify exposed kernel ABIs. The internal malloc API is changed, and modifications to vmstat in order to restore "vmstat -m" on core dumps will follow shortly. Several improvements from: bde Statistics approach discussed with: ups Tested by: scottl, others
# 144977	12-Apr-2005	rwatson	Consistently style function declarations in kern_malloc.c. MFC after: 3 days
# 140587	21-Jan-2005	bmilekic	Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allott for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 135930	29-Sep-2004	des	Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables. MFC after: 3 days
# 132379	19-Jul-2004	green	Reimplement contigmalloc(9) with an algorithm which stands a greatly- improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.
# 131927	10-Jul-2004	marcel	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().
# 129906	31-May-2004	bmilekic	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
# 127911	05-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 125091	27-Jan-2004	des	Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To assure backward compatibility (conditional on !BURN_BRIDGES), look it up by its old name first, and log a warning (but accept the setting) if it was found. If both the old and new name are defined, the new name takes precedence. Also export vm.kmem_size as a read-only sysctl variable; I find it hard to tune a parameter when I don't know its default value, especially when that default value is computed at boot time.
# 120216	19-Sep-2003	jeff	- Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than this are requested very infrequently and waste memory when we cache spares.
# 117879	22-Jul-2003	phk	Revert stuff which accidentally ended up in the previous commit.
# 117878	22-Jul-2003	phk	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously
# 117391	10-Jul-2003	silby	Add init_param3() to subr_param. This function is called immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde
# 116187	11-Jun-2003	ps	Don't overflow when calculating vm_kmem_size. This fixes kmem_map too small panics on PAE machines which have odd > 4GB sizes (4.5 gig would render a 20MB of KVA for kmem_map instead of 200MB). Submitted by: John Cagle <john.cagle@hp.com>, jeff Reviewed by: jeff, peter, scottl, lots of USENIX folks
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 114935	12-May-2003	phk	Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC Approved by: re/rwatson
# 114713	05-May-2003	phk	Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic for a malloc bucket under zero. This typically happens if you malloc(9) from one bucket and free to another.
# 114042	25-Apr-2003	phk	Update the "last malloc failure timestamp" also for simulated malloc errors.
# 112692	26-Mar-2003	rwatson	Permit debug.malloc.failure_rate to be specified using a tunable so that the feature can be enabled during the boot process. Note the continued limitation that FreeBSD fails so rapidly with this setting enabled that it's hard to narrow down particular failures for correction; we really need per-malloc type failure rates.
# 112689	26-Mar-2003	rwatson	Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles in a debugging feature causing M_NOWAIT allocations to fail at a specified rate. This can be useful for detecting poor handling of M_NOWAIT: the most frequent problems I've bumped into are unconditional deference of the pointer even though it's NULL, and hangs as a result of a lost event where memory for the event couldn't be allocated. Two sysctls are added: debug.malloc.failure_rate How often to generate a failure: if set to 0 (default), this feature is disabled. Otherwise, the frequency of failures -- I've been using 10 (one in ten mallocs fails), but other popular settings might be much lower or much higher. debug.malloc.failure_count Number of times a coerced malloc failure has occurred as a result of this feature. Useful for tracking what might have happened and whether failures are being generated. Useful possible additions: tying failure rate to malloc type, printfs indicating the thread that experienced the coerced failure. Reviewed by: jeffr, jhb
# 112066	10-Mar-2003	phk	PHCC[1]: I had commented the #ifdef INVARIANTS checks out to make sure I ran this code in all kernels and forgot to comment the #ifdefs back in before I committed. Spotted by: bmilekic [1] PHCC = Pointy Hat Correction Commit
# 112063	10-Mar-2003	phk	Make malloc and mbuf allocation mode flags nonoverlapping. Under INVARIANTS whine if we get incompatible flags. Submitted by: imp
# 111164	20-Feb-2003	bmilekic	o Allow "buckets" in mb_alloc to be differently sized (according to compile-time constants). That is, a "bucket" now is not necessarily a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ worth of mbufs, clusters. o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce {mbuf,clust}_lowm, which currently has no effect but will be used to set the low watermarks. o Fix netstat so that it can deal with the differently-sized buckets and teach it about the low watermarks too. o Make sure the per-cpu stats for an absent CPU has mb_active set to 0, explicitly. o Get rid of the allocate refcounts from mbuf map mess. Instead, just malloc() the refcounts in one shot from mbuf_init() o Clean up / update comments in subr_mbuf.c
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 110186	01-Feb-2003	phk	Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have M_ZERO specified with 0x70. (malloc_flags=J for the kernel :-)
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 106305	01-Nov-2002	phk	Introduce malloc_last_fail() which returns the number of seconds since malloc(9) failed last time. This is intended to help code adjust memory usage to the current circumstances. A typical use could be: if (malloc_last_fail() < 60) reduce_cache_by_one();
# 103531	18-Sep-2002	jeff	- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.
# 97655	31-May-2002	robert	- Replace the bandaid introduced in revision 1.110 with a better solution. - Add braces for a ``for'' statement containing a single multi-line statement.
# 97009	20-May-2002	jake	Add a bandaid so that sysctl kern.malloc works on sparc64.
# 97005	20-May-2002	jhb	Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is passed in with M_WAITOK to malloc().
# 95931	02-May-2002	jeff	Hide a pointer to the malloc_type bucket at the end of the freed memory. If this memory is modified after it has been freed we can now report it's previous owner.
# 95923	02-May-2002	jeff	malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the mallochash. Mallochash is going to go away as soon as I introduce the kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen until all users of the malloc api that expect memory to be aligned on the size of the allocation are fixed.
# 95899	02-May-2002	jeff	Remove the temporary alignment check in free(). Implement the following checks on freed memory in the bucket path: - Slab membership - Alignment - Duplicate free This previously was only done if we skipped the buckets. This code will slow down INVARIANTS a bit, but it is smp safe. The checks were moved out of the normal path and into hooks supplied in uma_dbg.
# 95824	30-Apr-2002	jeff	Convert longs to u_longs in stats. This will hold off wrap arounds for a while longer.
# 95771	30-Apr-2002	jeff	Add a new UMA debugging facility. This will overwrite freed memory with 0xdeadc0de and then check for it just before memory is handed off as part of a new request. This will catch any post free/pre alloc modification of memory, as well as introduce errors for anything that tries to dereference it as a pointer. This code takes the form of special init, fini, ctor and dtor routines that are specificly used by malloc. It is in a seperate file because additional debugging aids will want to live here as well.
# 95766	30-Apr-2002	jeff	Move the implementation of M_ZERO into UMA so that it can be passed to uma_zalloc and friends. Remove this functionality from the malloc wrapper. Document this change in uma.h and adjust variable names in uma_core.
# 95743	29-Apr-2002	rwatson	Re-add the 16384 bucket also. Submitted by: green
# 95742	29-Apr-2002	rwatson	Revert a portion of kern_malloc.c:1.99, which (in addition to adding malloc profiling) also modified the set of pre-defined buckets for the memory allocator. For reasons unknown to me, this resulted in extensive memory corruption in the kernel, in particular on SMP boxes, so I'm committing this work-around until Jeff gets a chance to debug it properly. David Wolfskill pointed me at this commit as the one that might be a problem; I've been running this code on two dual-processor burn-in boxes for about 12 hours now, and the rate of panics due to memory corruption has dropped to zero (from one every five minutes). Hopefully not treading on the toes of: jeff
# 95319	23-Apr-2002	phk	Add a basic sanity check on pointers passed to free(9). Should be improved by: jeff
# 94730	15-Apr-2002	jeff	Finish adding support code for sysctl kern.mprof. This dumps some malloc information related to bucket size effeciency. Three things are printed on each row: Size is the size the user actually asked for rounded to 16 bytes. Requests is the number of times this size was asked for. Real Size is the size we actually handed out. At the end the total memory used and total waste is displayed. Currently my system displays about 33% wasted memory. The intent of this code is to gather statistics for tuning the malloc bucket sizes. It is not intended to be run with INVARIANTS and it is not entirely mp safe. It can be enabled via 'options MALLOC_PROFILE' which was commited earlier.
# 94729	15-Apr-2002	jeff	Remove malloc_type's ks_limit. Updated the kmemzones logic such that the ks_size bitmap can be used as an index into it to report the size of the zone used. Create the kern.malloc sysctl which replaces the kvm mechanism to report similar data. This will provide an easy place for statistics aggregation if malloc_type statistics become per cpu data. Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing the malloc buckets.
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 92723	19-Mar-2002	alfred	Remove __P.
# 92654	19-Mar-2002	jeff	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 92194	12-Mar-2002	archie	Add realloc() and reallocf(), and make free(NULL, ...) acceptable. Reviewed by: alfred
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 81398	10-Aug-2001	jhb	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter
# 81089	03-Aug-2001	bmilekic	Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in order to avoid namespace collision with subr_mchain.c's mb_init(). This wasn't "fatal" as the mbuf initialization routine mb_init() was local to subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init() declaration, but it should deffinately be changed now before it creates headache.
# 81088	03-Aug-2001	jake	Remove some code that appears to have endian problems with INVARIANTS. This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts, not struct malloc_type * like they are now.
# 78592	22-Jun-2001	bmilekic	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not depracated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/
# 77900	08-Jun-2001	peter	"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.
# 77853	07-Jun-2001	peter	Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(
# 77843	06-Jun-2001	peter	Make the TUNABLE_() macros look and behave more consistantly like the SYSCTL_() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.
# 76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# 75686	18-Apr-2001	bmilekic	Fix inconsistency in setup of kernel_map: we need to make sure that we also reserve _adequate_ space for the mb_map submap; i.e. we need space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to rounddown, and not roundup, so that we are consistent. Pointed out by: bde
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 71859	31-Jan-2001	bp	Let M_PANIC go back to the private tree as its intention isn't understood well for now.
# 71799	29-Jan-2001	bp	Add M_PANIC flag to the list of available flags passed to malloc(). With this flag set malloc() will panic if memory allocation failed. This usable only in critical places where failed allocation is fatal. Reviewed by: peter
# 71707	27-Jan-2001	peter	p->p_intr_nesting_level is MI now and initialized to 0 in kern_fork.c, so it should be save to KASSERT() on it even on an arch that may not use it.
# 71501	23-Jan-2001	jhb	Don't grab Giant when calling kmem_alloc/kmem_free as this is just encouraging other people to follow the same practice. If this is going to be done, then it should be done inside of those two functions instead.
# 71337	21-Jan-2001	jake	Make intr_nesting_level per-process, rather than per-cpu. Setup interrupt threads to run with it always >= 1, so that malloc can detect M_WAITOK from "interrupt" context. This is also necessary in order to context switch from sched_ithd() directly. Reviewed By: peter
# 71320	21-Jan-2001	jasone	Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization. This change is necessary in order to avoid some circular bootstrapping dependencies.
# 70861	10-Jan-2001	jake	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.
# 67384	20-Oct-2000	phk	Introduce the M_ZERO flag to malloc(9) Instead of: foo = malloc(sizeof(foo), M_WAIT); bzero(foo, sizeof(foo)); You can now (and please do) use: foo = malloc(sizeof(foo), M_WAIT \| M_ZERO); In the future this will enable us to do idle-time pre-zeroing of malloc-space.
# 67354	20-Oct-2000	jhb	- machine/mutex.h -> sys/mutex.h - Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex
# 66281	22-Sep-2000	jasone	Don't #include <sys/proc.h>, since machine/mutex.h does it now.
# 65856	14-Sep-2000	jhb	Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just use struct mtx, struct witness, and struct witness_blessed. Requested by: bde
# 65710	11-Sep-2000	jasone	Add malloc_mtx to protect malloc and friends, so that they're thread-safe. Reviewed by: peter
# 65663	09-Sep-2000	jasone	Back out the addition of malloc_mtx. It was incompletely conceived, and will be done correctly in the future.
# 65649	09-Sep-2000	jasone	Add a mutex to the malloc interfaces so that it can safely be called without owning the Giant lock.
# 62248	29-Jun-2000	bp	Move #ifdef to the right place.
# 62231	29-Jun-2000	bp	If kernel compiled with INVARIANTS: On unload, remove references from freelist to memory type defined by module. Print a warning if module defines and allocate its own memory type, but didn't free it all on unload. Reviewed by: peter
# 61689	14-Jun-2000	bde	sys/malloc.h: Order the SYSINIT() for MALLOC_DEFINE() correctly so that malloc() doesn't have to waste time initializing itself. The (SI_SUB_KMEM, SI_ORDER_ANY) order was shared with syscons' SYSINIT() for scmeminit(), and scmeminit() calls malloc(), so malloc() initialization was not always complete on the first call to malloc(). kern/kern_malloc.c: - Removed self-initialization in malloc(). - Removed half-baked sanity check in free(). Trust MALLOC_DEFINE().
# 58063	14-Mar-2000	kuriyama	Print "previous type" correctly when INVARIANTS is defined. Reviewed by: current@FreeBSD.org
# 57263	16-Feb-2000	dillon	Fix null-pointer dereference crash when the system is intentionally run out of KVM through a mmap()/fork() bomb that allocates hundreds of thousands of vm_map_entry structures. Add panic to make null-pointer dereference crash a little more verbose. Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number of mmap()'d spaces (discrete vm_map_entry's in the process). The value defaults to around 9000 for a 128MB machine. The test is scaled for the number of processes sharing a vmspace (aka linux threads). Setting the value to 0 disables the feature. PR: kern/16573 Approved by: jkh
# 56720	28-Jan-2000	dg	Fixed sign and overflow bugs that caused the allocation size of the kernel malloc region (kmem_map) to be wrong and semi-random on systems with more than 1GB of RAM. This is not a complete fix, but is sufficient for machines with 4GB or less of memory. A complete fix will require some changes to the getenv stuff so that 64bit values can be passed around. NOT FIXED: machines with more than 4GB of RAM (e.g. some large Alphas) since we're still using ints to hold some of the values. Reviewed by: bde
# 53541	22-Nov-1999	shin	KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet) With this patch, you can assigne IPv6 addr automatically, and can reply to IPv6 ping. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# 51906	03-Oct-1999	phk	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
# 51401	19-Sep-1999	phk	KASSERT that we cannot use M_WAITOK in interrupt context. Reviewed by: bde
# 51167	11-Sep-1999	bde	Get rid of MALLOC_INSTANTIATE and MALLOC_MAKE_TYPE(). Just handle the 3 malloc types declared in <sys/malloc.h> like other global malloc types.
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 48579	05-Jul-1999	msmith	Move the initialisation/tuning of nmbclusters from param.c/machdep.c into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper. Make NMBUFs tunable as well. Move the nmbclusters sysctl here as well. Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl. Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).
# 47067	12-May-1999	bde	Fixed corruption of the kmemstatistcs list. The first malloc() with malloc type at the tail of the list changed the list from linear to circular. This seemed to cause surprisingly few problems, but it now causes weird output from `vmstat -m', probably because a more important malloc type is now at the tail of the list. Fix it by abusing ks_limit instead of ks_next as a flag for being on the list. Don't forget to clear the flag when a malloc type is uninit'ed. Uninit'ing is still fundamentally broken -- it loses history.
# 46568	06-May-1999	peter	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# 43301	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 43012	21-Jan-1999	msmith	Allow VM_KMEM_SIZE to be tuned from the kernel environment. This tuning value completely overrides any value precalculated by the kernel.
# 42957	21-Jan-1999	dillon	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
# 42453	09-Jan-1999	eivind	KNFize, by bde.
# 42408	08-Jan-1999	eivind	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
# 41054	10-Nov-1998	peter	Have MALLOC_DECLARE() initialize malloc types explicitly, and have them removed at module unload (if in a module of course). However; this introduces a new dependency on <sys/kernel.h> for things that use MALLOC_DECLARE(). Bruce told me it is better to add sys/kernel.h to the handful of files that need it rather than add an extra include to sys/malloc.h for kernel compiles. Updates to follow in subsequent commits.
# 40648	25-Oct-1998	phk	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
# 38354	15-Aug-1998	bde	Use [u]intptr_t instead of [u_]long for casts between pointers and integers. Don't forget to cast to (void *) as well.
# 37951	29-Jul-1998	bde	Fixed printf format errors.
# 34266	08-Mar-1998	julian	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
# 33756	23-Feb-1998	dyson	Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden in a way identically as before.) I had problems with the system properly handling the number of vnodes when there is alot of system memory, and the default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and "VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems with greater than 128MB.
# 33181	09-Feb-1998	eivind	Staticize.
# 33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
# 33109	05-Feb-1998	dyson	1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
# 33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
# 32702	22-Jan-1998	dyson	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
# 31549	05-Dec-1997	dyson	Some fixes from John Hood: 1) Fix the initialization of malloc structure that changed due to perf opt. 2) Remove unneeded include. 3) An initialization assert added to malloc. Submitted by: John Hood <cgull@smoke.marlboro.vt.us>
# 30817	28-Oct-1997	phk	Remove the long description from the in-kernel datastructure. Put a magic field in there instead, to help catch uninitialized malloc types.
# 30354	12-Oct-1997	phk	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 30306	11-Oct-1997	phk	Freeing with unknown type is a panic kind of thing.
# 30299	11-Oct-1997	phk	Remove a debug printf entirely.
# 30298	11-Oct-1997	peter	Disable an extremely annoying printf.
# 30281	10-Oct-1997	phk	Rename "struct kmemstats" to "struct malloc_type" it makes more sense now. Fix type argument to hashinit() and phashinit()
# 30278	10-Oct-1997	phk	Make malloc more extensible. The malloc type is now a pointer to the struct kmemstats that describes the type. This allows subsystems to declare their malloc types locally and <sys/malloc.h> doesn't need tweaked everytime somebody gets an idea. You can even have a type local to a lkm... I don't know if we really need the longdesc, comments welcome. TODO: There is a single nit in ext2fs, that will be fixed later, and I intend to remove all unused malloc types and distribute the rest closer to their use.
# 29508	16-Sep-1997	bde	Fixed staticization. buckets[] was staticized but was still declared extern in <sys/malloc.h> and it should not have been staticized for the !(KMEMSTATS \|\| DIAGNOSTIC) case. Fixed the !(KMEMSTATS \|\| DIAGNOSTIC) case. The MALLOC() and FREE() macros are evil, but code generally doesn't allow for this and some code involving else clauses did not compile. Finished staticization.
# 29041	02-Sep-1997	bde	Removed unused #includes.
# 27899	04-Aug-1997	dyson	Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of a simple, clean zone type allocator. This new allocator will also be used for machine dependent pmap PV entries.
# 26887	24-Jun-1997	dg	Killed bogus kernacc() call in malloc() DIAGNOSTIC code. kernacc() by it's nature, locks the kernal_map, and this is deadly if kernal_map had been locked previous to a (net) interrupt.
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 17428	04-Aug-1996	phk	The check for multiple freed items were bogus. fixed.
# 15817	18-May-1996	dyson	Minor performance improvement to kern_malloc.c that increases the probability of reuse of recently freed memory. This improves cache hit stats on cached memory, and improves at least fork speed consistancy.
# 15722	10-May-1996	wollman	Allocate mbufs from a separate submap so that NMBCLUSTERS works as expected.
# 15543	02-May-1996	phk	removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG Major macro cleanup.
# 15538	02-May-1996	phk	First pass at cleaning up macros relating to pages, clusters and all that.
# 13703	29-Jan-1996	dg	Implement what I mentioned in rev 1.18: limit per-bucket allocations to 60% of physical memory or 60% of malloc area size, whichever is smaller.
# 13702	29-Jan-1996	dg	Fixed two bugs in the calculation of the malloc area (kmem_map) size: 1) The calculation didn't account for NMBCLUSTERS, so if a large number of clusters was specified, it would leave little or no space for kernel malloc. 2) It was bogusly restricted to v_page_count. This doesn't take into account the sparseness of the malloc area and would have caused problems on machines with small amounts of memory. It should probably instead be changed to set the malloc limit to be constrained by the amount of memory, but I didn't do this.
# 12819	14-Dec-1995	phk	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
# 12662	07-Dec-1995	dg	Untangled the vm.h include file spaghetti.
# 12569	02-Dec-1995	bde	Finished (?) cleaning up sysinit stuff.
# 10653	09-Sep-1995	dg	Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
# 10358	28-Aug-1995	julian	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
# 8876	30-May-1995	rgrimes	Remove trailing whitespace.
# 7875	16-Apr-1995	dg	Make vegetarian and animal rights people happy and use 0xdeadc0de instead of 0xdeadbeef as the fill pattern. Decreased MAX_COPY to 64 (256 was a bit overzealous in most cases).
# 7170	19-Mar-1995	dg	Removed redundant newlines that were in some panic strings.
# 7009	11-Mar-1995	dg	Added some additional DIAGNOSTIC code that makes sure that freed memory addresses and types are with the valid range. Increased MAX_COPY to 256 (used to verify no freed memory use with DIAGNOSTIC).
# 6127	02-Feb-1995	dg	Calling semantics for kmem_malloc() have been changed...and the third argument is now more than just a single flag. (kern_malloc.c) Used new M_KERNEL value for socket allocations that previous were "M_NOWAIT". Note that this will change when we clean up the M_ namespace mess. Submitted by: John Dyson
# 5455	09-Jan-1995	dg	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
# 5131	17-Dec-1994	dg	Changed splimp to splhigh to close a potential hole that could lead to corrupted malloc data structures caused by frees occurring at other than splimp.
# 3451	09-Oct-1994	dg	Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
# 3308	02-Oct-1994	phk	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
# 1817	02-Aug-1994	dg	Added $Id$
# 1549	25-May-1994	rgrimes	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# 1542	24-May-1994	rgrimes	This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
# 1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources