#
267654 |
|
19-Jun-2014 |
gjb |
Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
240947 |
|
26-Sep-2012 |
alc |
MFC r240862 Address a race condition that was introduced in r238212. Unless the page queues lock is acquired before the page lock is released, there is no guarantee that the page will still be in that same page queue when vm_page_requeue() is called.
|
#
240760 |
|
20-Sep-2012 |
alc |
MFC r237168 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer.
|
#
240579 |
|
16-Sep-2012 |
eadler |
MFC r240518: Correct double "the the"
Approved by: cperciva (implicit)
|
#
240149 |
|
05-Sep-2012 |
kib |
MFC r238791: Do not requeue held page or page for which locking failed, just leave them alone.
Process the act_count updates for the held pages in the vm_pageout loop over the inactive queue, instead of refusing to do anything with such page.
Clarify the intent of the addl_page_shortage counter and change its use for pages which are not processed in the loop according to the description.
|
#
240147 |
|
05-Sep-2012 |
kib |
MFC r238732 (by alc): Addendum to r238604. If the inactive queue scan isn't restarted, then the variable "addl_page_shortage_init" isn't needed.
|
#
240143 |
|
05-Sep-2012 |
kib |
MFC r238604: Do not restart scan of the inactive queue when non-inactive page is found. Rather, we shall not find such pages on inactive queue at all.
|
#
240142 |
|
05-Sep-2012 |
kib |
MFC r238212: Drop page queues mutex on each iteration of vm_pageout_scan over the inactive queue, unless busy page is found.
MFC r238258: Avoid vm page queues lock leak after r238212.
|
#
239897 |
|
30-Aug-2012 |
kib |
MFC r238180: Style.
|
#
236921 |
|
11-Jun-2012 |
kib |
MFC r235362: Assert that fictitious or unmanaged pages do not appear on active/inactive lists.
|
#
233728 |
|
31-Mar-2012 |
kib |
MFC r233100: In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag if the filesystem performed short write and we are skipping the page due to this.
Propogate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode.
While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean.
PR: kern/165927
|
#
231023 |
|
05-Feb-2012 |
nwhitehorn |
MFC r230247:
Revert r212360 now that PowerPC can handle large sparse arguments to pmap_remove() (changed in r228412).
|
#
225736 |
|
22-Sep-2011 |
kensmith |
Copy head to stable/9 as part of 9.0-RELEASE release cycle.
Approved by: re (implicit)
|
#
225418 |
|
06-Sep-2011 |
kib |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
#
223825 |
|
06-Jul-2011 |
trasz |
All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
|
#
223729 |
|
02-Jul-2011 |
alc |
Initialize marker pages as held rather than fictitious/wired. Marking the page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER.
Reviewed by: kib
|
#
220387 |
|
06-Apr-2011 |
trasz |
In vm_daemon(), do not skip processes stopped with SIGSTOP.
|
#
220386 |
|
06-Apr-2011 |
trasz |
Add RACCT_RSS.
Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
#
219968 |
|
24-Mar-2011 |
jhb |
Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.
MFC after: 1 week
|
#
219727 |
|
18-Mar-2011 |
trasz |
In vm_daemon(), when iterating over all processes in the system, skip those which are not yet fully initialized (i.e. ones with p_state == PRS_NEW). Without it, we could panic in _thread_lock_flags().
Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that require similar fix.
Reported by: pho, keramida Discussed with: kib
|
#
217478 |
|
16-Jan-2011 |
alc |
Shift responsibility for synchronizing access to the page's act_count field to the object's lock.
Reviewed by: kib@
|
#
216899 |
|
02-Jan-2011 |
alc |
Release the page lock early in vm_pageout_clean(). There is no reason to hold this lock until the end of the function.
With the aforementioned change to vm_pageout_clean(), page locks don't need to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions.
Reviewed by: kib
|
#
215471 |
|
18-Nov-2010 |
kib |
vm_pageout_flush() might cache the pages that finished write to the backing storage. Such pages might be then reused, racing with the assert in vm_object_page_collect_flush() that verified that dirty pages from the run (most likely, pages with VM_PAGER_AGAIN status) are write-protected still. In fact, the page indexes for the pages that were removed from the object page list should be ignored by vm_object_page_clean().
Return the length of successfully written run from vm_pageout_flush(), that is, the count of pages between requested page and first page after requested with status VM_PAGER_AGAIN. Supply the requested page index in the array to vm_pageout_flush(). Use the returned run length to forward the index of next page to clean in vm_object_page_clean().
Reported by: avg Reviewed by: alc MFC after: 1 week
|
#
212360 |
|
09-Sep-2010 |
nwhitehorn |
On architectures with non-tree-based page tables like PowerPC, every page in a range must be checked when calling pmap_remove(). Calling pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range of the map could result in attempting to demap an extraordinary number of pages (> 10^15), so iterate through each map entry and unmap each of them individually.
MFC after: 6 weeks
|
#
209651 |
|
02-Jul-2010 |
alc |
Push down the acquisition of the page queues lock into vm_pageout_page_stats(). In particular, avoid acquiring the page queues lock unless iterating over the active queue.
|
#
209647 |
|
02-Jul-2010 |
alc |
With the demise of page coloring, the page queue macros no longer serve any useful purpose. Eliminate them.
Reviewed by: kib
|
#
209610 |
|
30-Jun-2010 |
alc |
Simplify entry to vm_pageout_clean(). Expect the page to be locked. Previously, the caller unlocked the page, and vm_pageout_clean() immediately reacquired the page lock. Also, assert rather than test that the page is neither busy nor held. Since vm_pageout_clean() is called with the object and page locked, the page can't have changed state since the caller verified that the page is neither busy nor held.
|
#
209407 |
|
21-Jun-2010 |
alc |
Introduce vm_page_next() and vm_page_prev(), and use them in vm_pageout_clean(). When iterating over a range of pages, these functions can be cheaper than vm_page_lookup() because their implementation takes advantage of the vm_object's memq being ordered.
Reviewed by: kib@ MFC after: 3 weeks
|
#
209173 |
|
14-Jun-2010 |
alc |
Eliminate checks for a page having a NULL object in vm_pageout_scan() and vm_pageout_page_stats(). These checks were recently introduced by the first page locking commit, r207410, but they are not needed. At the same time, eliminate some redundant accesses to the page's object field. (These accesses should have neen eliminated by r207410.)
Make the assertion in vm_page_flag_set() stricter. Specifically, only managed pages should have PG_WRITEABLE set.
Add a comment documenting an assertion to vm_page_flag_clear().
It has long been the case that fictitious pages have their wire count permanently set to one. Add comments to vm_page_wire() and vm_page_unwire() documenting this. Add assertions to these functions as well.
Update the comment describing vm_page_unwire(). Much of the old comment had little to do with vm_page_unwire(), but a lot to do with _vm_page_deactivate(). Move relevant parts of the old comment to _vm_page_deactivate().
Only pages that belong to an object can be paged out. Therefore, it is pointless for vm_page_unwire() to acquire the page queues lock and enqueue such pages in one of the paging queues. Generally speaking, such pages are immediately freed after the call to vm_page_unwire(). Previously, it was the call to vm_page_free() that reacquired the page queues lock and removed these pages from the paging queues. Now, we will never acquire the page queues lock for this case. (It is also worth noting that since both vm_page_unwire() and vm_page_free() occurred with the page locked, the page daemon never saw the page with its object field set to NULL.)
Change the panic with vm_page_unwire() to provide a more precise message.
Reviewed by: kib@
|
#
208990 |
|
10-Jun-2010 |
alc |
Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(), simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
|
#
208504 |
|
24-May-2010 |
alc |
Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
Reviewed by: kib (an earlier version)
|
#
207796 |
|
08-May-2010 |
alc |
Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
|
#
207759 |
|
07-May-2010 |
jkim |
Fix a typo in the previous commit.
|
#
207752 |
|
07-May-2010 |
kib |
One more use for vm_pageout_init_marker().
Reviewed by: alc
|
#
207694 |
|
06-May-2010 |
kib |
Add a helper function vm_pageout_page_lock(), similar to tegge' vm_pageout_fallback_object_lock(), to obtain the page lock while having page queue lock locked, and still maintain the page position in a queue.
Use the helper to lock the page in the pageout daemon and contig launder iterators instead of skipping the page if its lock is contested. Skipping locked pages easily causes pagedaemon or launder to not make a progress with page cleaning.
Proposed and reviewed by: alc
|
#
207541 |
|
02-May-2010 |
alc |
Eliminate an assignment that was made redundant by r207410.
|
#
207540 |
|
02-May-2010 |
alc |
Defer the acquisition of the page and page queues locks in vm_pageout_object_deactivate_pages().
|
#
207452 |
|
30-Apr-2010 |
kmacy |
push up dropping of the page queue lock to avoid holding it in vm_pageout_flush
|
#
207448 |
|
30-Apr-2010 |
kmacy |
- don't check hold_count without the page lock held - don't leak the page lock if m->object is NULL (assuming that that check will in fact even be valid when m->object is protected by the page lock)
|
#
207410 |
|
29-Apr-2010 |
kmacy |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
|
#
207374 |
|
29-Apr-2010 |
alc |
Simplify the inner loop of vm_pageout_object_deactivate_pages(). Rather than checking each page for PG_UNMANAGED, check the vm object's type. Only OBJT_PHYS can have unmanaged pages. Eliminate a pointless counter. The vm object is locked, that lock is never released by the inner loop, and the set of pages contained by the vm object is not changed by the inner loop. Therefore, the counter serves no purpose.
|
#
206885 |
|
20-Apr-2010 |
alc |
Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to an object whose reference count is zero, then that page cannot possibly be mapped.
|
#
206814 |
|
18-Apr-2010 |
alc |
Remove a nonsensical test from vm_pageout_clean(). A page can't be in the inactive queue and have a non-zero wire count.
Reviewed by: kib MFC after: 3 weeks
|
#
206264 |
|
06-Apr-2010 |
kib |
When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible.
This removes the often encountered deadlock, where OOM continously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch.
In collaboration with: pho Reviewed by: alc MFC after: 4 weeks
|
#
202529 |
|
17-Jan-2010 |
kib |
When a vnode-backed vm object is referenced, it increments the vnode reference count, and decrements it on dereference. If referenced object is deallocated, object type is reset to OBJT_DEAD. Consequently, all vnode references that are owned by object references are never released. vunref() the vnode in vm object deallocation code for OBJT_VNODE appropriate number of times to prevent leak.
Add an assertion to the vm_pageout() to make sure that we never get reference on the vnode but then do not execute code to release it.
In collaboration with: pho Reviewed by: alc MFC after: 3 weeks
|
#
195840 |
|
24-Jul-2009 |
jhb |
Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
|
#
194806 |
|
24-Jun-2009 |
alc |
The bits set in a page's dirty mask are a subset of the bits set in its valid mask. Consequently, there is no need to perform a bit-wise and of the page's dirty and valid masks in order to determine which parts of a page are dirty and valid.
Eliminate an unnecessary #include.
|
#
192962 |
|
28-May-2009 |
alc |
Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically, rather than unconditionally making partially dirty pages fully dirty, only make partially dirty pages fully dirty if the pmap says that the page has been modified.
(This change is also a small optimization. It eliminate an unnecessary call to pmap_is_modified() on pages that are mapped read only.)
Suggested by: tegge
|
#
192261 |
|
17-May-2009 |
alc |
Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan(). If the page belongs to an object with a reference count of zero, then it can't have any managed mappings on which to clear a reference bit.
|
#
191626 |
|
28-Apr-2009 |
kib |
Use the acquired reference to the vmspace instead of direct dereferencing of p->p_vmspace in a place where it was missed in r191277.
Noted by: pluknet gmail com
|
#
191277 |
|
19-Apr-2009 |
kib |
In both pageout oom handler and vm_daemon, acquire the reference to the vmspace of the examined process instead of directly accessing its vmspace, that may change. Also, as an optimization, check for P_INEXEC flag before examining the process.
Reported and tested by: pho (previous version) Reviewed by: alc MFC after: 3 week
|
#
191263 |
|
19-Apr-2009 |
alc |
Calling pmap_clear_modify() after calling pmap_remove_write() is pointless. The latter function already clears the modified status from each of the page's mappings.
|
#
185012 |
|
16-Nov-2008 |
kib |
Instead of forcing vn_start_write() to reset mp back to NULL for the failed calls with non-NULL vp, explicitely clear mp after failure.
Tested by: stass Reviewed by: tegge PR: 123768 MFC after: 1 week
|
#
183474 |
|
29-Sep-2008 |
kib |
Move the code for doing out-of-memory grass from vm_pageout_scan() into the separate function vm_pageout_oom(). Supply a parameter for vm_pageout_oom() describing a reason for the call.
Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone is exhausted.
Reviewed by: alc Tested by: pho, jhb MFC after: 2 weeks
|
#
183236 |
|
21-Sep-2008 |
alc |
Prevent an integer overflow in vm_pageout_page_stats() on machines with a large number of physical pages.
PR: 126158 Submitted by: Dmitry Tejblum MFC after: 3 days
|
#
181239 |
|
03-Aug-2008 |
trhodes |
Fill in a few sysctl descriptions.
Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc
|
#
177414 |
|
19-Mar-2008 |
alc |
Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.
|
#
177368 |
|
19-Mar-2008 |
jeff |
- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
|
#
177253 |
|
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
#
173918 |
|
25-Nov-2007 |
alc |
Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered.
MFC after: 6 weeks
|
#
173853 |
|
22-Nov-2007 |
alc |
Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired.
Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks
|
#
172317 |
|
25-Sep-2007 |
alc |
Change the management of cached pages (PQ_CACHE) in two fundamental ways:
(1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed.
Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)
|
#
172207 |
|
17-Sep-2007 |
jeff |
- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM.
Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
|
#
172188 |
|
15-Sep-2007 |
alc |
Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate.
Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week
|
#
171150 |
|
02-Jul-2007 |
alc |
In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode.
In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock.
This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong because if it examined a recycled vm object.
Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks
|
#
171048 |
|
26-Jun-2007 |
alc |
Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().
Approved by: re (kensmith) MFC after: 3 weeks
|
#
170905 |
|
18-Jun-2007 |
alc |
Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED.
Reviewed by: tegge
|
#
170816 |
|
16-Jun-2007 |
alc |
Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map.
Approved by: re
|
#
170658 |
|
13-Jun-2007 |
alc |
Eliminate dead code: We have not performed pageouts on the kernel object in this millenium.
|
#
170517 |
|
10-Jun-2007 |
attilio |
Optimize vmmeter locking. In particular: - Add an explicative table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unuseful comments
Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)
|
#
170307 |
|
04-Jun-2007 |
jeff |
Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
170292 |
|
04-Jun-2007 |
attilio |
Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs).
Reviewed by: alc, bde Approved by: jeff (mentor)
|
#
170170 |
|
31-May-2007 |
attilio |
Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately.
Requested by: alc Approved by: jeff (mentor)
|
#
169667 |
|
18-May-2007 |
jeff |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
#
166544 |
|
07-Feb-2007 |
alc |
Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.
|
#
166074 |
|
17-Jan-2007 |
delphij |
Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
|
#
163604 |
|
22-Oct-2006 |
alc |
Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.
|
#
160889 |
|
01-Aug-2006 |
alc |
Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change.
The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access.
Reviewed by: marcel@ (ia64)
|
#
155790 |
|
17-Feb-2006 |
tegge |
Expand scope of marker to reduce the number of page queue scan restarts.
|
#
155784 |
|
17-Feb-2006 |
tegge |
Check return value from nonblocking call to vn_start_write().
|
#
155320 |
|
04-Feb-2006 |
alc |
Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid.
|
#
154889 |
|
27-Jan-2006 |
alc |
Use the new macros abstracting the page coloring/queues implementation. (There are no functional changes.)
|
#
153940 |
|
31-Dec-2005 |
netchild |
MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible)
MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
|
#
152224 |
|
09-Nov-2005 |
alc |
Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations.
Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away.
Reviewed by: tegge
|
#
148909 |
|
09-Aug-2005 |
tegge |
Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE due to the vm object being locked.
When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects.
Temporarily unlock the page queues lock while locking vm objects to avoid lock order violation. Detect and handle relevant page queue changes.
This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable.
Reviewed by: alc
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
139779 |
|
06-Jan-2005 |
alc |
Revise the part of vm_pageout_scan() that moves pages from the cache queue to the free queue. With this change, if a page from the cache queue belongs to a locked object, it is simply skipped over rather than moved to the inactive queue.
|
#
137243 |
|
05-Nov-2004 |
alc |
During traversal of the inactive queue, try locking the page's containing object before accessing the page's flags or the object's reference count.
|
#
137168 |
|
03-Nov-2004 |
alc |
The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol.
This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.
|
#
137091 |
|
30-Oct-2004 |
alc |
During traversal of the active queue by vm_pageout_page_stats(), try locking the page's containing object before accessing the page's flags.
|
#
137060 |
|
30-Oct-2004 |
alc |
Add an assignment statement that I omitted from the previous revision.
|
#
136996 |
|
27-Oct-2004 |
alc |
During traversal of the active queue, try locking the page's containing object before accessing the page's flags or the object's reference count. If the trylock fails, handle the page as though it is busy.
|
#
132336 |
|
18-Jul-2004 |
alc |
Remove the GIANT_REQUIRED preceding pmap_remove() in vm_pageout_map_deactivate_pages().
|
#
132220 |
|
15-Jul-2004 |
alc |
Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.
|
#
132040 |
|
12-Jul-2004 |
alc |
Remove an unused and unimplemented sysctl. (For the record, it was marked as unimplemented in revision 1.129 nearly six years ago.)
|
#
131027 |
|
24-Jun-2004 |
alc |
Call vm_pageout_page_stats() with the page queues lock held.
|
#
131023 |
|
24-Jun-2004 |
alc |
Remove spl calls.
|
#
130551 |
|
15-Jun-2004 |
julian |
Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
|
#
129143 |
|
12-May-2004 |
alc |
Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.
|
#
126585 |
|
04-Mar-2004 |
bde |
Minor style fixes. In vm_daemon(), don't fetch the rss limit long before it is needed.
|
#
126088 |
|
21-Feb-2004 |
alc |
Eliminate the second, unnecessary call to pmap_page_protect() near the end of vm_pageout_flush(). Instead, assert that the page is still write protected.
Discussed with: tegge
|
#
125798 |
|
14-Feb-2004 |
alc |
- Correct a long-standing race condition in vm_page_try_to_cache() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251. - Simplify the code surrounding the fix to this same race condition in vm_pageout.c's revision 1.251. There should be no behavioral change. Reviewed by: tegge
MFC after: 7 days
|
#
125662 |
|
10-Feb-2004 |
alc |
Correct a long-standing race condition in the inactive queue scan. (See the added comment for low-level details.) The effect of this race condition is a panic "vm_page_cache: caching a dirty page, ..."
Reviewed by: tegge MFC after: 7 days
|
#
125454 |
|
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
121455 |
|
24-Oct-2003 |
alc |
- Push down Giant from vm_pageout() to vm_pageout_scan(), freeing vm_pageout_page_stats() from Giant. - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the vm object to be locked on entry. (All of the pager routines now expect this.)
|
#
121351 |
|
22-Oct-2003 |
alc |
- Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from vm_pageout_scan(). Rationale: I don't like leaving a busy page in the cache queue with neither the vm object nor the vm page queues lock held. - Assert that the page is active in vm_pageout_page_stats().
|
#
121321 |
|
22-Oct-2003 |
alc |
- Assert that every page found in the active queue is an active page.
|
#
121226 |
|
18-Oct-2003 |
alc |
- Increase the object lock's scope in vm_contig_launder() so that access to the object's type field and the call to vm_pageout_flush() are synchronized. - The above change allows for the eliminaton of the last parameter to vm_pageout_flush(). - Synchronize access to the page's valid field in vm_pageout_flush() using the containing object's lock.
|
#
121150 |
|
17-Oct-2003 |
alc |
- Synchronize access to a vm page's valid field using the containing vm object's lock. - Release the vm object and vm page queues locks around vput().
|
#
120217 |
|
19-Sep-2003 |
alc |
Merge vm_pageout_free_page_calc() into vm_pageout(), eliminating some unneeded code.
|
#
120150 |
|
17-Sep-2003 |
alc |
When calling vget() on a vnode-backed vm object, acquire the vnode interlock before releasing the vm object's lock.
|
#
119596 |
|
30-Aug-2003 |
alc |
- Add vm object locking to the part of vm_pageout_scan() that launders dirty pages. - Remove some unused variables.
|
#
118931 |
|
15-Aug-2003 |
alc |
Extend the scope of the page queues lock in vm_pageout_scan() to cover the traversal of the PQ_INACTIVE queue.
|
#
118390 |
|
03-Aug-2003 |
phk |
Change the layout policy of the swap_pager from a hardcoded width striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap allocation on page-out, this means that a newly added swap device almost instantly takes its 1/N share of the I/O load but it takes somewhat longer for it to assume it's 1/N share of the pages if there is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead by using a per device blist rather than one global blist. This reduces the memory footprint by 75% (typically a couple hundred kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices, there is no limit now. Instead of a fixed size array, store the per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to give some amount of protection to any bsdlabel at the start of the device.
A new device is appended after the existing devices in the swap space, no attempt is made to fill in holes left behind by swapoff (this can trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped() to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space, but since the swapspace is no longer fragmented by the striping different resource managers might fare better.
|
#
117303 |
|
07-Jul-2003 |
alc |
- Complete the vm object locking in vm_pageout_object_deactivate_pages(). - Change vm_pageout_object_deactivate_pages()'s first parameter from a vm_map_t to a pmap_t. - Change vm_pageout_object_deactivate_pages()'s and vm_pageout_map_deactivate_pages()'s last parameter from a vm_pindex_t to a long. Since the number of pages in an address space doesn't require 64 bits on an i386, vm_pindex_t is overkill.
|
#
117038 |
|
29-Jun-2003 |
alc |
Add vm object locking to vm_pageout_map_deactivate_pages().
|
#
117001 |
|
28-Jun-2003 |
alc |
- Add vm object locking to vm_pageout_clean().
|
#
116226 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
115146 |
|
18-May-2003 |
das |
If we seem to be out of VM, don't allow the pagedaemon to kill processes in the first pass. Among other things, this will give us a chance to launder vnode-backed pages before concluding that we need more swap. This is particularly useful for systems that have no swap.
While here, update a comment and remove some long-unused code.
Reported by: Lucky Green <shamrock@cypherpunks.to> Suggested by: dillon Approved by: re (rwatson)
|
#
114649 |
|
04-May-2003 |
alc |
Avoid a lock-order reversal and implement vm_object locking in vm_pageout_page_free().
|
#
114273 |
|
30-Apr-2003 |
alc |
Eliminate an unused parameter from vm_pageout_object_deactivate_pages().
|
#
113955 |
|
24-Apr-2003 |
alc |
- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)
|
#
113869 |
|
22-Apr-2003 |
jhb |
Lock the proc to check p_flag and several other related tests in vm_daemon(). We don't need to hold sched_lock as long now as a result.
|
#
113765 |
|
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_wakeup(). - Merge two identical cases in a switch statement.
|
#
113739 |
|
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_add().
|
#
112881 |
|
31-Mar-2003 |
wes |
Add a facility allowing processes to inform the VM subsystem they are critical and should not be killed when pageout is looking for more memory pages in all the wrong places.
Reviewed by: arch@ Sponsored by: St. Bernard Software
|
#
112167 |
|
12-Mar-2003 |
das |
- When the VM daemon is out of swap space and looking for a process to kill, don't block on a map lock while holding the process lock. Instead, skip processes whose map locks are held and find something else to kill. - Add vm_map_trylock_read() to support the above.
Reviewed by: alc, mike (mentor)
|
#
110597 |
|
09-Feb-2003 |
alc |
Add a comment describing how pagedaemon_wakeup() should be used and synchronized.
Suggested by: tegge
|
#
110225 |
|
02-Feb-2003 |
alc |
- It's more accurate to say that vm_paging_needed() returns TRUE than a positive number. - In pagedaemon_wakeup(), set vm_pages_needed to 1 rather than incrementing it to accomplish the same.
|
#
110218 |
|
01-Feb-2003 |
alc |
- Convert vm_pageout()'s tsleep()s to msleep()s with the page queue lock.
|
#
110207 |
|
01-Feb-2003 |
alc |
- Remove (some) unnecessary explicit initializations to zero. - Style changes to vm_pageout(): declarations and white-space.
|
#
109223 |
|
14-Jan-2003 |
alc |
- Update vm_pageout_deficit using atomic operations. It's a simple counter outside the scope of existing locks. - Eliminate a redundant clearing of vm_pageout_deficit.
|
#
109216 |
|
14-Jan-2003 |
alc |
Make vm_pageout_page_free() static.
|
#
108600 |
|
03-Jan-2003 |
phk |
Avoid extern decls in .c files by putting them in the vm/swap_pager.h include file where they belong. Share the dmmax_mask variable.
|
#
108361 |
|
28-Dec-2002 |
dillon |
vm_pager_put_pages() takes VM_PAGER_* flags, not OBJPC_* flags. It just so happens that OBJPC_SYNC has the same value as VM_PAGER_PUT_SYNC so no harm done. But fix it :-)
No operational changes.
MFC after: 1 day
|
#
108011 |
|
18-Dec-2002 |
alc |
Hold the page queues lock when performing vm_page_flag_set().
|
#
107436 |
|
01-Dec-2002 |
alc |
Hold the page queues lock when calling pmap_protect(); it updates fields of the vm_page structure. Nearby, remove an unnecessary semicolon and return statement.
Approved by: re (blanket)
|
#
107433 |
|
30-Nov-2002 |
alc |
Increase the scope of the page queue lock in vm_pageout_scan().
Approved by: re (blanket)
|
#
107185 |
|
23-Nov-2002 |
alc |
Assert that the page queues lock rather than Giant is held in vm_pageout_page_free().
Approved by: re
|
#
107136 |
|
21-Nov-2002 |
jeff |
- Add an event that is triggered when the system is low on memory. This is intended to be used by significant memory consumers so that they may drain some of their caches.
Inspired by: phk Approved by: re Tested on: x86, alpha
|
#
107039 |
|
18-Nov-2002 |
alc |
Remove vm_page_protect(). Instead, use pmap_page_protect() directly.
|
#
106981 |
|
16-Nov-2002 |
alc |
Now that pmap_remove_all() is exported by our pmap implementations use it directly.
|
#
106838 |
|
13-Nov-2002 |
alc |
Move pmap_collect() out of the machine-dependent code, rename it to reflect its new location, and add page queue and flag locking.
Notes: (1) alpha, i386, and ia64 had identical implementations of pmap_collect() in terms of machine-independent interfaces; (2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
|
#
106720 |
|
10-Nov-2002 |
alc |
When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations.
Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().
|
#
104964 |
|
12-Oct-2002 |
jeff |
- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c
Reviewed by: -arch
|
#
103925 |
|
24-Sep-2002 |
jeff |
- Get rid of the unused LK_NOOBJ.
|
#
103767 |
|
21-Sep-2002 |
jake |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
#
103216 |
|
11-Sep-2002 |
julian |
Completely redo thread states.
Reviewed by: davidxu@freebsd.org
|
#
101655 |
|
10-Aug-2002 |
alc |
o Lock page queue accesses by vm_page_activate().
|
#
100797 |
|
28-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free(). o Increment cnt.v_dfree inside vm_pageout_page_free() rather than at each call.
|
#
100779 |
|
27-Jul-2002 |
alc |
o Require that the page queues lock is held on entry to vm_pageout_clean() and vm_pageout_flush(). o Acquire the page queues lock before calling vm_pageout_clean() or vm_pageout_flush().
|
#
100740 |
|
27-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_activate() and vm_page_deactivate() in vm_pageout_object_deactivate_pages(). o Apply some style fixes to vm_pageout_object_deactivate_pages().
|
#
100542 |
|
23-Jul-2002 |
alc |
o Extend the scope of the page queues lock in vm_pageout_scan() to cover the traversal of the cache queue.
|
#
100415 |
|
20-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_try_to_cache(). (The accesses in kern/vfs_bio.c are already locked.) o Assert that the page queues lock is held in vm_page_try_to_cache().
|
#
100413 |
|
20-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_cache() in vm_fault() and vm_pageout_scan(). (The others are already locked.) o Assert that the page queues lock is held in vm_page_cache().
|
#
100407 |
|
20-Jul-2002 |
alc |
o Lock accesses to the active page queue in vm_pageout_scan() and vm_pageout_page_stats().
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
95610 |
|
28-Apr-2002 |
alc |
o Introduce and use vm_map_trylock() to replace several direct uses of lockmgr(). o Add missing synchronization to vmspace_swap_count(): Obtain a read lock on the vm_map before traversing it.
|
#
92748 |
|
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
|
#
92727 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92654 |
|
19-Mar-2002 |
jeff |
This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator.
Reviewed by: arch@
|
#
92588 |
|
18-Mar-2002 |
green |
Back out the modification of vm_map locks from lockmgr to sx locks. The best path forward now is likely to change the lockmgr locks to simple sleep mutexes, then see if any extra contention it generates is greater than removed overhead of managing local locking state information, cost of extra calls into lockmgr, etc.
Additionally, making the vm_map lock a mutex and respecting it properly will put us much closer to not needing Giant magic in vm.
|
#
92246 |
|
13-Mar-2002 |
green |
Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate. While doing this, move it earlier in the sysinit boot process so that the VM system can use it.
After that, the system is now able to use sx locks instead of lockmgr locks in the VM system. To accomplish this, some of the more questionable uses of the locks (such as testing whether they are owned or not, as well as allowing shared+exclusive recursion) are removed, and simpler logic throughout is used so locks should also be easier to understand.
This has been tested on my laptop for months, and has not shown any problems on SMP systems, either, so appears quite safe. One more user of lockmgr down, many more to go :)
|
#
92029 |
|
10-Mar-2002 |
eivind |
- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
|
#
91403 |
|
27-Feb-2002 |
silby |
Fix a horribly suboptimal algorithm in the vm_daemon.
In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times.
Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change.
Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began.
Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals.
PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week
|
#
90935 |
|
19-Feb-2002 |
silby |
Changes to make the OOM killer much more effective:
- Allow the OOM killer to target processes currently locked in memory. These very often are the ones doing the memory hogging. - Drop the wakeup priority of processes currently sleeping while waiting for their page fault to complete. In order for the OOM killer to work well, the killed process and other system processes waiting on memory must be allowed to wakeup first.
Reviewed by: dillon MFC after: 1 week
|
#
90033 |
|
31-Jan-2002 |
dillon |
GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer cache lockups for over a year now.
MFC after: 0 days
|
#
88318 |
|
20-Dec-2001 |
dillon |
Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout.
Hopefully MFC: before the 4.5 release
|
#
85272 |
|
21-Oct-2001 |
dillon |
Syntax cleanup and documentation, no operational changes.
MFC after: 1 day
|
#
84933 |
|
14-Oct-2001 |
tegge |
Don't remove all mappings of a swapped out process if the vm map contained wired entries. vm_fault_unwire() depends on the mapping being intact.
Reviewed by: dillon
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
79263 |
|
04-Jul-2001 |
dillon |
Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system.
vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and aquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)
|
#
79242 |
|
04-Jul-2001 |
dillon |
whitespace / register cleanup
|
#
79224 |
|
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
78521 |
|
20-Jun-2001 |
jhb |
Don't lock around swap_pager_swap_init() that is only called once during the pagedaemon's startup code since it calls malloc which results in lock order reversals.
|
#
78481 |
|
19-Jun-2001 |
jhb |
Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for now. The proc locking isn't actually safe yet and won't be until the proc locking is finished.
|
#
77948 |
|
09-Jun-2001 |
dillon |
Two fixes to the out-of-swap process termination code. First, start killing processes a little earlier to avoid a deadlock. Second, when calculating the 'largest process' do not just count RSS. Instead count the RSS + SWAP used by the process. Without this the code tended to kill small inconsequential processes like, oh, sshd, rather then one of the many 'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on -stable and somewhat tested on -current and will be MFCd in a few days.
Shamed into fixing this by: ps
|
#
77093 |
|
23-May-2001 |
jhb |
- Add in several asserts of vm_mtx. - Assert Giant in vm_pageout_scan() for the vnode hacking that it does. - Don't hold vm_mtx around vget() or vput(). - Lock Giant when calling vm_pageout_scan() from the pagedaemon. Also, lock curproc while setting the P_BUFEXHAUST flag. - For now we still hold Giant for all of the vm_daemon. When process limits are locked we will be only need Giant for swapout_procs().
|
#
76827 |
|
18-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
#
76773 |
|
17-May-2001 |
jhb |
During the code to pick a process to kill when memory is exhausted, keep the process in question locked as soon as we find it and determine it to be eligible until we actually kill it. To avoid deadlock, we don't block on the process lock but skip any process that is already locked during our search.
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
74927 |
|
28-Mar-2001 |
jhb |
Convert the allproc and proctree locks from lockmgr locks to sx locks.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71999 |
|
04-Feb-2001 |
phk |
Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details.
Created with: sed(1) Reviewed by: md5(1)
|
#
71572 |
|
24-Jan-2001 |
jhb |
- Catch up to proc flag changes. - Minimal proc locking. - Use queue macros.
|
#
70374 |
|
26-Dec-2000 |
dillon |
This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
|
#
69972 |
|
13-Dec-2000 |
tanimura |
- If swap metadata does not fit into the KVM, reduce the number of struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits.
- Reject swapon(2) upon failure of swap_zone allocation.
This is just a temporary fix. Better solutions include: (suggested by: dillon)
o reserving swap in SWAP_META_PAGES chunks, and o swapping the swblock structures themselves.
Reviewed by: alfred, dillon
|
#
69947 |
|
12-Dec-2000 |
jake |
- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
|
#
69847 |
|
11-Dec-2000 |
dillon |
Be less conservative with a recently added KASSERT. Certain edge cases with file fragments and read-write mmap's can lead to a situation where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by an mmap and only the fragment (representing a non-page-aligned end of file) synced via a filesystem buffer. A correct solution that guarentees consistent m->dirty for the file EOF case is being worked on. In the mean time we can't be so conservative in the KASSERT.
|
#
69516 |
|
02-Dec-2000 |
jhb |
Protect p_stat with sched_lock.
|
#
69022 |
|
22-Nov-2000 |
jake |
Protect the following with a lockmgr lock:
allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid
Reviewed by: jhb Obtained from: BSD/OS and netbsd
|
#
68885 |
|
18-Nov-2000 |
dillon |
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode being locked.
Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
|
#
65557 |
|
06-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
#
62976 |
|
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
#
61081 |
|
29-May-2000 |
dillon |
This is a cleanup patch to Peter's new OBJT_PHYS VM object type and sysv shared memory support for it. It implements a new PG_UNMANAGED flag that has slightly different characteristics from PG_FICTICIOUS.
A new sysctl, kern.ipc.shm_use_phys has been added to enable the use of physically-backed sysv shared memory rather then swap-backed. Physically backed shm segments are not tracked with PV entries, allowing programs which use a large shm segment as a rendezvous point to operate without eating an insane amount of KVM in the PV entry management. Read: Oracle.
Peter's OBJT_PHYS object will also allow us to eventually implement page-table sharing and/or 4MB physical page support for such segments. We're half way there.
|
#
61058 |
|
29-May-2000 |
dillon |
Fix bug in vm_pageout_page_stats() that always resulted in a full scan of the active queue. This fix is not expected to have any noticeable impact on performance.
Noticed by: Rik van Riel <riel@conectiva.com.br>
|
#
60757 |
|
21-May-2000 |
peter |
Checkpoint of a new physical memory backed object type, that does not have pv_entries. This is intended for very special circumstances, eg: a certain database that has a 1GB shm segment mapped into 300 processes. That would consume 2GB of kvm just to hold the pv_entries alone. This would not be used on systems unless the physical ram was available, as it's not pageable.
This is a work-in-progress, but is a useful and functional checkpoint. Matt has got some more fixes for it that will be committed soon.
Reviewed by: dillon
|
#
60755 |
|
21-May-2000 |
peter |
Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry.
Inspired by: John Dyson's 1998 patches.
Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha).
Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits).
Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files.
Reviewed by: dillon
|
#
58714 |
|
28-Mar-2000 |
dillon |
Misattribution - the excellent SPLASSERT work is being done by Paul Saab <paul@mu.org>, of course!
|
#
58708 |
|
27-Mar-2000 |
dillon |
Add necessary spl protection for swapper. The problem was located by Alfred while testing his SPLASSERT stuff. This is not a complete fix, more protections are probably needed.
|
#
58705 |
|
27-Mar-2000 |
charnier |
Revert spelling mistake I made in the previous commit Requested by: Alan and Bruce
|
#
58634 |
|
26-Mar-2000 |
charnier |
Spelling
|
#
54444 |
|
11-Dec-1999 |
eivind |
Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked.
This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk Reviewed by: peter
|
#
52647 |
|
30-Oct-1999 |
alc |
The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others.
Reviewed by: dillon
|
#
52635 |
|
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
51337 |
|
17-Sep-1999 |
dillon |
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup.
Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed().
add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
50136 |
|
21-Aug-1999 |
alc |
Remove two unused variable declarations.
|
#
49937 |
|
16-Aug-1999 |
alc |
vm_pageout_clean: Remove dead code.
Submitted by: dillon
|
#
48544 |
|
03-Jul-1999 |
mckusick |
The buffer queue mechanism has been reformulated. Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truely empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occured when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
|
#
48391 |
|
01-Jul-1999 |
peter |
Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.
|
#
48252 |
|
26-Jun-1999 |
peter |
There isn't much point waking up a daemon that hasn't existed since softupdates came in. Try calling speedup_syncer() instead..
|
#
45960 |
|
23-Apr-1999 |
dt |
Make pmap_collect() an official pmap interface.
|
#
45365 |
|
06-Apr-1999 |
peter |
Don't forcibly kill processes that are locked in-core via PHOLD - it was just checking P_NOSWAP before.
|
#
44675 |
|
11-Mar-1999 |
julian |
Stop the mfs from trying to swap out crucial bits of the mfs as this can lead to deadlock. Submitted by: Mat dillon <dillon@freebsd.org>
|
#
44156 |
|
19-Feb-1999 |
luoqi |
Eliminate a possible numerical overflow.
|
#
44146 |
|
19-Feb-1999 |
luoqi |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
43752 |
|
07-Feb-1999 |
dillon |
Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with PQ_FREE. There is little operational difference other then the kernel being a few kilobytes smaller and the code being more readable.
* vm_page_select_free() has been *greatly* simplified. * The PQ_ZERO page queue and supporting structures have been removed * vm_page_zero_idle() revamped (see below)
PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guarenteed to remain tracked throughout a page's life ( if it isn't already ).
When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations.
Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.
|
#
43748 |
|
07-Feb-1999 |
dillon |
Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
|
#
43138 |
|
24-Jan-1999 |
dillon |
Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL to use the vm_page_dirty() inline.
The inline can thus do sanity checks ( or not ) over all cases.
|
#
43127 |
|
23-Jan-1999 |
dillon |
Added warning printf ( needs INVARIANTS ) when busy cache page is found while trying to free memory.
|
#
43123 |
|
23-Jan-1999 |
dillon |
It is possible for a page in the cache to be busy. vm_pageout.c was not checking for this condition while it tried to free cache pages. Fixed.
|
#
42976 |
|
21-Jan-1999 |
dillon |
Reorganized some of the low memory testing code to make it more useful.
Removed call to vm_object_collapse(), which can block. This was being called without the pageout code holding any sort of reference on the vm_object or vm_page_t structures being manipulated. Since this code can block, it was possible for other kernel code to shred the state the pageout code was assuming remained intact.
Fixed potential blocking condition in vm_pageout_page_free() ( which could cause a deadlock in a low-memory situation ).
Currently there is a hack in-place to deal with clean filesystem meta-data polluting the inactive page queue. John doesn't like the hack, and neither do I.
Revamped and commented a portion of the pageout loop.
Added protection against potential memory deadlocks with OBJT_VNODE when using VOP_ISLOCKED(). The problem is that vp->v_data can be NULL which causes VOP_ISLOCKED() to return a less informed answer.
remove vm_pager_sync() -- none of the pagers use it any more ( the old swapper used to. The new one does not ).
|
#
42957 |
|
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
#
40794 |
|
31-Oct-1998 |
peter |
Add John Dyson's SYSCTL descriptions, and an export of more stats to a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present in source, they do not get compiled into the binaries taking up memory.
|
#
40648 |
|
25-Oct-1998 |
phk |
Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
|
#
39770 |
|
29-Sep-1998 |
abial |
Make #define NO_SWAPPING a normal kernel config option.
Reviewed by: jkh
|
#
38799 |
|
04-Sep-1998 |
dfr |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
#
38517 |
|
24-Aug-1998 |
dfr |
Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable.
Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
38135 |
|
06-Aug-1998 |
dfr |
Protect all modifications to paging_in_progress with splvm(). The i386 managed to avoid corruption of this variable by luck (the compiler used a memory read-modify-write instruction which wasn't interruptable) but other architectures cannot.
With this change, I am now able to 'make buildworld' on the alpha (sfx: the crowd goes wild...)
|
#
37545 |
|
10-Jul-1998 |
alex |
Removed unnecessary test from if/else construct.
PR: 7233 Submitted by: Stefan Eggers <seggers@semyam.dinoco.de>
|
#
36582 |
|
02-Jun-1998 |
dyson |
Correct sleep priority.
|
#
34961 |
|
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
#
34611 |
|
15-Mar-1998 |
dyson |
Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!!
pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.)
pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances.
vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.)
vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust.
vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true.
vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.)
vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities.
vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.)
vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.)
10) Modify FFS to use vtruncbuf.
vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.)
vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly.
vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.)
vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
|
#
34321 |
|
08-Mar-1998 |
dyson |
Quell unneeded pageout daemon activity.
|
#
34206 |
|
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
33936 |
|
01-Mar-1998 |
dyson |
1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode.
All in all, SMP systems especially should show a significant improvement in "snappyness."
|
#
33784 |
|
24-Feb-1998 |
dyson |
Correct some severe VM tuning problems for small systems (<=16MB), and improve tuning on larger systems. (A couple of the VM tuning params for small systems were so badly chosen that the system could hang under load.)
The broken tuning was originaly my fault.
|
#
33758 |
|
23-Feb-1998 |
dyson |
Significantly improve the efficiency of the swap pager, which appears to have declined due to code-rot over time. The swap pager rundown code has been clean-up, and unneeded wakeups removed. Lots of splbio's are changed to splvm's. Also, set the dynamic tunables for the pageout daemon to be more sane for larger systems (thereby decreasing the daemon overheadla.)
|
#
33181 |
|
09-Feb-1998 |
eivind |
Staticize.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33109 |
|
05-Feb-1998 |
dyson |
1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32937 |
|
31-Jan-1998 |
dyson |
Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes."
Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
|
#
32702 |
|
22-Jan-1998 |
dyson |
VM level code cleanups.
1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
|
#
32585 |
|
17-Jan-1998 |
dyson |
Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management.
The system should be much more stable, but all subtile bugs aren't fixed yet.
|
#
32454 |
|
11-Jan-1998 |
dyson |
Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock.
PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
|
#
32286 |
|
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
32071 |
|
28-Dec-1997 |
dyson |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
#
31970 |
|
24-Dec-1997 |
dyson |
Support running with inadequate swap space. Additionally, the code will complain with a suggestion of increasing it.
|
#
31563 |
|
06-Dec-1997 |
dyson |
Support an optional, sysctl enabled feature of idle process swapout. This is apparently useful for large shell systems, or systems with long running idle processes. To enable the feature:
sysctl -w vm.swap_idle_enabled=1
Please note that some of the other vm sysctl variables have been renamed to be more accurate. Submitted by: Much of it from Matt Dillon <dillon@best.net>
|
#
31550 |
|
05-Dec-1997 |
dyson |
Add new (very useful) tunable for pageout daemon. The flag changes the maximum pageout rate:
sysctl -w vm.vm_maxlaunder=n
1 < n < inf.
If paging heavily on large systems, it is likely that a performance improvement can be achieved by increasing the parameter. On a large system, the parm is 32, but numbers as large as 128 can make a big difference. If paging is expensive, you might try decreasing the number to 1-8.
|
#
31542 |
|
04-Dec-1997 |
dyson |
Support applications that need to resist or deny use of swap space.
sysctl -w vm.defer_swap_pageouts=1 Causes the system to resist the use of swap space. In low memory conditions, performance will decrease. sysctl -w vm.disable_swap_pageouts=1 Causes the system to mostly disable the use of swap space. In low memory conditions, the system will likely start killing processes.
|
#
30701 |
|
25-Oct-1997 |
dyson |
Support garbage collecting the pmap pv entries. The management doesn't happen until the system would have nearly failed anyway, so no signficant overhead is added. This helps large systems with lots of processes.
|
#
30139 |
|
06-Oct-1997 |
dyson |
Improve management of pages moving from the inactive to active queue. Additionally, add some much needed comments.
|
#
28992 |
|
01-Sep-1997 |
bde |
Removed unused #includes.
|
#
27716 |
|
27-Jul-1997 |
dyson |
Add the ability for the pageout daemon to measure stats on memory usage before the system is out of memory. The daemon does a minimal amount of work that increases as the system becomes more likely to run out of memory and page in/out.
The default tuning is fairly low in background CPU usage, and sysctl variables have been added to enable flexable operation. This is an experimental feature that will likely be changed and improved over time.
|
#
23157 |
|
27-Feb-1997 |
bde |
Removed a wrong LK_INTERLOCK flag.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21754 |
|
16-Jan-1997 |
dyson |
Change the map entry flags from bitfields to bitmasks. Allows for some code simplification.
|
#
21733 |
|
15-Jan-1997 |
bde |
Removed redundant spl0()'s from kernel processes. They were work-arounds for a bug in fork().
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
21530 |
|
11-Jan-1997 |
dyson |
Slightly correct the code that moves pages from the active to the inactive queue. This is only a minor performance improvement, but will not affect perf on machines that don't have ref bits.
|
#
21258 |
|
03-Jan-1997 |
dyson |
Undo the collapse breakage (swap space usage problem.)
|
#
21157 |
|
01-Jan-1997 |
dyson |
Guess what? We left alot of the old collapse code that is not needed anymore with the "full" collapse fix that we added about 1yr ago!!! The code has been removed by optioning it out for now, so we can put it back in ASAP if any problems are found.
|
#
20007 |
|
28-Nov-1996 |
dyson |
Make the kernel smaller with at worst a neutral effect on perf by de-inlining some VM calls. (Actually, I measured a small improvement.)
|
#
18526 |
|
28-Sep-1996 |
dyson |
Reviewed by: Submitted by: Obtained from:
|
#
18169 |
|
08-Sep-1996 |
dyson |
Addition of page coloring support. Various levels of coloring are afforded. The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)
|
#
17334 |
|
30-Jul-1996 |
dyson |
Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
|
#
17294 |
|
27-Jul-1996 |
dyson |
This commit is meant to solve a couple of VM system problems or performance issues.
1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
|
#
17004 |
|
08-Jul-1996 |
dyson |
Back-off on the previous commit, specifically remove the look-ahead optimization on the active queue scan. I will do this correctly later.
|
#
17003 |
|
08-Jul-1996 |
dyson |
Fix a problem with the pageout daemon RSS limiting, where it degrades performance to LRU or worse when RSS limiting takes effect. Also, make an end condition in the active queue scan more efficient in the case where pages are removed from the active queue as a side effect of a pmap operation.
|
#
16834 |
|
29-Jun-1996 |
dg |
Make sure we have an object in the map entry before trying to trim pages from it.
|
#
16750 |
|
26-Jun-1996 |
dyson |
This commit does a couple of things: Re-enables the RSS limiting, and the routine is now tail-recursive, making it much more safe (eliminates the possiblity of kernel stack overflow.) Also, the RSS limiting is a little more intelligent about finding the likely objects that are pushing the process over the limit.
Added some sysctls that help with VM system tuning.
New sysctl features: 1) Enable/disable lru pageout algorithm. vm.pageout_algorithm = 0, default algorithm that works well, especially using X windows and heavy memory loading. Can have adverse effects, sometimes slowing down program loading.
vm.pageout_algorithm = 1, close to true LRU. Works much better than clock, etc. Does not work as well as the default algorithm in general. Certain memory "malloc" type benchmarks work a little better with this setting.
Please give me feedback on the performance results associated with these.
2) Enable/disable swapping. vm.swapping_enabled = 1, default.
vm.swapping_enabled = 0, useful for cases where swapping degrades performance.
The config option "NO_SWAPPING" is still operative, and takes precedence over the sysctl. If "NO_SWAPPING" is specified, the sysctl still exists, but "vm.swapping_enabled" is hard-wired to "0".
Each of these can be changed "on the fly."
|
#
16664 |
|
24-Jun-1996 |
dyson |
Remove RSS limiting until I rewrite the code to be non-recursive. The code can overrun the kernel stack under very stressful conditions.
|
#
16415 |
|
17-Jun-1996 |
dyson |
Several bugfixes/improvements: 1) Make it much less likely to miss a wakeup in vm_page_free_wakeup 2) Create a new entry point into pmap: pmap_ts_referenced, eliminates the need to scan the pv lists twice in many cases. Perhaps there is alot more to do here to work on minimizing pv list manipulation 3) Minor improvements to vm_pageout including the use of pmap_ts_ref. 4) Major changes and code improvement to pmap. This code has had several serious bugs in page table page manipulation. In order to simplify the problem, and hopefully solve it for once and all, page table pages are no longer "managed" with the pv list stuff. Page table pages are only (mapped and held/wired) or (free and unused) now. Page table pages are never inactive, active or cached. These changes have probably fixed the hold count problems, but if they haven't, then the code is simpler anyway for future bugfixing. 5) The pmap code has been sorely in need of re-organization, and I have taken a first (of probably many) steps. Please tell me if you have any ideas.
|
#
16026 |
|
30-May-1996 |
dyson |
This commit is dual-purpose, to fix more of the pageout daemon queue corruption problems, and to apply Gary Palmer's code cleanups. David Greenman helped with these problems also. There is still a hang problem using X in small memory machines.
|
#
15980 |
|
29-May-1996 |
dyson |
Correct some unfortunately chosen constants, otherwise, not enough pages are calculated for deferred allocation of swap pager data structures. This is a follow-on to the previous commit to this file.
|
#
15979 |
|
29-May-1996 |
dyson |
After careful review by David Greenman and myself, David had found a case where blocking can occur, thereby giving other process's a chance to modify the queue where a page resides. This could cause numerous process and system failures.
|
#
15905 |
|
26-May-1996 |
dyson |
Fix a couple of problems in the pageout_scan routine. First, there is a condition when blocking can occur, and the daemon did not check properly for a page remaining on the expected queue. Additionally, the inactive target was being set much too large for small memory machines. It is now being calculated based upon the amount of user memory available on every pageout daemon run. Another problem was that if memory was very low, the pageout daemon could fail repeatedly to traverse the inactive queue.
|
#
15889 |
|
24-May-1996 |
dyson |
Add apparently needed splvm protection to the active queue, and eliminate an unnecessary test for dirty pages if it is already known to be dirty.
|
#
15809 |
|
18-May-1996 |
dyson |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
15203 |
|
11-Apr-1996 |
bde |
Fixed a spl hog. The vmdaemon process ran entirely at splhigh. It sometimes disabled clock interrupts for 60 msec or more on a P133. Clock interrupts were lost ...
Reviewed by: dyson
|
#
14865 |
|
28-Mar-1996 |
dyson |
VM performance improvements, and reorder some operations in VM fault in anticipation of a fix in pmap that will allow the mlock system call to work without panicing the system.
|
#
14531 |
|
11-Mar-1996 |
hsu |
For Lite2: proc LIST changes. Reviewed by: davidg & bde
|
#
14429 |
|
09-Mar-1996 |
dyson |
Fix a calculation for a paging parameter.
|
#
14178 |
|
22-Feb-1996 |
dg |
Add a "NO_SWAPPING" option to disable swapping. This was originally done to help diagnose a problem on wcarchive (where the kernel stack was sometimes not present), but is useful in its own right since swapping actually reduces performance on some systems (such as wcarchive). Note: swapping in this context means making the U pages pageable and has nothing to do with generic VM paging, which is unaffected by this option.
Reviewed by: <dyson>
|
#
13788 |
|
31-Jan-1996 |
dg |
Improved killproc() log message and made it and the other similar message tolerant of p_ucred being invalid. Starting using killproc() where appropriate.
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
12820 |
|
14-Dec-1995 |
phk |
Another mega commit to staticize things.
|
#
12767 |
|
11-Dec-1995 |
dyson |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
12423 |
|
20-Nov-1995 |
phk |
Remove unused vars & funcs, make things static, protoize a little bit.
|
#
12110 |
|
05-Nov-1995 |
dyson |
Greatly simplify the msync code. Eliminate complications in vm_pageout for msyncing. Remove a bug that manifests itself primarily on NFS (the dirty range on the buffers is not set on msync.)
|
#
11709 |
|
23-Oct-1995 |
dyson |
Get rid of machine-dependent NBPG and replace with PAGE_SIZE.
|
#
11317 |
|
07-Oct-1995 |
dg |
Fix argument passing to the "freeer" routine. Added some prototypes. (bde) Moved extern declaration of swap_pager_full into swap_pager.h and out of the various files that reference it. (davidg)
Submitted by: bde & davidg
|
#
11260 |
|
06-Oct-1995 |
phk |
Avoid a 64bit divide.
|
#
10653 |
|
09-Sep-1995 |
dg |
Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
|
#
10358 |
|
28-Aug-1995 |
julian |
Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular..
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases..
certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task)
The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
|
#
9507 |
|
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
#
9468 |
|
10-Jul-1995 |
dg |
swapout_threads() -> swapout_procs().
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
8692 |
|
21-May-1995 |
dg |
Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some cases, this lead to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping.
Reviewed by: David Greenman Submitted by: John Dyson
|
#
8416 |
|
10-May-1995 |
dg |
Changed "handle" from type caddr_t to void *; "handle" is several different types of pointers, and "char *" is a bad choice for the type.
|
#
7904 |
|
17-Apr-1995 |
dg |
Fixed a logic bug that caused the vmdaemon to not wake up when intended.
Submitted by: John Dyson
|
#
7888 |
|
16-Apr-1995 |
dg |
Removed obsolete/unused variable declarations. Killed externs and included appropriate include files.
|
#
7883 |
|
16-Apr-1995 |
dg |
Moved some zero-initialized variables into .bss. Made code intended to be called only from DDB #ifdef DDB. Removed some completely unused globals.
|
#
7695 |
|
09-Apr-1995 |
dg |
Changes from John Dyson and myself:
Fixed remaining known bugs in the buffer IO and VM system.
vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR).
(various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf().
(various FS sync) Sync out modified vnode_pager backed pages.
ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks.
vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order.
vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previous not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES().
vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted.
vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
|
#
7427 |
|
28-Mar-1995 |
dg |
Fixed typo...using wrong variable in page_shortage calculation.
|
#
7424 |
|
28-Mar-1995 |
dg |
Fixed "pages freed by daemon" statistic (again).
|
#
7090 |
|
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
7015 |
|
12-Mar-1995 |
dg |
Deleted vm_object_setpager().
|
#
6816 |
|
01-Mar-1995 |
dg |
Various changes from John and myself that do the following:
New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that are used to reduce the actual number of wakeups. New function vm_page_protect which is used in conjuction with some new page flags to reduce the number of calls to pmap_page_protect. Minor changes to reduce unnecessary spl nesting. Rewrote vm_page_alloc() to improve readability. Various other mostly cosmetic changes.
|
#
6709 |
|
25-Feb-1995 |
bde |
Don't use __P(()) in a function definition.
|
#
6625 |
|
22-Feb-1995 |
dg |
vm_page.c: Use request==VM_ALLOC_NORMAL rather than object!=kmem_object in deciding if the caller is "important" in vm_page_alloc(). Also established a new low threshold for non-interrupt allocations via cnt.v_interrupt_free_min.
vm_pageout.c: Various algorithmic cleanup. Some calculations simplified. Initialize cnt.v_interrupt_free_min to 2 pages.
Submitted by: John Dyson
|
#
6618 |
|
22-Feb-1995 |
dg |
Only do object paging_in_progress wakeups if someone is waiting on this condition.
Submitted by: John Dyson
|
#
6580 |
|
20-Feb-1995 |
dg |
Moved ACT_MAX, ACT_ADVANCE, and ACT_DECLINE to vm_page.h.
|
#
6357 |
|
14-Feb-1995 |
phk |
YF fix.
|
#
6258 |
|
09-Feb-1995 |
dg |
Minor algorithmic adjustments that reduce the CPU consumption of the pagedaemon in half while not reducing its effectiveness.
Submitted by: me & John
|
#
6129 |
|
02-Feb-1995 |
dg |
swap_pager.c: Fixed long standing bug in freeing swap space during object collapses. Fixed 'out of space' messages from printing out too often. Modified to use new kmem_malloc() calling convention. Implemented an additional stat in the swap pager struct to count the amount of space allocated to that pager. This may be removed at some point in the future. Minimized unnecessary wakeups.
vm_fault.c: Don't try to collect fault stats on 'swapped' processes - there aren't any upages to store the stats in. Changed read-ahead policy (again!).
vm_glue.c: Be sure to gain a reference to the process's map before swapping. Be sure to lose it when done.
kern_malloc.c: Added the ability to specify if allocations are at interrupt time or are 'safe'; this affects what types of pages can be allocated.
vm_map.c: Fixed a variety of map lock problems; there's still a lurking bug that will eventually bite.
vm_object.c: Explicitly initialize the object fields rather than bzeroing the struct. Eliminated the 'rcollapse' code and folded it's functionality into the "real" collapse routine. Moved an object_unlock() so that the backing_object is protected in the qcollapse routine. Make sure nobody fools with the backing_object when we're destroying it. Added some diagnostic code which can be called from the debugger that looks through all the internal objects and makes certain that they all belong to someone.
vm_page.c: Fixed a rather serious logic bug that would result in random system crashes. Changed pagedaemon wakeup policy (again!).
vm_pageout.c: Removed unnecessary page rotations on the inactive queue. Changed the number of pages to explicitly free to just free_reserved level.
Submitted by: John Dyson
|
#
5973 |
|
28-Jan-1995 |
dg |
Completed the fix for attempting to page out pages via the device_pager.
Submitted by: John Dyson
|
#
5841 |
|
24-Jan-1995 |
dg |
Added ability to detect sequential faults and DTRT. (swap_pager.c) Added hook for pmap_prefault() and use symbolic constant for new third argument to vm_page_alloc() (vm_fault.c, various) Changed the way that upages and page tables are held. (vm_glue.c) Fixed architectural flaw in allocating pages at interrupt time that was introduced with the merged cache changes. (vm_page.c, various) Adjusted some algorithms to acheive better paging performance and to accomodate the fix for the architectural flaw mentioned above. (vm_pageout.c) Fixed pbuf handling problem, changed policy on handling read-behind page. (vnode_pager.c)
Submitted by: John Dyson
|
#
5464 |
|
10-Jan-1995 |
dg |
Fixed some formatting weirdness that I overlooked in the previous commit.
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
5348 |
|
02-Jan-1995 |
ats |
Submitted by: Ben Jackson just a missing newline in a kernel printf added.
|
#
4810 |
|
25-Nov-1994 |
dg |
These changes fix a couple of lingering VM problems:
1. The pageout daemon used to block under certain circumstances, and we needed to add new functionality that would cause the pageout daemon to block more often. Now, the pageout daemon mostly just gets rid of pages and kills processes when the system is out of swap. The swapping, rss limiting and object cache trimming have been folded into a new daemon called "vmdaemon". This new daemon does things that need to be done for the VM system, but can block. For example, if the vmdaemon blocks for memory, the pageout daemon can take care of it. If the pageout daemon had blocked for memory, it was difficult to handle the situation correctly (and in some cases, was impossible).
2. The collapse problem has now been entirely fixed. It now appears to be impossible to accumulate unnecessary vm objects. The object collapsing now occurs when ref counts drop to one (where it is more likely to be more simple anyway because less pages would be out on disk.) The original fixes were incomplete in that pathological circumstances could still be contrived to cause uncontrolled growth of swap. Also, the old code still, under steady state conditions, used more swap space than necessary. When using the new code, users will generally notice a significant decrease in swap space usage, and theoretically, the system should be leaving fewer unused pages around competing for memory.
Submitted by: John Dyson
|
#
4537 |
|
17-Nov-1994 |
dg |
Don't ever try to kill off process 1 - even if we are out of swap space and it's the candidate pig.
|
#
4447 |
|
14-Nov-1994 |
dg |
Set laundry flag when transitioning an inactive page from clean to dirty. This fixes a performance bug where pages would sometimes not be paged out when they could be.
Submitted by: John Dyson
|
#
4203 |
|
06-Nov-1994 |
dg |
Added support for starting the experimental "vmdaemon" system process. Enabled via REL2_1.
Added support for doing object collapses "on the fly". Enabled via REL2_1a.
Improved object collapses so that they can happen in more cases. Improved sensing of modified pages to fix an apparant race condition and improve clustered pageout opportunities. Fixed an "oops" with not restarting page scan after a potential block in vm_pageout_clean() (not doing this can result in strange behavior in some cases).
Submitted by: John Dyson & David Greenman
|
#
3839 |
|
25-Oct-1994 |
dg |
#if 0'd out the object cache trimming code - there are multiple ways that the pageout daemon can deadlock otherwise.
Submitted by: John Dyson
|
#
3815 |
|
23-Oct-1994 |
dg |
Fixed object cache trimming policy so it actually works.
Submitted by: John Dyson
|
#
3814 |
|
23-Oct-1994 |
dg |
Adjusted reserved levels to fix a deadlock condition.
Submitted by: John Dyson
|
#
3766 |
|
22-Oct-1994 |
dg |
Various changes to allow operation without any swapspace configured. Note that this is intended for use only in floppy situations and is done at the sacrifice of performance in that case (in ther words, this is not the best solution, but works okay for this exceptional situation).
Submitted by: John Dyson
|
#
3692 |
|
18-Oct-1994 |
dg |
Fix the remaining vmmeter counters. They all now work correctly.
|
#
3612 |
|
15-Oct-1994 |
dg |
1) Some of the counters in the vmmeter struct don't fit well into the Mach VM scheme of things, so I've changed them to be more appropriate. page in/ous are now associated with the pager that did them. Nuked v_fault as the only fault of interest that wouldn't be already counted in v_trap is a VM fault, and this is counted seperately. 2) Implemented most of the remaining counters and corrected the counting of some that were done wrong. They are all almost correct now...just a few minor ones left to fix.
|
#
3567 |
|
13-Oct-1994 |
dg |
Fixed an object reference count problem that was caused by a call to vm_object_lookup() being outside of some parens. The bug was introduced via some recently added code.
Reviewed by: John Dyson
|
#
3449 |
|
08-Oct-1994 |
phk |
Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc. Reviewed by: davidg
|
#
3407 |
|
07-Oct-1994 |
phk |
Cosmetics. Unused vars and other warnings.
|
#
3373 |
|
05-Oct-1994 |
dg |
Fixed minor bug caused by some missing parens that can result in slightly reduced paging performance by missing a clustering opportunity. Found by Poul-Henning Kamp with gcc -Wall.
|
#
3347 |
|
04-Oct-1994 |
dg |
Fixed bug related to proper sensing of page modification that we inadvertantly introduced in pre-1.1.5. This could cause page modifications to go unnoticed during certain extreme low memory/high paging rate conditions.
Submitted by: John Dyson and David Greenman
|
#
2692 |
|
12-Sep-1994 |
dg |
Fixed a bug I introduced when fixing the rss limit code. Changed swapout policy to be a bit more selective about what processes get swapped out.
Reviewed by: John Dyson
|
#
2688 |
|
12-Sep-1994 |
dg |
Don't deactivate pages in 0-refcount objects. Added a couple of missing paging stats. Fixed problem with free_reserved becoming depleted during certain swap_pager operations.
Submitted by: John Dyson, with a little help from me
|
#
2521 |
|
06-Sep-1994 |
dg |
Simple changes to paging algorithms...but boy do they make a difference. FreeBSD's paging performance has never been better. Wow.
Submitted by: John Dyson
|
#
2413 |
|
30-Aug-1994 |
dg |
Fixed bug caused by change of rlimit variables to quad_t's. The bug was in using min() to calculate the minimum of rss_cur,rss_max - since these are now quad_t's and min() takes u_ints...the comparison later for exceeding the rss limit was always true - resulting in rather serious page thrashing. Now using new qmin() function for this purpose.
Fixed another bug where PG_BUSY pages would sometimes be paged out (bad!). This was caused by the PG_BUSY flag not being included in a comparison.
|
#
2112 |
|
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
#
1887 |
|
06-Aug-1994 |
dg |
Incorporated post 1.1.5 work from John Dyson. This includes performance improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/ pmap_kremove. These routine allow fast mapping of pages for those architectures that have "normal" MMUs. Also included is a fix to the pageout daemon to properly check a queue end condition.
Submitted by: John Dyson
|
#
1827 |
|
04-Aug-1994 |
dg |
Integrated VM system improvements/fixes from FreeBSD-1.1.5.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1810 |
|
01-Aug-1994 |
dg |
Removed all code related to the pagescan daemon, and changed 'act_count' adjustments to compensate for a world without the pagescan daemon.
|
#
1687 |
|
06-Jun-1994 |
dg |
Don't move the page's position in the active queue if it is busy or held. John has noticed some stability problems when doing this.
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|