290362 |
04-Nov-2015 |
glebius |
o Fix regressions related to the SA-15:25 upgrade of NTP. [1]
o Fix kqueue write events never being fired for files greater than 2GB. [2]
o Fix applications exiting due to segmentation violation on a correct memory address. [3]
PR: 204046 [1] PR: 204203 [1] Errata Notice: FreeBSD-EN-15:19.kqueue [2] Errata Notice: FreeBSD-EN-15:20.vm [3] Approved by: so |
273099 |
14-Oct-2014 |
kib |
MFC r272907: Make MAP_NOSYNC handling in the vm_fault() read-locked object path compatible with write-locked path.
Approved by: re (marius) |
273007 |
12-Oct-2014 |
alc |
MFS: r272543 (r271351 on HEAD) Fix a boundary case error in vm_reserv_alloc_contig().
Approved by: re (kib) |
272461 |
03-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
272221 |
27-Sep-2014 |
smh |
MFC r272071: Fix ticks wrap issue of lowmem test in vm_pageout_scan
Approved by: re (kib) Sponsored by: Multiplay
|
272202 |
27-Sep-2014 |
kib |
MFC r272036: Avoid calling vm_map_pmap_enter() for the MADV_WILLNEED on the wired entry, the pages must be already mapped.
Approved by: re (gjb)
|
271925 |
21-Sep-2014 |
kib |
MFC r271586: Fix mis-spelling of bits and types names in the vnode_pager_putpages().
Approved by: re (delphij)
|
270996 |
03-Sep-2014 |
alc |
This is a direct commit to account for the renaming of 'cnt' to 'vm_cnt' in HEAD but not stable/10.
|
270995 |
03-Sep-2014 |
alc |
MFC r270666 Back in the days when the kernel was single threaded, testing "vm_paging_target() > 0" was a reasonable way of determining if the inactive queue scan met its target. However, now that other threads can be allocating pages while the inactive queue scan is running, it's an unreliable method. The effect of it being unreliable is that we can start swapping out processes when we didn't intend to.
This issue has existed since the kernel was multithreaded, but the changes to the inactive queue target in 10.0-RELEASE have made its effects visible.
This change introduces a more direct method for determining if the inactive queue scan met its target that is not affected by the actions of other threads.
|
270920 |
01-Sep-2014 |
kib |
Fix a leak of wired pages when unwiring a PROT_NONE-mapped wired region. Rework the handling of unwiring to do it in batches, both at the pmap and object level.
All commits below are by alc.
MFC r268327: Introduce pmap_unwire().
MFC r268591: Implement pmap_unwire() for powerpc.
MFC r268776: Implement pmap_unwire() for arm.
MFC r268806: pmap_unwire(9) man page.
MFC r269134: When unwiring a region of an address space, do not assume that the underlying physical pages are mapped by the pmap. This fixes a leak of the wired pages on the unwiring of the region mapped with no access allowed.
MFC r269339: In the implementation of the new function pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.
MFC r269365: Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed by the combination of r268591 and r269134: When we attempt to add the wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing. (They only set the wired attribute on newly created mappings.)
MFC r269433: Handle wiring failures in vm_map_wire() with the new functions pmap_unwire() and vm_object_unwire(). Retire vm_fault_{un,}wire(), since they are no longer used.
MFC r269438: Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable "rv" is uninitialized.
MFC r269485: Retire pmap_change_wiring().
Reviewed by: alc
|
270630 |
25-Aug-2014 |
kib |
MFC r270011: Implement 'fast path' for the vm page fault handler.
MFC r270387 (by alc): Relax one of the conditions for mapping a page on the fast path.
Approved by: re (gjb)
|
270629 |
25-Aug-2014 |
kib |
MFC r261647 (by alc): Don't call vm_fault_prefault() on zero-fill faults.
|
270628 |
25-Aug-2014 |
kib |
MFC r261412 (by alc): Make prefaulting more aggressive on hard faults.
|
270627 |
25-Aug-2014 |
kib |
MFC r269978 (by alc): Avoid pointless (but harmless) actions on unmanaged pages.
|
270440 |
24-Aug-2014 |
kib |
MFC r269746: Adapt vm_page_aflag_set(PGA_WRITEABLE) to the locking of pmap_enter(PMAP_ENTER_NOSLEEP).
|
270439 |
24-Aug-2014 |
kib |
Merge the changes to pmap_enter(9) for sleep-less operation (requested by flag). The ia64 pmap.c changes are direct commit, since ia64 is removed on head.
MFC r269368 (by alc): Retire PVO_EXECUTABLE.
MFC r269728: Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused).
MFC r269759 (by alc): Update the text of a KASSERT() to reflect the changes in r269728.
MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table page allocation.
MFC r270151 (by alc): Replace KASSERT that no PV list locks are held with a conditional unlock.
Reviewed by: alc Approved by: re (gjb) Sponsored by: The FreeBSD Foundation
|
270205 |
20-Aug-2014 |
kib |
MFC r269907: Fix leaks of unqueued unwired pages.
|
269915 |
13-Aug-2014 |
kib |
MFC r269643: Weaken the requirement for the vm object lock by only asserting locked object in vm_pager_page_unswapped(), instead of locked exclusively.
|
269914 |
13-Aug-2014 |
kib |
MFC r269642: Add wrappers to assert that the vm object is unlocked, and for try-upgrade.
|
269174 |
28-Jul-2014 |
kib |
MFC r268615: Add OBJ_TMPFS_NODE flag.
MFC r268616: Set the OBJ_TMPFS_NODE flag for vm_object of VREG tmpfs node.
MFC r269053: Correct assertion. tmpfs vm object is always at the bottom of the shadow chain.
|
269072 |
24-Jul-2014 |
kib |
MFC r267213 (by alc): Add a page size field to struct vm_page.
Approved by: alc
|
267956 |
27-Jun-2014 |
kib |
MFC r267664: Assert that the new entry is inserted into the right location in the map entries list, and that it does not overlap with the previous and next entries.
|
267901 |
26-Jun-2014 |
kib |
MFC r267630: Add MAP_EXCL flag for mmap(2).
|
267899 |
26-Jun-2014 |
kib |
MFC r267766: Use correct names for the flags.
|
267772 |
23-Jun-2014 |
kib |
MFC r267254: Make mmap(MAP_STACK) search for the available address space.
MFC r267497 (by alc): Use local variable instead of sgrowsiz.
|
267751 |
22-Jun-2014 |
mav |
MFC r267391: Introduce new "256 Bucket" zone to split requests and reduce congestion on "128 Bucket" zone lock.
|
267750 |
22-Jun-2014 |
mav |
MFC r267387: When allocating a new bucket for a bucket zone, never take it from the zone itself, since that will almost certainly fail. Take the next bigger zone instead.
This situation should not happen with the original bucket zone configuration: the "32 Bucket" zone uses "64 Bucket" and vice versa. But if the "64 Bucket" zone lock is congested, the zone may grow its bucket size and start biting itself.
|
267059 |
04-Jun-2014 |
kib |
MFC r266780: Remove an assert which can be triggered by userspace.
|
266607 |
24-May-2014 |
kib |
MFC r266491: Remove a redundant loop.
|
266591 |
23-May-2014 |
alc |
MFC r259107 Eliminate a redundant parameter to vm_radix_replace().
Improve the wording of the comment describing vm_radix_replace().
|
266589 |
23-May-2014 |
alc |
MFC r265886, r265948 With the new-and-improved vm_fault_copy_entry() (r265843), we can always avoid soft page faults when adding write access to user wired entries in vm_map_protect(). Previously, we only avoided the soft page fault when the underlying pages were copy-on-write. In other words, we avoided the page faults that might sleep on page allocation, but not the trivial page faults to update the physical map.
On a fork allow read-only wired pages to be copy-on-write shared between the parent and child processes. Previously, we copied these pages even though they are read only. However, the reason for copying them is historical and no longer exists. In recent times, vm_map_protect() has developed the ability to copy pages when write access is added to wired copy-on-write pages. So, in this case, copy-on-write sharing of wired pages is not to be feared. It is not going to lead to copy-on-write faults on wired memory.
|
266582 |
23-May-2014 |
kib |
MFC r266464: In execve(2), postpone the free of old vmspace until the threads are resumed and exited.
|
266492 |
21-May-2014 |
pho |
MFC r265534:
msync(2) must return ENOMEM, not EINVAL, when the address is outside the allowed range or when one or more pages are not mapped. This is according to The Open Group Base Specifications Issue 7.
Sponsored by: EMC / Isilon storage division
|
266315 |
17-May-2014 |
alc |
MFC r265850 About 9% of the pmap_protect() calls being performed by vm_map_copy_entry() are unnecessary. Eliminate the unnecessary calls.
|
266304 |
17-May-2014 |
kib |
MFC r265843: For the upgrade case in vm_fault_copy_entry(), when the entry does not need COW and is writeable, do not create a new backing object for the entry.
MFC r265887: Fix locking.
|
266302 |
17-May-2014 |
kib |
MFC r265825: When printing the map with the ddb 'show procvm' command, do not dump page queues for the backing objects.
|
266299 |
17-May-2014 |
kib |
MFC r265824: Print the entry address in addition to the object.
|
265945 |
13-May-2014 |
alc |
MFC r265418 Prior to r254304, a separate function, vm_pageout_page_stats(), was used to periodically update the reference status of the active pages. This function was called, instead of vm_pageout_scan(), when memory was not scarce. The objective was to provide up to date reference status for active pages in case memory did become scarce and active pages needed to be deactivated.
The active page queue scan performed by vm_pageout_page_stats() was virtually identical to that performed by vm_pageout_scan(), and so r254304 eliminated vm_pageout_page_stats(). Instead, vm_pageout_scan() is called with the parameter "pass" set to zero. The intention was that when pass is zero, vm_pageout_scan() would only scan the active queue. However, the variable page_shortage can still be greater than zero when memory is not scarce and vm_pageout_scan() is called with pass equal to zero. Consequently, the inactive queue may be scanned and dirty pages laundered even though that was not intended by r254304. This revision fixes that.
|
265944 |
13-May-2014 |
alc |
MFC r260567 Correctly update the count of stuck pages, "addl_page_shortage", in vm_pageout_scan(). There were missing increments in two less common cases.
Don't conflate the count of stuck pages and the pageout deficit provided by vm_page_alloc{,_contig}().
Handle held pages consistently in the inactive queue scan. In the more common case, we did not move the page to the tail of the queue. Whereas, in the less common case, we did. There's no particular reason to move the page in the less common case, so remove it.
Perform the calculation of the page shortage for the active queue scan a little earlier, before the active queue lock is acquired. The correctness of this calculation doesn't depend on the active queue lock being held.
Eliminate a redundant variable, "pcount". Use the more descriptive variable, "maxscan", in its place.
Apply a few nearby style fixes, e.g., eliminate stray whitespace and excess parentheses.
|
265932 |
12-May-2014 |
des |
MFH (r264966): add sysctl OIDs for actual swap zone size and capacity
|
265435 |
06-May-2014 |
kib |
MFC r265100: Fix the comparison for the end of range in vm_phys_fictitious_reg_range().
|
265311 |
04-May-2014 |
kib |
MFC r265002: Fix vm_fault_copy_entry() operation on upgrade; allow it to find the pages in the shadowed objects.
|
263875 |
28-Mar-2014 |
kib |
MFC r263475: Fix two issues with /dev/mem access on amd64, both causing kernel page faults.
First, accesses to the direct map region should check against the limit up to which the direct map is instantiated.
Second, for accesses to the kernel map, use a new thread-private flag TDP_DEVMEMIO, which instructs vm_fault() to return an error when a fault happens on a MAP_ENTRY_NOFAULT entry, instead of panicking.
MFC r263498: Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c.
|
263684 |
24-Mar-2014 |
kib |
MFC r263471: Initialize vm_map_entry member wiring_thread on the map entry creation.
|
263360 |
19-Mar-2014 |
kib |
MFC r263095: Initialize paddr to handle the case of zero size.
|
263359 |
19-Mar-2014 |
kib |
MFC r263092: Do not vdrop() the tmpfs vnode until it is unlocked. The hold reference might be the last, and then vdrop() would free the vnode.
|
262739 |
04-Mar-2014 |
glebius |
Merge r261722, r261723, r261724, r261725 from head: several minor improvements for UMA_ZPCPU_ZONE zones.
|
262737 |
04-Mar-2014 |
glebius |
Merge 261593 from head: Provide macros that allow easy export of uma(9) zone limits and current usage via sysctl(9).
|
262291 |
21-Feb-2014 |
attilio |
MFC r261867: Use the right index to free swapspace after vm_page_rename().
|
262127 |
17-Feb-2014 |
dim |
MFC r261896:
After r251709, avoid a clang 3.4 warning about an unused static const variable (uma_max_ipers), when asserts are disabled.
Reviewed by: glebius
|
261999 |
16-Feb-2014 |
marcel |
MFC r259908: For ia64, use pmap_remove_pages() and not pmap_remove().
|
260306 |
04-Jan-2014 |
mav |
MFC r258716:
- Add a bucket size column to the `show uma` DDB command.
- Add a `show umacache` command to show similar stats for cache-only UMA zones.
|
260305 |
04-Jan-2014 |
mav |
MFC r258693: Make UMA not blindly force offpage slab header allocation for large (> PAGE_SIZE) zones. If the zone size is not a multiple of PAGE_SIZE, there may be enough space for the header in the last page, so we may avoid the extra header memory allocation and hash table update/lookup.
ZFS creates a bunch of odd-sized UMA zones (5120, 6144, 7168, 10240, 14336). This change puts at least some of the otherwise lost memory there to good use.
|
260304 |
04-Jan-2014 |
mav |
MFC r258691: Don't count bucket allocation failures for UMA zones as their own failures. There are good reasons for such failures to happen, such as recursion prevention, etc., and they are not fatal since buckets are just an optimization mechanism. Real bucket allocation failures are in any case counted by the bucket zones themselves, and we don't need double accounting there.
|
260303 |
04-Jan-2014 |
mav |
MFC r258340, r258497: Implement mechanism to safely but slowly purge UMA per-CPU caches.
This is a last resort for very low memory conditions, in case other measures to free memory were ineffective. Sequentially cycle through all CPUs and extract per-CPU cache buckets into the zone cache, from where they can be freed.
|
260302 |
04-Jan-2014 |
mav |
MFC r258338: Grow UMA zone bucket size also on lock congestion during item free.
Lock congestion is the same whether it happens on alloc or free, so handle it equally. Now that we have back pressure, there is no problem with growing buckets a bit faster. In any case, growth is much slower than in 9.x.
|
260301 |
04-Jan-2014 |
mav |
MFC r258337: Add two new UMA bucket zones to store 3 and 9 items per bucket.
These new buckets make bucket size self-tuning softer and more precise. Without them, there are buckets for 1, 5, 13, 29, ... items. While at bigger sizes a difference of about 2x is fine, at the smallest sizes it is 5x and 2.6x respectively. The new buckets make that line look like 1, 3, 5, 9, 13, 29, reducing the jumps between steps and making the algorithm work softer, allocating and freeing memory in better-fitting chunks. Otherwise there is quite a big gap between allocating 128K and 5x128K of RAM at once.
|
260300 |
04-Jan-2014 |
mav |
MFC r258336: Implement soft pressure on UMA cache bucket sizes.
Every time the system detects a low memory condition, decrease the bucket size for each zone by one item. As a result, higher memory pressure will push toward smaller bucket sizes, and so smaller per-CPU caches and more efficient memory use.
Before this change there was no force opposing bucket growth resulting from the practically inevitable zone lock conflicts, and after some run time per-CPU caches could consume enough RAM to kill the system.
|
260280 |
04-Jan-2014 |
glebius |
Merge r258690 by mav from head: Fix a bug introduced in r252226, where the udata argument passed to bucket_alloc() was used without first making sure that it was really intended for us.
On some of my systems this bug made the user argument passed by ZFS code to uma_zalloc_arg() unexpectedly block UMA per-CPU caches for those zones.
|
260081 |
30-Dec-2013 |
kib |
MFC r259951: Do not coalesce stack entries. Pass the MAP_STACK_GROWS_DOWN and MAP_STACK_GROWS_UP flags to vm_map_insert() from vm_map_stack().
|
259991 |
28-Dec-2013 |
dim |
MFC r259893:
In sys/vm/vm_pageout.c, since vm_pageout_worker() takes a void * as argument, cast the incoming 0 argument to void *, to silence a warning from clang 3.4 ("expression which evaluates to zero treated as a null pointer constant of type 'void *' [-Wnon-literal-null-conversion]").
|
259499 |
17-Dec-2013 |
kib |
MFC r258039: Avoid overflow for the page counts.
MFC r258365: Revert back to using int for the page counts. Rearrange the checks to correctly handle overflowing address arithmetic.
|
259299 |
13-Dec-2013 |
kib |
MFC r258367: Check for zero-length requests and act as if the request is always successful, without performing any action on the address space.
|
259297 |
13-Dec-2013 |
kib |
MFC r258366: Add assertions to cover all places in the wiring and unwiring code where MAP_ENTRY_IN_TRANSITION is set or cleared.
|
259296 |
13-Dec-2013 |
kib |
MFC r257899: If filesystem declares that it supports shared locking for writes, use shared vnode lock for VOP_PUTPAGES() as well.
|
258911 |
04-Dec-2013 |
rodrigc |
MFC r258737
In keg_dtor(), print out the keg name in the "Freed UMA keg was not empty" message printed to the console. This makes it easier to track down the source of certain memory leaks.
Suggested by: adrian Approved by: re (gjb)
|
258037 |
12-Nov-2013 |
kib |
MFC r257680: Do not coalesce if the swap object belongs to tmpfs vnode.
Approved by: re (glebius)
|
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256275 |
10-Oct-2013 |
alc |
Tidy up the output of "sysctl vm.phys_free".
Approved by: re (glebius) Sponsored by: EMC / Isilon Storage Division
|
255793 |
22-Sep-2013 |
alc |
Both the vm_map and vmspace zones are defined as "no free". So, there is no point in defining a fini function for these zones.
Reviewed by: kib Approved by: re (glebius) Sponsored by: EMC / Isilon Storage Division
|
255732 |
20-Sep-2013 |
neel |
Merge the following changes from projects/bhyve_npt_pmap:
- add fields to 'struct pmap' that are required to manage nested page tables.
- add a parameter to 'vmspace_alloc()' that can be used to override the default pmap initialization routine 'pmap_pinit()'.
These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0.
Reviewed by: kib@, alc@ Approved by: re (glebius)
|
255724 |
20-Sep-2013 |
alc |
The pmap function pmap_clear_reference() is no longer used. Remove it.
pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists.
Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
|
255708 |
19-Sep-2013 |
jhb |
Extend the support for exempting processes from being killed when swap is exhausted.
- Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate, similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc), the first bit of which is used to track whether P_PROTECT should be inherited by new child processes.
Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
255626 |
17-Sep-2013 |
kib |
PG_SLAB no longer serves a useful purpose, since m->object is no longer abused to store pointer to slab. Remove it.
Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (hrs)
|
255608 |
16-Sep-2013 |
kib |
Remove zero-copy sockets code. It only worked for anonymous memory, and the equivalent functionality is now provided by sendfile(2) over a POSIX shared memory file descriptor.
Remove the cow member of struct vm_page, and rearrange the remaining members. While there, make hold_count unsigned.
Requested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (delphij)
|
255566 |
14-Sep-2013 |
kib |
If the last page of the file is partially full and the whole valid portion is invalidated, invalidate the whole page. Otherwise, a partially valid page appears on a page queue, which is wrong. This could only happen for the last page, because only then could the buffer which triggered the invalidation not cover the whole page.
Reported and tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij) MFC after: 2 weeks
|
255497 |
12-Sep-2013 |
jhb |
Fix an off-by-one error when populating mincore(2) entries for skipped entries. lastvecindex references the last valid byte, so the new bytes should come after it.
Approved by: re (kib) MFC after: 1 week
|
255426 |
09-Sep-2013 |
jhb |
Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux.
To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE.
Reviewed by: alc Approved by: re (kib)
|
255396 |
08-Sep-2013 |
kib |
Drain the xbusy state at two places which potentially do pmap_remove_all(). Not doing the drain allows pmap_enter() to proceed in parallel, voiding the effects of pmap_remove_all().
The race results in an invalidated page mapped wired by usermode.
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (glebius)
|
255244 |
05-Sep-2013 |
kib |
vm_page_trysbusy() should not fail when the shared busy counter or the VPB_BIT_WAITERS flag was changed between the read of busy_lock and the cas. vm_page_sbusy(), which is the only user of vm_page_trysbusy() in the tree, panics on failure, which in these cases is transient and does not mean that the current page state prevents sbusying.
Retry the operation inside vm_page_trysbusy() if the cas failed; only return failure when VPB_BIT_SHARED is cleared.
Reported and tested by: pho Reviewed by: attilio Sponsored by: The FreeBSD Foundation
|
255219 |
05-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides room for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
	uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain the array index. Only one bit is set, and its position within this five-bit range defines the array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
	__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belong to the same array element this way is correct, but not advised. It should only be used for alias definitions.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
255097 |
31-Aug-2013 |
mckusick |
Fix bug introduced in the rewrite of keg_free_slab in -r251894. The consequence of the bug is that fini calls are not done when a slab is freed by a callback from the page daemon. It went unnoticed for two months because fini is little used.
I spotted the bug while reading the code to learn how it works so I could write it up for the next edition of the Design and Implementation of FreeBSD book.
No MFC needed as this code exists only in HEAD.
Reviewed by: kib, jeff Tested by: pho
|
255028 |
29-Aug-2013 |
alc |
Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2).
Since our malloc(3) makes heavy use of madvise(2), this change can have a measurable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.
Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise().
Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
|
254911 |
26-Aug-2013 |
glebius |
Remove comment that is no longer relevant since r254182.
|
254719 |
23-Aug-2013 |
alc |
Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() can reclaim the last preexisting cached page in the object, resulting in a call to vdrop(). Detect this scenario so that the vnode's hold count is correctly maintained. Otherwise, we panic.
Reported by: scottl Tested by: pho Discussed with: attilio, jeff, kib
|
254667 |
22-Aug-2013 |
kib |
Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release().
Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
|
254649 |
22-Aug-2013 |
kib |
Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic.
Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
|
254622 |
21-Aug-2013 |
jeff |
- Eliminate the vm object lock from the active queue scan. It is not necessary since we do not free or cache the page from active anymore. Document the one possible race that is harmless.
Sponsored by: EMC / Isilon Storage Division Discussed with: alc
|
254599 |
21-Aug-2013 |
alc |
Addendum to r254141: Allow recursion on the free pages queues lock in vm_page_alloc_freelist().
Reported and tested by: sbruno Sponsored by: EMC / Isilon Storage Division
|
254544 |
20-Aug-2013 |
jeff |
- Increase the active lru refresh interval to 10 minutes. This has been shown to negatively impact some workloads and the goal is only to eliminate worst case behaviors for very long periods of paging inactivity. Eventually we should determine a more complex scaling factor for this feature.
- Rate limit low memory callback handlers to limit thrashing. Set the default to 10 seconds.
Sponsored by: EMC / Isilon Storage Division
|
254543 |
19-Aug-2013 |
jeff |
- Use an arbitrary but reasonably large import size for kva on architectures that don't support superpages. This keeps the number of spans and internal fragmentation lower.
- When the user asks for alignment from vmem_xalloc, adjust the imported size by 2*align to be certain we can satisfy the allocation. This comes at the expense of potential failures when the backend can't supply enough memory but could supply the requested size and alignment.
Sponsored by: EMC / Isilon Storage Division
|
254439 |
17-Aug-2013 |
kib |
Remove the arbitrary binding of the pagedaemon threads to the domains, update the comment accordingly and make it more precise.
Requested and reviewed by: jeff (previous version)
|
254430 |
16-Aug-2013 |
jhb |
Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings.
- MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD.
- MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead.
- Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
- mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent.
Reviewed by: alc MFC after: 1 month
|
254387 |
15-Aug-2013 |
jeff |
- Fix bug in r254304. Use the ACTIVE pq count for the active list processing, not inactive. This was the result of a bad merge.
Reported by: pho Sponsored by: EMC / Isilon Storage Division
|
254362 |
15-Aug-2013 |
attilio |
On the recovery path for vm_page_alloc(), if a page was requested wired, unwind the wiring bits; otherwise we can end up freeing a page that is considered wired.
Sponsored by: EMC / Isilon storage division Reported by: alc
|
254307 |
13-Aug-2013 |
jeff |
- Add a statically allocated memguard arena since it is needed very early on.
- Pass the appropriate flags to vmem_xalloc() when allocating space for the arena from kmem_arena.
Sponsored by: EMC / Isilon Storage Division
|
254304 |
13-Aug-2013 |
jeff |
Improve pageout flow control to wakeup more frequently and do less work while maintaining better LRU of active pages.
- Change v_free_target to include the quantity previously represented by v_cache_min so we don't need to add them together everywhere we use them.
- Add a pageout_wakeup_thresh that sets the free page count trigger for waking the page daemon. Set this 10% above v_free_min so we wakeup before any phase transitions in vm users.
- Adjust down v_free_target now that we're willing to accept more pagedaemon wakeups. This means we process fewer pages in one iteration as well, leading to shorter lock hold times and less overall disruption.
- Eliminate vm_pageout_page_stats(). This was a minor variation on the PQ_ACTIVE segment of the normal pageout daemon. Instead we now process 1 / vm_pageout_update_period pages every second. This causes us to visit the whole active list every 60 seconds. Previously we would only maintain the active LRU when we were short on pages, which would mean it could be woefully out of date.
Reviewed by: alc (slight variant of this) Discussed with: alc, kib, jhb Sponsored by: EMC / Isilon Storage Division
|
254228 |
11-Aug-2013 |
attilio |
Correct the recovery logic in vm_page_alloc_contig: what is really needed in this code snippet is that all the pages that are already fully inserted get fully freed, while for the others the object removal itself might be skipped, hence the object might be set to NULL.
Sponsored by: EMC / Isilon storage division Reported by: alc, kib Reviewed by: alc
|
254182 |
10-Aug-2013 |
kib |
Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder.
Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples.
Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig().
Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation
|
254168 |
09-Aug-2013 |
zont |
Remove unused definition for CTL_VM_NAMES.
Suggested by: bde
|
254163 |
09-Aug-2013 |
jhb |
Revert the addition of VPO_BUSY and instead update vm_page_replace() to properly unbusy the page.
Submitted by: alc
|
254150 |
09-Aug-2013 |
obrien |
Add missing 'VPO_BUSY' from r254141 to fix kernel build break.
|
254141 |
09-Aug-2013 |
attilio |
On all architectures, avoid preallocating the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid preallocating the KVA for such nodes.
In order to do so, allow the operations derived from vm_radix_insert() to fail, and handle all the resulting failures.
On the vm_radix side, introduce a new function, vm_radix_replace(), which can replace an already present leaf node with a new one, and take into account the possibility that operations on the radix trie can recurse during vm_radix_insert() allocation. This means that if operations in vm_radix_insert() recursed, vm_radix_insert() will start from scratch again.
Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl
|
254138 |
09-Aug-2013 |
attilio |
The soft and hard busy mechanisms rely on the vm object lock to work. Unify the two concepts into a real, minimal sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it.
Also:
- Add a new flag to directly shared-busy pages while vm_page_alloc() and vm_page_grab() are being executed. This will be very helpful once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag.
The KPI is heavily changed; this is why the version is bumped. It is very likely that some VM ports users will need to change their own code.
Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
|
254065 |
07-Aug-2013 |
kib |
Split the pagequeues per NUMA domain, and split the pagedaemon process into threads, each processing the queue in a single domain. The structure of the pagedaemons and queues is kept intact; most of the changes come from the need for code to find the owning page queue for a given page, calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary; the multithreaded daemon could be allowed for single-domain machines, or one domain might be split into several page domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one thread's inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages is the OOM started, by a single selected thread.
Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division.
Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
|
254025 |
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
254017 |
07-Aug-2013 |
markj |
Fill in the description fields for M_FICT_PAGES.
Reviewed by: kib MFC after: 3 days
|
253953 |
05-Aug-2013 |
attilio |
Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against the vnode lock, as it tries to vget(). Other functions currently have the opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism.
Before this patch is reinserted we need to break this ordering.
Sponsored by: EMC / Isilon storage division Reported by: kib
|
253939 |
04-Aug-2013 |
attilio |
The page hold mechanism is fast but it has a couple of drawbacks:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues with a few pages
Try to avoid it as much as possible, with the long-term goal of removing it completely. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()).
After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there, as the quick path cannot immediately access the page object to busy the page, and the slow path cannot busy more than one page at a time (to avoid deadlocks).
Fixing that primitive would allow complete removal of the page hold mechanism.
Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho
|
253775 |
29-Jul-2013 |
zont |
Unbreak sysctl ABI changes introduced in r253662
Requested by: bde
|
253697 |
26-Jul-2013 |
jeff |
Improve page LRU quality and simplify the logic.
- Don't short-circuit aging tests for unmapped objects. This biases against unmapped file pages and transient mappings.
- Always honor PGA_REFERENCED. We can now use this after soft busying to lazily restart the LRU.
- Don't transition directly from active to cached, bypassing the inactive queue. This frees recently used data much too early.
- Rename actcount to act_delta to be more consistent with use and meaning.
Reviewed by: kib, alc Sponsored by: EMC / Isilon Storage Division
|
253662 |
26-Jul-2013 |
zont |
Remove the define and documentation for vm_pageout_algorithm, missed in r253587
|
253636 |
25-Jul-2013 |
kientzle |
Clear the entire map structure, including locks, so that the locks don't accidentally appear to have been already initialized.
In particular, this fixes a consistent kernel crash on armv6 with: panic: lock "vm map (user)" 0xc09cc050 already initialized that appeared with r251709.
PR: arm/180820
|
253604 |
24-Jul-2013 |
avg |
rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of relying on swapper being the last thing in sysinits order.
Rationale:
- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be invoked depending on its relative order with scheduler; this was not obvious and the bug actually used to exist
Reviewed by: kib (earlier version) MFC after: 14 days
|
253591 |
24-Jul-2013 |
glebius |
Since r251709 a slab no longer uses 8-bit indices to manage items, so remove a stale comment.
Reviewed by: jeff
|
253587 |
24-Jul-2013 |
jeff |
- Remove the long obsolete 'vm_pageout_algorithm' experiment.
Discussed with: alc Sponsored by: EMC / Isilon Storage Division
|
253583 |
23-Jul-2013 |
jeff |
- Correct a stale comment. We don't have vclean() anymore. The work is done by vgonel() and destroy_vobject() should only be called once from VOP_INACTIVE().
Sponsored by: EMC / Isilon Storage Division
|
253565 |
23-Jul-2013 |
glebius |
Revert r249590 and, in case mp_ncpus isn't initialized, use MAXCPU. This allows us to init the counter zone at an early stage of boot.
Reviewed by: kib Tested by: Lytochkin Boris <lytboris gmail.com>
|
253556 |
22-Jul-2013 |
jlh |
Fix previous commit when option RACCT is not used.
MFC after: 7 days
|
253554 |
22-Jul-2013 |
jlh |
Fix a panic in the racct code when munlock(2) is called with incorrect values.
The racct code in sys_munlock() assumed that the boundaries provided by userland were correct as long as vm_map_unwire() returned successfully. However, the latter contains its own logic and sometimes manages to do something outside those boundaries, even if they are buggy. This change makes the racct code use the accounting done by the vm layer, as is done in other places such as vm_mlock().
Despite fixing the panic, Alan Cox pointed out that this code is still racy: two simultaneous callers will produce incorrect values.
Reviewed by: alc MFC after: 7 days
|
253471 |
19-Jul-2013 |
jhb |
Be more aggressive in using superpages in all mappings of objects:
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for vm_map_find() that will try to alter the alignment of a mapping to match any existing superpage mappings of the object being mapped. If no suitable address range is found with the necessary alignment, vm_map_find() will fall back to using the simple first-fit strategy (VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.
Reviewed by: alc (earlier version) MFC after: 2 weeks
|
253221 |
11-Jul-2013 |
kib |
When swap pager allocates metadata in the pagedaemon context, allow it to drain the reserve. This was broken in r243040, causing deadlock. Note that VM_WAIT call in case of uma_zalloc() failure from pagedaemon would only wait for the v_pageout_free_min anyway.
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation
|
253191 |
11-Jul-2013 |
kib |
vm_fault() should not be allowed to proceed on a map entry which is being wired now. The entry's wired count is changed to non-zero in advance, before the map lock is dropped. This makes vm_fault() perceive the entry as wired, and breaks the code fragment that moves the wire count from the shadowed page to the upper page, causing it to unwire a non-wired page.
On the other hand, the vm_fault() calls from vm_fault_wire() should be allowed to proceed, so only drain MAP_ENTRY_IN_TRANSITION from vm_fault() when wiring_thread is not current.
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
253190 |
11-Jul-2013 |
kib |
mlockall() or VM_MAP_WIRE_HOLESOK does not interact properly with parallel creation of map entries, e.g. by mmap() or stack growing. It also breaks when another entry is wired in parallel.
vm_map_wire() iterates over the map entries in the region, and assumes that the map entries it finds were marked as in transition earlier, and also that any entry marked as in transition was marked by the current invocation of vm_map_wire(). This is not true for new entries in the holes.
Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the entries whose transition owner is the current thread.
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
253189 |
11-Jul-2013 |
kib |
Never remove user-wired pages from an object when doing msync(MS_INVALIDATE). vm_fault_copy_entry() requires that the object range which corresponds to the user-wired vm_map_entry is always fully populated.
Add the OBJPR_NOTWIRED flag for vm_object_page_remove() to request the preserving behaviour; use it when calling vm_object_page_remove() from vm_object_sync().
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
253188 |
11-Jul-2013 |
kib |
In the vm_page_set_invalid() function, do not assert that the page is not busy, since its only caller, brelse(), can legitimately call it on a busy page. This happens for VOP_PUTPAGES() on filesystems that use buffers and whose VOP_WRITE() method marked the buffer containing the page as non-cacheable.
Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
253095 |
09-Jul-2013 |
kib |
Fix typo in comment.
MFC after: 3 days
|
252653 |
03-Jul-2013 |
neel |
vm_phys_fictitious_reg_range() was losing the 'memattr' because it would be reset by pmap_page_init() right after being initialized in vm_page_initfake().
The statement above is with reference to the amd64 implementation of pmap_page_init().
Fix this by calling 'pmap_page_init()' in 'vm_page_initfake()' before changing the 'memattr'.
Reviewed by: kib MFC after: 2 weeks
|
252358 |
28-Jun-2013 |
davide |
Remove a spurious keg lock acquisition.
|
252330 |
28-Jun-2013 |
jeff |
- Add a general purpose resource allocator, vmem, from NetBSD. It was originally inspired by the Solaris vmem detailed in the proceedings of USENIX 2001. The NetBSD version was heavily refactored for bugs and simplicity.
- Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines.
Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
252226 |
26-Jun-2013 |
jeff |
- Resolve bucket recursion issues by passing a cookie with zone flags through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
- Make some smaller buckets for large zones to further reduce memory waste.
- Implement uma_zone_reserve(). This holds aside a number of items only for callers who specify M_USE_RESERVE. Buckets will never be filled from reserve allocations.
Sponsored by: EMC / Isilon Storage Division
|
252161 |
24-Jun-2013 |
glebius |
Typo in comment.
|
252040 |
20-Jun-2013 |
jeff |
- Add a per-zone lock for zones without kegs.
- Be more explicit about zone vs keg locking. This functionally changes almost nothing.
- Add a size parameter to uma_zcache_create() so we can size the buckets.
- Pass the zone to bucket_alloc() so it can modify allocation flags as appropriate.
- Fix a bug in zone_alloc_bucket() where I missed an address-of operator in a failure case. (Found by pho)
Sponsored by: EMC / Isilon Storage Division
|
251983 |
19-Jun-2013 |
jeff |
- Persist the caller's flags in the bucket allocation flags so we don't lose a M_NOVM when we recurse into a bucket allocation.
Sponsored by: EMC / Isilon Storage Division
|
251901 |
18-Jun-2013 |
des |
Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file.
Security: CVE-2013-2171 Security: FreeBSD-SA-13:06.mmap Approved by: so
|
251894 |
18-Jun-2013 |
jeff |
Refine UMA bucket allocation to reduce space consumption and improve performance.
- Always free to the alloc bucket if there is space. This gives LIFO allocation order to improve hot-cache performance. This also allows for zones with a single bucket per-cpu rather than a pair if the entire working set fits in one bucket.
- Enable per-cpu caches of buckets. To prevent recursive bucket allocation one bucket zone still has per-cpu caches disabled.
- Pick the initial bucket size based on a table driven maximum size per-bucket rather than the number of items per-page. This gives more sane initial sizes.
- Only grow the bucket size when we face contention on the zone lock; this causes bucket sizes to grow more slowly.
- Adjust the number of items per-bucket to account for the header space. This packs the buckets more efficiently per-page while making them not quite powers of two.
- Eliminate the per-zone free bucket list. Always return buckets back to the bucket zone. This ensures that as zones grow into larger bucket sizes they eventually discard the smaller sizes. It persists fewer buckets in the system. The locking is slightly trickier.
- Only switch buckets in zalloc, not zfree; this eliminates pathological cases where we ping-pong between two buckets.
- Ensure that the thread that fills a new bucket gets to allocate from it to give a better upper bound on allocation time.
Sponsored by: EMC / Isilon Storage Division
|
251826 |
17-Jun-2013 |
jeff |
- Add a new UMA API: uma_zcache_create(). This makes a zone without any backing memory that is only a container for per-cpu caches of arbitrary pointer items. These zones have no kegs.
- Convert the regular keg based allocator to use the new import/release functions.
- Move some stats to be atomics since they would require excessive zone locking/unlocking with the new import/release paradigm. Make zone_free_item simpler now that callers can manage more stats.
- Check for these cache-only zones in the public APIs and debugging code by checking zone_first_keg() against NULL.
Sponsored by: EMC / Isilon Storage Division
|
251709 |
13-Jun-2013 |
jeff |
- Convert the slab free item list from a linked array of indices to a bitmap using sys/bitset. This is much simpler, has lower space overhead and is cheaper in most cases.
- Use a second bitmap for invariants asserts and improve the quality of the asserts as well as the number of erroneous conditions that we will catch.
- Drastically simplify sizing code. Special case refcnt zones since they will be going away.
- Update stale comments.
Sponsored by: EMC / Isilon Storage Division
|
251591 |
10-Jun-2013 |
alc |
Revise the interface between vm_object_madvise() and vm_page_dontneed() so that pointless calls to pmap_is_modified() can be easily avoided when performing madvise(..., MADV_FREE).
Sponsored by: EMC / Isilon Storage Division
|
251523 |
08-Jun-2013 |
glebius |
Make sys_mlock() function just a wrapper around vm_mlock() function that does all the job.
Reviewed by: kib, jilles Sponsored by: Nginx, Inc.
|
251471 |
06-Jun-2013 |
attilio |
Complete r251452: Avoid busying/unbusying a page in cases where there is no need to drop the vm_obj lock, notably when the page is fully valid after vm_page_grab().
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
251397 |
04-Jun-2013 |
attilio |
In vm_object_split(), busy and consequently unbusy the pages only when swap_pager_copy() is invoked, otherwise there is no reason to do so. This will eliminate the necessity to busy pages most of the times.
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
251367 |
04-Jun-2013 |
alc |
Update a comment.
|
251359 |
04-Jun-2013 |
alc |
Relax the object locking in vm_pageout_map_deactivate_pages() and vm_pageout_object_deactivate_pages(). A read lock suffices.
Sponsored by: EMC / Isilon Storage Division
|
251318 |
03-Jun-2013 |
kib |
Remove irrelevant comments.
Discussed with: alc MFC after: 3 days
|
251280 |
03-Jun-2013 |
alc |
Require that the page lock is held, instead of the object lock, when clearing the page's PGA_REFERENCED flag. Since we are typically manipulating the page's act_count field when we are clearing its PGA_REFERENCED flag, the page lock is already held everywhere that we clear the PGA_REFERENCED flag. So, in fact, this revision only changes some comments and an assertion. Nonetheless, it will enable later changes to object locking in the pageout code.
Introduce vm_page_assert_locked(), which completely hides the implementation details of the page lock from the caller, and use it in vm_page_aflag_clear(). (The existing vm_page_lock_assert() could not be used in vm_page_aflag_clear().) Over the coming weeks, I expect that we'll either eliminate or replace the various uses of vm_page_lock_assert() with vm_page_assert_locked().
Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
|
251229 |
01-Jun-2013 |
alc |
Now that access to the page's "act_count" field is synchronized by the page lock instead of the object lock, there is no reason for vm_page_activate() to assert that the object is locked for either read or write access. (The "VPO_UNMANAGED" flag never changes after page allocation.)
Sponsored by: EMC / Isilon Storage Division
|
251183 |
31-May-2013 |
alc |
Simplify the definition of vm_page_lock_assert(). There is no compelling reason to inline the implementation of vm_page_lock_assert() in the !KLD_MODULES case. Use the same implementation for both KLD_MODULES and !KLD_MODULES.
Reviewed by: kib
|
251151 |
30-May-2013 |
kib |
After the object lock was dropped, the object's reference count could change. Retest the ref_count and return from the function, so as not to execute the further code which assumes that ref_count == 1, if it is not. Also, do not leak the vnode lock if another thread cleared the OBJ_TMPFS flag in the meantime.
Reported by: bdrewery Tested by: bdrewery, pho Sponsored by: The FreeBSD Foundation
|
251150 |
30-May-2013 |
kib |
Remove the capitalization in the assertion message. Print the address of the object to get useful information from optimized kernel dumps.
|
251077 |
28-May-2013 |
attilio |
o Change the locking scheme for swp_bcount. It can now be accessed with a write lock on the object containing it OR with a read lock on the object containing it along with the swhash_mtx.
o Remove some duplicate assertions for swap_pager_freespace() and swap_pager_unswapped(), but keep the object locking references for documentation.
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
250909 |
22-May-2013 |
attilio |
Acquire read lock on the src object for vm_fault_copy_entry().
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
250884 |
21-May-2013 |
attilio |
o Relax locking assertions for vm_page_find_least().
o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any.
o Introduce VM_OBJECT_LOCK_DOWNGRADE(), which is basically a downgrade operation on the per-object rwlock.
o Use all the mechanisms above to make vm_map_pmap_enter() work most of the time with only read locks.
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
250849 |
21-May-2013 |
kib |
Add ddb command 'show pginfo' which provides useful information about a vm page, denoted either by an address of the struct vm_page, or, if the '/p' modifier is specified, by a physical address of the corresponding frame.
Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
250748 |
17-May-2013 |
alc |
Relax the object locking in vm_fault_prefault(). A read lock suffices.
Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
|
250745 |
17-May-2013 |
alc |
Relax the object locking assertion in vm_page_lookup(). Now that a radix tree is used to maintain the object's collection of resident pages, vm_page_lookup() no longer needs an exclusive lock.
Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
|
250601 |
13-May-2013 |
attilio |
o Add accessor functions to add and remove pages from a specific freelist.
o Split the pool of free page queues really by domain and do not rely on the definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such a function evolves into a real policy decision referee, based on specific information retrieved from per-thread and per-vm_object attributes.
o Add the concept of "probed domains" in the form of vm_ndomains. It is the responsibility of every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with the mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would make much more sense. Probably this should be cleaned up in the future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().
Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff
|
250594 |
13-May-2013 |
peter |
Bandaid for compiling with gcc, which happens to be the default compiler for a number of platforms still.
|
250577 |
12-May-2013 |
alc |
Refactor vm_page_alloc()'s interactions with vm_reserv_alloc_page() and vm_page_insert() so that (1) vm_radix_lookup_le() is never called while the free page queues lock is held and (2) vm_radix_lookup_le() is called at most once. This change reduces the average time that the free page queues lock is held by vm_page_alloc() as well as vm_page_alloc()'s average overall running time.
Sponsored by: EMC / Isilon Storage Division
|
250520 |
11-May-2013 |
alc |
To reduce the amount of arithmetic performed in the various radix tree functions, reverse the numbering scheme for the levels. The highest numbered level in the tree now appears near the root instead of the leaves.
Sponsored by: EMC / Isilon Storage Division
|
250361 |
08-May-2013 |
attilio |
Fix up r250338 by completing the removal of VM_NDOMAIN in favor of MAXMEMDOM. This unbreaks the build.
Sponsored by: EMC / Isilon storage division Reported by: adrian, jeli
|
250338 |
07-May-2013 |
attilio |
Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency.
Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc
|
250334 |
07-May-2013 |
alc |
Remove a redundant call to panic() from vm_radix_keydiff(). The assertion before the loop accomplishes the same thing.
Sponsored by: EMC / Isilon Storage Division
|
250259 |
04-May-2013 |
alc |
Optimize vm_radix_lookup_ge() and vm_radix_lookup_le(). Specifically, change the way that these functions ascend the tree when the search for a matching leaf fails at an interior node. Rather than returning to the root of the tree and repeating the lookup with an updated key, maintain a stack of interior nodes that were visited during the descent and use that stack to resume the lookup at the closest ancestor that might have a matching descendant.
Sponsored by: EMC / Isilon Storage Division Reviewed by: attilio Tested by: pho
|
250219 |
03-May-2013 |
jhb |
Fix two bugs in the current NUMA-aware allocation code:
- vm_phys_alloc_freelist_pages() can be called by vm_page_alloc_freelist() to allocate a page from a specific freelist. In the NUMA case it did not properly map the public VM_FREELIST_* constants to the correct backing freelists, nor did it try all NUMA domains for allocations from VM_FREELIST_DEFAULT.
- vm_phys_alloc_pages() did not pin the thread, and each call to vm_phys_alloc_freelist_pages() fetched the current domain to choose which freelist to use. If a thread migrated domains during the loop in vm_phys_alloc_pages() it could skip one of the freelists. If the other freelists were out of memory then it is possible that vm_phys_alloc_pages() would fail to allocate a page even though pages were available, resulting in a panic in vm_page_alloc().
Reviewed by: alc MFC after: 1 week
|
250187 |
02-May-2013 |
kib |
Add a hint suggesting why tmpfs does not need a special case there.
|
250030 |
28-Apr-2013 |
kib |
Rework the handling of the tmpfs node backing swap object and tmpfs vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object.
Besides reducing memory use by up to 2x for the situation of mapping files from tmpfs, it also makes tmpfs read and write operations copy half as many bytes.
The VM subsystem was already slightly adapted to tolerate an OBJT_SWAP object as v_object. Now vm_object_deallocate() is modified to not reinstantiate the OBJ_ONEMAPPING flag and to help the VFS correctly handle the VV_TEXT flag on the last dereference of the tmpfs backing object.
Reviewed by: alc Tested by: pho, bf MFC after: 1 month
|
250029 |
28-Apr-2013 |
kib |
Make vm_object_page_clean() and vm_mmap_vnode() tolerate a vnode's v_object of non-OBJT_VNODE type.
For vm_object_page_clean(), simply do not assert that the object type must be OBJT_VNODE, and add a comment explaining how the check for OBJ_MIGHTBEDIRTY prevents the rest of the function from operating on such objects.
For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it to be for the swap pager (or default), handle the bypass filesystems, and correctly acquire the object reference in this case.
Reviewed by: alc Tested by: pho, bf MFC after: 1 week
|
250028 |
28-Apr-2013 |
kib |
Assert that the object type for the vnode' non-NULL v_object, passed to vnode_pager_setsize(), is either OBJT_VNODE, or, if vnode was already reclaimed, OBJT_DEAD. Note that the later is only possible due to some filesystems, in particular, nfsiods from nfs clients, call vnode_pager_setsize() with unlocked vnode.
Moreover, if the object is terminated, do not perform the resizing operation.
Reviewed by: alc Tested by: pho, bf MFC after: 1 week
|
250026 |
28-Apr-2013 |
kib |
Convert panic() into KASSERT().
Reviewed by: alc MFC after: 1 week
|
250018 |
28-Apr-2013 |
alc |
Eliminate an unneeded call to vm_radix_trimkey() from vm_radix_lookup_le(). This call is clearing bits from the key that will be set again by the next line.
Sponsored by: EMC / Isilon Storage Division
|
249986 |
27-Apr-2013 |
alc |
Avoid some lookup restarts in vm_radix_lookup_{ge,le}().
Sponsored by: EMC / Isilon Storage Division
|
249763 |
22-Apr-2013 |
glebius |
Panic if a UMA_ZONE_PCPU zone is created at the early stages of boot, when mp_ncpus isn't yet initialized. Otherwise we would panic later at the first allocation.
Sponsored by: Nginx, Inc.
|
249745 |
22-Apr-2013 |
alc |
Simplify vm_radix_{add,dec}lev().
Sponsored by: EMC / Isilon Storage Division
|
249605 |
18-Apr-2013 |
alc |
When calculating the number of reserved nodes, discount the pages that will be used to store the nodes.
Sponsored by: EMC / Isilon Storage Division
|
249502 |
15-Apr-2013 |
alc |
Although we perform path compression to reduce the height of the trie and the number of interior nodes, we have previously created a level zero interior node at the root of every non-empty trie, even when that node is not strictly necessary, i.e., it has only one child. This change is the second (and final) step in eliminating those unnecessary level zero interior nodes. Specifically, it updates the deletion and insertion functions so that they do not require a level zero interior node at the root of the trie. For a "buildworld" workload, this change results in a 16.8% reduction in the number of interior nodes allocated and a similar reduction in the average execution time for lookup functions. For example, the average execution time for a call to vm_radix_lookup_ge() is reduced by 22.9%.
Reviewed by: attilio, jeff (an earlier version) Sponsored by: EMC / Isilon Storage Division
|
249427 |
12-Apr-2013 |
alc |
Although we perform path compression to reduce the height of the trie and the number of interior nodes, we always create a level zero interior node at the root of every non-empty trie, even when that node is not strictly necessary, i.e., it has only one child. This change is the first step in eliminating those unnecessary level zero interior nodes. Specifically, it updates all of the lookup functions so that they do not require a level zero interior node at the root.
Reviewed by: attilio, jeff (an earlier version) Sponsored by: EMC / Isilon Storage Division
|
249313 |
09-Apr-2013 |
glebius |
Convert UMA code to C99 uintXX_t types.
|
249312 |
09-Apr-2013 |
glebius |
Swap us_freecount and us_flags, achieving same structure size as before previous commit.
Submitted by: alc
|
249309 |
09-Apr-2013 |
glebius |
Since now we support 256 items per slab, we need more bits for us_freecount.
This grows uma_slab_head on 32-bit arches, but the growth isn't significant. Taking the kmem zones as an example, only the 32-byte zone is affected: ipers is reduced from 113 to 112.
In collaboration with: kib
|
249305 |
09-Apr-2013 |
glebius |
Fix KASSERTs: maximum number of items per slab is 256.
|
249303 |
09-Apr-2013 |
kib |
Fix the assertions for the state of the object under the map entry with the MAP_ENTRY_VN_WRITECNT flag: - Move the assertion that verifies the state of the v_writecount and vnp.writecount, under the block where the object is locked. - Check that the object type is OBJT_VNODE before asserting.
Reported by: avg Reviewed by: alc MFC after: 1 week
|
249278 |
08-Apr-2013 |
attilio |
The per-page act_count can very easily be protected by the per-page lock rather than the vm_object lock, without any further overhead. Make the formal switch.
Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
|
249264 |
08-Apr-2013 |
glebius |
Merge from projects/counters: UMA_ZONE_PCPU zones.
These zones have slab size == sizeof(struct pcpu), but request enough pages from the VM to fit (uk_slabsize * mp_ncpus). An item allocated from such a zone has a separate twin for each CPU in the system, and these twins are spaced sizeof(struct pcpu) apart. This magic distance value will allow us to make some optimizations later.
To address the private item for a CPU, simple arithmetic is used:
item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)
This arithmetic is available as the zpcpu_get() macro in pcpu.h.
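The twin-addressing arithmetic above can be sketched in userspace. Here PCPU_STRIDE stands in for sizeof(struct pcpu) and NCPUS for mp_ncpus; both values, and the helper name, are illustrative rather than the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PCPU_STRIDE 4096	/* stands in for sizeof(struct pcpu) */
#define NCPUS 4			/* stands in for mp_ncpus */

/* Equivalent of the zpcpu_get() arithmetic: a CPU's private twin of an
 * item lives at a fixed stride from the base allocation. */
static void *
pcpu_twin(void *base, int cpu)
{
	return ((char *)base + (size_t)PCPU_STRIDE * cpu);
}
```

CPU 0's twin coincides with the base item, and each subsequent CPU's twin is exactly one stride further along, which is the layout the slab-sizing change arranges.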
To introduce non-page-size slabs, a new field, uk_slabsize, was added to uma_keg. This shifted some frequently used fields of uma_keg to the fourth cache line on amd64. To mitigate this pessimization, the uma_keg fields were rearranged a bit and the least frequently used, uk_name and uk_link, were moved down to the fourth cache line. All other frequently dereferenced fields fit into the first three cache lines.
Sponsored by: Nginx, Inc.
|
249221 |
07-Apr-2013 |
alc |
Micro-optimize the order of struct vm_radix_node's fields. Specifically, arrange for all of the fields to start at a short offset from the beginning of the structure.
Eliminate unnecessary masking of VM_RADIX_FLAGS from the root pointer in vm_radix_getroot().
Sponsored by: EMC / Isilon Storage Division
|
249218 |
06-Apr-2013 |
jeff |
Prepare to replace the buf splay with a trie:
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS-specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper, it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf(), so this should cover all cases.
Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division
|
249211 |
06-Apr-2013 |
alc |
Simplify vm_radix_keybarr().
Sponsored by: EMC / Isilon Storage Division
|
249182 |
06-Apr-2013 |
alc |
Simplify vm_radix_insert().
Reviewed by: attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
249038 |
03-Apr-2013 |
alc |
Replace the remaining uses of vm_radix_node_page() by vm_radix_isleaf() and vm_radix_topage(). This transformation eliminates some unnecessary conditional branches from the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(), and vm_radix_remove().
Simplify the control flow of vm_radix_lookup_{ge,le}().
Reviewed by: attilio (an earlier version) Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
248815 |
28-Mar-2013 |
kib |
Release the v_writecount reference on the vnode in case of error, before the vnode is vput() in vm_mmap_vnode(). An error return means that there is no use reference on the vnode from the vm object reference, and failing to restore v_writecount breaks the invariant that v_writecount is less than or equal to the usecount.
The situation was observed when an NFS client returned ESTALE for VOP_GETATTR() after the open.
In collaboration with: pho MFC after: 1 week
|
248728 |
26-Mar-2013 |
alc |
Introduce vm_radix_isleaf() and use it in a couple places. As compared to using vm_radix_node_page() == NULL, the compiler is able to generate one less conditional branch when vm_radix_isleaf() is used. More use cases involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(), and vm_radix_remove() will follow.
Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
|
248684 |
24-Mar-2013 |
alc |
Micro-optimize the control flow in a few places. Eliminate a panic call that could never be reached in vm_radix_insert(). (If the pointer being checked by the panic call were ever NULL, the immediately preceding loop would have already crashed on a NULL pointer dereference.)
Reviewed by: attilio (an earlier version) Sponsored by: EMC / Isilon Storage Division
|
248569 |
21-Mar-2013 |
kib |
Only size and create the bio_transient_map when unmapped buffers are enabled. Now, disabling the unmapped buffers should result in the kernel memory map identical to pre-r248550.
Sponsored by: The FreeBSD Foundation
|
248550 |
20-Mar-2013 |
kib |
Fix the logic inversion in the r248512.
Noted by: mckay
|
248514 |
19-Mar-2013 |
kib |
Do not map the swap i/o pbufs if the geom provider for the swap partition accepts unmapped requests.
Sponsored by: The FreeBSD Foundation Tested by: pho
|
248512 |
19-Mar-2013 |
kib |
Pass unmapped buffers for page in requests if the filesystem indicated support for the unmapped i/o.
Sponsored by: The FreeBSD Foundation Tested by: pho
|
248508 |
19-Mar-2013 |
kib |
Implement the concept of unmapped VMIO buffers, i.e., buffers which do not map their b_pages pages into buffer_map KVA. The use of unmapped buffers eliminates the need to perform TLB shootdowns for mappings on buffer creation and reuse, greatly reducing the number of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads.
An unmapped buffer must be explicitly requested by the consumer with the GB_UNMAPPED flag. For an unmapped buffer, no KVA reservation is performed at all. With the GB_KVAALLOC flag, the consumer might request an unmapped buffer that does have a KVA reservation, in order to map it manually without recursing into the buffer cache and blocking.
When a mapped buffer is requested and an unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation.
An unmapped buffer is translated into an unmapped bio in g_vfs_strategy(). An unmapped bio carries a pointer to the vm_page_t array, an offset, and a length instead of a data pointer. The provider that processes the bio must explicitly declare its readiness to accept unmapped bios; otherwise the g_down geom thread performs a transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap.
The bio_transient_map submap claims up to 10% of the buffer map, so the total buffer_map + bio_transient_map KVA usage stays the same. Still, it can be tuned manually via the kern.bio_transient_maxcnt tunable, in units of transient mappings. Eventually, bio_transient_map could be removed once all geom classes and drivers can accept unmapped i/o requests.
Unmapped support can be turned off with the vfs.unmapped_buf_allowed tunable; disabling it makes buffer (or cluster) creation requests ignore the GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on architectures where pmap_copy_page() was implemented and tested.
In the rework, filesystem metadata is no longer subject to the maxbufspace limit. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that maxbufspace is enforced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached.
At Jeff Roberson's request, the getnewbuf() function was split into smaller single-purpose functions.
Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
|
248449 |
18-Mar-2013 |
attilio |
Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie.
This change is supposed to do the following: - Allow the acquisition of read locks for lookup operations on the resident/cached page collections, as the per-vm_page_t splay iterators are now removed. - Increase the scalability of operations on the page collections.
The radix trie does rely on the consumers' locking to ensure the atomicity of its operations. In order to avoid deadlocks, the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at most one new node per insert, which means the maximum number of nodes needed is bounded by the number of available physical frames. However, a new bisection node is not always needed.
The radix trie implements path compression because UFS indirect blocks can lead to several objects with a very sparse trie, which would increase the number of levels to scan. It also helps node pre-fetching by introducing the single-node-per-insert property.
This code is not generalized (yet) because of the possible loss of performance from making many of the sizes in play configurable. However, efforts to make this code more general, and thus reusable by other consumers, may follow.
The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped.
Further technical notes, broken into smaller pieces, can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/
Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)
|
248283 |
14-Mar-2013 |
kib |
Some style fixes.
Sponsored by: The FreeBSD Foundation
|
248280 |
14-Mar-2013 |
kib |
Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified.
The function implements an optimal algorithm for copying, using platform-specific optimizations. For instance, on architectures where the direct map is available, no transient mappings are created; for i386, the per-cpu ephemeral page frame is used. The code was typically borrowed from pmap_copy_page() for the same architecture.
Only the i386/amd64, powerpc AIM, and arm/arm-v6 implementations were tested at the time of commit. High-level code, not yet committed to the tree, ensures that use of the function is only allowed after explicit enablement.
For sparc64, the existing code has known issues, and a stub is added instead, to allow the kernel to link.
Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
|
248277 |
14-Mar-2013 |
kib |
Remove excessive and inconsistent initializers for the various kernel maps and submaps.
MFC after: 2 weeks
|
248197 |
12-Mar-2013 |
attilio |
Simplify vm_page_is_valid().
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
248117 |
09-Mar-2013 |
alc |
Update a comment: The object lock is no longer a mutex.
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable further optimizations in the future, where the vm_object lock will be held in read mode most of the time that the page cache's resident pool of pages is accessed for reading purposes.
The change is mostly mechanical, but a few notes are worth reporting: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (which forced consumers to include sys/mutex.h directly to serve its inline functions using VM_OBJECT_LOCK()) means that all vm/vm_pager.h consumers must now also include sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer, because the name clash between the FreeBSD and Solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
248082 |
09-Mar-2013 |
attilio |
Merge from vmc-playground: Introduce a new KPI that verifies if the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking in order to be used also for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache.
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: alc (vm_radix based version) Tested by: flo, pho, jhb, davide
|
248032 |
08-Mar-2013 |
andre |
Move the callout subsystem initialization to its own SYSINIT() from being indirectly called via cpu_startup()+vm_ksubmap_init(). The boot order position remains the same at SI_SUB_CPU.
Allocation of the callout array is changed to standard kernel malloc from a slightly obscure direct kernel_map allocation.
kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init() to better describe its purpose. kern_timeout_callwheel_init() is removed simplifying the per-cpu initialization.
Reviewed by: davide
|
247788 |
04-Mar-2013 |
attilio |
Merge from vmcontention: As vm objects are type-stable, there is no need to initialize the resident splay tree pointer and the cache splay tree pointer in _vm_object_allocate(); this can instead be done in the UMA zone init handler.
The UMA zone destructor handler will further check that the condition holds at every destruction and catch bugs.
Sponsored by: EMC / Isilon storage division Submitted by: alc
|
247659 |
02-Mar-2013 |
alc |
The value held by the vm object's field pg_color is only considered valid if the flag OBJ_COLORED is set. Since _vm_object_allocate() doesn't set this flag, it needn't initialize pg_color.
Sponsored by: EMC / Isilon Storage Division
|
247602 |
02-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If the CAP_IOCTL capability right is present, we can further reduce the allowed ioctls list with the new cap_ioctls_limit(2) syscall. The list of allowed ioctls can be retrieved with the cap_ioctls_get(2) syscall.
- If the CAP_FCNTL capability right is present, we can further reduce the fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing, the filedesc structure was heavily modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised, and even though I tried hard to provide backward API and ABI compatibility, there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
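The composite defines above are plain bitwise unions of rights, and a descriptor is allowed an operation only when its rights mask contains every bit the operation needs. A small sketch of that containment check; the CAP_* bit values here are illustrative placeholders, not FreeBSD's actual encoding, and only the compositions mirror the commit:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cap_rights;

/* Illustrative bit assignments, not the real ABI values. */
#define CAP_READ   0x1ULL
#define CAP_WRITE  0x2ULL
#define CAP_SEEK   0x4ULL
#define CAP_MMAP   0x8ULL

/* Compositions as described in the commit message. */
#define CAP_PREAD  (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)

/* A descriptor may perform an operation iff its rights contain all the
 * bits the operation needs. */
static int
cap_allows(cap_rights have, cap_rights need)
{
	return ((have & need) == need);
}
```

Under this model, a descriptor limited to CAP_PREAD can read(2) (CAP_READ is contained), but a descriptor with only CAP_READ cannot pread(2), matching the new CAP_READ/CAP_PREAD semantics described above.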
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
247400 |
27-Feb-2013 |
attilio |
Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions.
Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
|
247360 |
26-Feb-2013 |
attilio |
Merge from vmc-playground branch: Replace the sub-optimal uma_zone_set_obj() primitive with the more modern uma_zone_reserve_kva(). The new primitive reserves beforehand the necessary KVA space to serve the zone allocations and allocates pages with ALLOC_NOOBJ. More specifically: - uma_zone_reserve_kva() does not need an object to serve the backend allocator. - uma_zone_reserve_kva() can serve M_WAITOK requests, in order to support zones which need to do uma_prealloc() too. - When possible, uma_zone_reserve_kva() uses the direct mapping via uma_small_alloc() rather than relying on the KVA / offset combination.
The removal of the object attribute allows 2 further changes: 1) _vm_object_allocate() becomes static within vm_object.c 2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by direct calls to mtx_init() as there is no need to export it anymore and the calls aren't either homogeneous anymore: there are now small differences between arguments passed to mtx_init().
Sponsored by: EMC / Isilon storage division Reviewed by: alc (which also offered almost all the comments) Tested by: pho, jhb, davide
|
247346 |
26-Feb-2013 |
attilio |
Remove white spaces.
Sponsored by: EMC / Isilon storage division
|
247323 |
26-Feb-2013 |
attilio |
Wrap the sleeps synchronized by the vm_object lock in the specific macro VM_OBJECT_SLEEP(). This hides some implementation details, like the usage of the msleep() primitive and the necessity of accessing the lock address directly. For this reason the VM_OBJECT_MTX() macro is now retired.
Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
|
246926 |
18-Feb-2013 |
alc |
On arm, like sparc64, the end of the kernel map varies from one type of machine to another. Therefore, VM_MAX_KERNEL_ADDRESS can't be a constant. Instead, #define it to be a variable, vm_max_kernel_address, just like we do on sparc64.
Reviewed by: kib Tested by: ian
|
246805 |
14-Feb-2013 |
jhb |
Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel config file.
Requested by: phk (ages ago) MFC after: 1 month
|
246316 |
04-Feb-2013 |
marius |
Try to improve r242655 take III: move these SYSCTLs describing the kernel map, which is defined and initialized in vm/vm_kern.c, to the latter.
Submitted by: alc
|
246087 |
29-Jan-2013 |
glebius |
Fix typo in debug printf.
|
246032 |
28-Jan-2013 |
zont |
- Add system wide page faults requiring I/O counter.
Reviewed by: alc MFC after: 2 weeks
|
246030 |
28-Jan-2013 |
zont |
- Add sysctls to show number of stats scans.
MFC after: 2 weeks
|
246029 |
28-Jan-2013 |
zont |
- Style.
MFC after: 2 weeks
|
245421 |
14-Jan-2013 |
zont |
- Get rid of unused function vmspace_wired_count().
Reviewed by: alc Approved by: kib (mentor) MFC after: 1 week
|
245296 |
11-Jan-2013 |
zont |
- Improve readability of sys_obreak().
Suggested by: alc Reviewed by: alc Approved by: kib (mentor) MFC after: 1 week
|
245255 |
10-Jan-2013 |
zont |
- Reduce kernel size by removing unnecessary pointer indirections.
The GENERIC kernel size is reduced by 16 bytes, and the RACCT kernel by 336 bytes.
Suggested by: alc Reviewed by: alc Approved by: kib (mentor) MFC after: 1 week
|
245226 |
09-Jan-2013 |
ken |
Fix a bug in the device pager code that can trigger an assertion in devfs if a particular race condition is hit in the device pager code.
This was a side effect of change 227530 which changed the device pager interface to call a new destructor routine for the cdev. That destructor routine, old_dev_pager_dtor(), takes a VM object handle.
The object handle is cast to a struct cdev *, and passed into dev_rel().
That works in most cases, except the case in cdev_pager_allocate() where there is a race condition between two threads allocating an object backed by the same device. The loser of the race deallocates its object at the end of the function.
The problem is that before inserting the object into the dev_pager_object_list, the object's handle is changed from the struct cdev pointer to the object's own address. This is to avoid conflicts with the winner of the race, which already inserted an object in the list with a handle that is a pointer to the same cdev structure.
The object is then passed to vm_object_deallocate(), and eventually makes its way down to old_dev_pager_dtor(). That function passes the handle pointer (which is actually a VM object, not a struct cdev as usual) into dev_rel(). dev_rel() decrements the reference count in the assumed struct cdev (which happens to be 0), and that triggers the assertion in dev_rel() that the reference count is greater than or equal to 0.
The fix is to add a cdev pointer to the VM object, and use that pointer when calling the cdev_pg_dtor() routine.
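The shape of this fix can be sketched in userspace: the destructor dereferences a dedicated cdev pointer instead of trusting the opaque handle, which the race loser may have rewritten to point at the object itself. All types and names below are illustrative stand-ins for the kernel's:

```c
#include <assert.h>

struct cdev {
	int si_refcount;
};

struct vm_object_sketch {
	void *handle;		/* may be the cdev OR the object itself */
	struct cdev *un_cdev;	/* always the real cdev */
};

/* The destructor path uses the dedicated pointer, so it stays correct
 * even when the race loser rewrote 'handle' to point at the object. */
static void
dev_pager_dealloc(struct vm_object_sketch *obj)
{
	obj->un_cdev->si_refcount--;
}
```

The pre-fix bug corresponds to casting `obj->handle` to a `struct cdev *` here: for the race loser, that handle is the object itself, so the refcount decrement scribbles on the wrong memory and trips the dev_rel() assertion.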
vm_object.h: Add a struct cdev pointer to the VM object structure.
device_pager.c: In cdev_pager_allocate(), populate the new cdev pointer.
In dev_pager_dealloc(), use the new cdev pointer when calling the object's cdev_pg_dtor() routine.
Reviewed by: kib Sponsored by: Spectra Logic Corporation MFC after: 1 week
|
244532 |
21-Dec-2012 |
glebius |
Comment fix: there is no ub_ptr; instead, explain the meaning of the uz_count field verbally.
|
244384 |
18-Dec-2012 |
zont |
- Fix locked memory accounting for maps with MAP_WIREFUTURE flag. - Add sysctl vm.old_mlock which may turn such accounting off.
Reviewed by: avg, trasz Approved by: kib (mentor) MFC after: 1 week
|
244043 |
09-Dec-2012 |
alc |
In the past four years, we've added two new vm object types. Each time, similar changes had to be made in various places throughout the machine- independent virtual memory layer to support the new vm object type. However, in most of these places, it's actually not the type of the vm object that matters to us but instead certain attributes of its pages. For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain fictitious pages. In other words, in most of these places, we were testing the vm object's type to determine if it contained fictitious (or unmanaged) pages.
To both simplify the code in these places and make the addition of future vm object types easier, this change introduces two new vm object flags that describe attributes of the vm object's pages, specifically, whether they are fictitious or unmanaged.
Reviewed and tested by: kib
|
244024 |
08-Dec-2012 |
pjd |
White-space cleanups.
|
243998 |
07-Dec-2012 |
pjd |
Implemented uma_zone_set_warning(9) function that sets a warning, which will be printed once the given zone becomes full and cannot allocate an item. The warning will not be printed more often than every five minutes.
All UMA warnings can be globally turned off by setting sysctl/tunable vm.zone_warnings to 0.
Discussed on: arch Obtained from: WHEEL Systems MFC after: 2 weeks
|
243659 |
28-Nov-2012 |
alc |
Add support for the (relatively) new object type OBJT_MGTDEVICE to vm_object_set_memattr(). Also, add a "safety belt" so that vm_object_set_memattr() doesn't silently modify undefined object types.
Reviewed by: kib MFC after: 10 days
|
243529 |
25-Nov-2012 |
alc |
Make a few small changes to vm_map_pmap_enter():
Add detail to the comment describing this function. In particular, describe what MAP_PREFAULT_PARTIAL does.
Eliminate the abrupt change in behavior when the specified address range grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages. Instead of doing nothing, i.e., preloading no mappings whatsoever, map any resident pages that fall within the start of the specified address range, i.e., [addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).
Long ago, the vm object's list of resident pages was not ordered, so this function had to choose between probing the global hash table of all resident pages and iterating over the vm object's unordered list of resident pages. Now that the list is ordered, there is no reason for MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of resident pages.
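The clamping described above replaces an all-or-nothing cutoff with a bounded prefix: the preloaded range is always [addr, addr + ulmin(size, ptoa(MAX_INIT_PT))). A sketch of that bound, with illustrative PAGE_SIZE and MAX_INIT_PT values rather than the kernel's:

```c
#include <assert.h>

#define PAGE_SIZE   4096UL	/* illustrative */
#define MAX_INIT_PT 96UL	/* illustrative page cap */
#define ptoa(x)     ((x) * PAGE_SIZE)

static unsigned long
ulmin(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

/* End of the address range [addr, end) that will be preloaded: the
 * whole range when it is small, the first MAX_INIT_PT pages when it is
 * large, never nothing. */
static unsigned long
prefault_end(unsigned long addr, unsigned long size)
{
	return (addr + ulmin(size, ptoa(MAX_INIT_PT)));
}
```

The old behavior was a step function (preload everything up to MAX_INIT_PT pages, then abruptly nothing); this bound makes the preloaded span grow monotonically and then saturate.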
MFC after: 14 days
|
243366 |
21-Nov-2012 |
alc |
Correct an error in r230623. When both VM_ALLOC_NODUMP and VM_ALLOC_ZERO were specified to vm_page_alloc(), PG_NODUMP wasn't being set on the allocated page when it happened to be pre-zeroed.
MFC after: 5 days
|
243333 |
20-Nov-2012 |
jh |
- Don't pass geom and provider names as format strings. - Add __printflike() attributes. - Remove an extra argument for the g_new_geomf() call in swapongeom_ev().
Reviewed by: pjd
|
243176 |
17-Nov-2012 |
alc |
Update a comment to reflect the elimination of the hold queue in r242300.
|
243132 |
16-Nov-2012 |
kib |
Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs.
Requested and reviewed by: alc MFC after: 2 weeks
|
243131 |
16-Nov-2012 |
kib |
Explicitly state, using an assertion, that M_USE_RESERVE requires M_NOWAIT.
Reviewed by: alc MFC after: 2 weeks
|
243040 |
14-Nov-2012 |
kib |
Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class.
Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks
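The translation described above can be sketched as a single inline function. This is a minimal illustration only: the flag values below are hypothetical, and the real malloc2vm_flags() in sys/malloc.h handles more flags and asserts that M_USE_RESERVE is accompanied by M_NOWAIT.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the real definitions live
 * in sys/malloc.h and vm/vm_page.h. */
#define M_NOWAIT        0x0001
#define M_WAITOK        0x0002
#define M_ZERO          0x0100
#define M_USE_RESERVE   0x0200

#define VM_ALLOC_SYSTEM    0x0020
#define VM_ALLOC_INTERRUPT 0x0040
#define VM_ALLOC_WAITOK    0x0080
#define VM_ALLOC_ZERO      0x1000

/* Sketch of the centralized translation: M_NOWAIT now maps to the
 * VM_ALLOC_SYSTEM class, and only M_USE_RESERVE escalates to the
 * deep-drain VM_ALLOC_INTERRUPT class. */
static inline int
malloc2vm_flags(int malloc_flags)
{
	int pflags = 0;

	if (malloc_flags & M_USE_RESERVE)
		pflags |= VM_ALLOC_INTERRUPT;	/* may drain the reserve */
	else if (malloc_flags & M_NOWAIT)
		pflags |= VM_ALLOC_SYSTEM;	/* no sleep, keep reserve */
	else
		pflags |= VM_ALLOC_WAITOK;
	if (malloc_flags & M_ZERO)
		pflags |= VM_ALLOC_ZERO;
	return (pflags);
}
```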
|
242941 |
13-Nov-2012 |
alc |
Replace the single, global page queues lock with per-queue locks on the active and inactive paging queues.
Reviewed by: kib
|
242903 |
12-Nov-2012 |
attilio |
Fix DDB command "show map XXX": - Check that an argument is always available; otherwise the current map printed before recursing is garbage. - Print a message if an argument is not provided. - Remove the unread nlines variable. - Use an explicit recursive function, disassociated from the DB_SHOW_COMMAND() body, in order to make the prototype and recursion of the above mentioned function clear. The code is now much less obscure.
Submitted by: gianni
|
242476 |
02-Nov-2012 |
kib |
r241025 fixed the case when a binary, executed from a nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already have file descriptors opened for write, and then be executed from the nullfs overlay.
Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks.
Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Calling the VOPs instead of directly accessing v_writecount provides the fix described in the previous paragraph.
Tested by: pho MFC after: 3 weeks
|
242434 |
01-Nov-2012 |
alc |
In general, we call pmap_remove_all() before calling vm_page_cache(). So, the call to pmap_remove_all() within vm_page_cache() is usually redundant. This change eliminates that call to pmap_remove_all() and introduces a call to pmap_remove_all() before vm_page_cache() in the one place where it didn't already exist.
When iterating over a paging queue, if the object containing the current page has a zero reference count, then the page can't have any managed mappings. So, a call to pmap_remove_all() is pointless.
Change a panic() call in vm_page_cache() to a KASSERT().
MFC after: 6 weeks
|
242402 |
31-Oct-2012 |
attilio |
Rework the known mutexes to benefit from staying on their own cache line, using struct mtx_padalign instead of manual frobbing.
The sole exceptions are the nvme and sfxge drivers, where the author redefined CACHE_LINE_SIZE manually, so they need to be analyzed and dealt with separately.
Reviewed by: jimharris, alc
|
242300 |
29-Oct-2012 |
alc |
Replace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE, because the queue itself serves no purpose. When a held page is freed, inserting the page into the hold queue has the side effect of setting the page's "queue" field to PQ_HOLD. Later, when the page is unheld, it will be freed because the "queue" field is PQ_HOLD. In other words, PQ_HOLD is used as a flag, not a queue. So, this change replaces it with a flag.
To accommodate the new page flag, make the page's "flags" field wider and "oflags" field narrower.
Reviewed by: kib
|
242268 |
28-Oct-2012 |
trasz |
Remove useless check; vm_pindex_t is unsigned on all architectures.
CID: 3701 Found with: Coverity Prevent
|
242152 |
26-Oct-2012 |
mdf |
Const-ify the zone name argument to uma_zcreate(9).
MFC after: 3 days
|
242151 |
26-Oct-2012 |
andre |
Move the corresponding MTX_SYSINIT() next to their struct mtx declaration to make their relationship more obvious, as done with the other such mutexes.
|
242012 |
24-Oct-2012 |
kib |
Commit the actual text provided by Alan, instead of the wrong update in r242011.
MFC after: 1 week
|
242011 |
24-Oct-2012 |
kib |
Dirty the newly copied anonymous pages after the wired region is forked. Otherwise, pagedaemon might reclaim the page without saving its content into the swap file, resulting in the valid content replaced by zeroes.
Reported and tested by: pho Reviewed and comment update by: alc MFC after: 1 week
|
241896 |
22-Oct-2012 |
kib |
Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
Conducted and reviewed by: attilio Tested by: pho
|
241825 |
22-Oct-2012 |
eadler |
Print flags as hex instead of an integer.
PR: kern/168210 Submitted by: linimon Reviewed by: alc Approved by: cperciva MFC after: 3 days
|
241517 |
13-Oct-2012 |
alc |
Move vm_page_requeue() to the only file that uses it.
MFC after: 3 weeks
|
241512 |
13-Oct-2012 |
alc |
Eliminate the conditional for releasing the page queues lock in vm_page_sleep(). vm_page_sleep() is no longer called with this lock held.
Eliminate assertions that the page queues lock is NOT held. These assertions won't translate well to having distinct locks on the active and inactive page queues, and they really aren't that useful.
MFC after: 3 weeks
|
241155 |
03-Oct-2012 |
alc |
Tidy up a bit:
Update some of the comments. In particular, use "sleep" in preference to "block" where appropriate.
Eliminate some unnecessary casts.
Make a few whitespace changes for consistency.
Reviewed by: kib MFC after: 3 days
|
241025 |
28-Sep-2012 |
kib |
Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from the lower filesystem, the lower vnode gets the VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you can still open the lower vnode for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode.
Tested by: pho (previous version) MFC after: 2 weeks
|
240862 |
23-Sep-2012 |
alc |
Address a race condition that was introduced in r238212. Unless the page queues lock is acquired before the page lock is released, there is no guarantee that the page will still be in that same page queue when vm_page_requeue() is called.
Reported by: pho In collaboration with: kib MFC after: 3 days
|
240741 |
20-Sep-2012 |
kib |
Plug the accounting leak for the wired pages when msync(MS_INVALIDATE) is performed on a vnode mapping which is wired in another address space.
While there, explicitly assert that the page is unwired and zero the wire_count instead of subtracting. The condition is rechecked later in vm_page_free(_toq) anyway.
Reported and tested by: zont Reviewed by: alc (previous version) MFC after: 1 week
|
240676 |
18-Sep-2012 |
glebius |
If caller specifies UMA_ZONE_OFFPAGE explicitly, then do not waste memory in an allocation for a slab.
Reviewed by: jeff
|
240518 |
14-Sep-2012 |
eadler |
Correct double "the the"
Approved by: cperciva MFC after: 3 days
|
240145 |
05-Sep-2012 |
zont |
- Simplify VM code by using vmspace_wired_count() for counting wired memory of a process.
Reviewed by: avg Approved by: kib (mentor) MFC after: 2 weeks
|
240134 |
05-Sep-2012 |
des |
Whitespace cleanup.
|
240113 |
04-Sep-2012 |
des |
No memory barrier is required. This was pointed out by kib@ a while ago, but I got distracted by other matters.
(for real this time)
|
240105 |
04-Sep-2012 |
des |
Revert previous commit, which was performed in the wrong tree.
|
240096 |
04-Sep-2012 |
des |
No memory barrier is required. This was pointed out by kib@ a while ago, but I got distracted by other matters.
|
240069 |
03-Sep-2012 |
zont |
- After r240026, sgrowsiz should be used in a safer manner.
Approved by: kib (mentor) MFC after: 1 week
|
239895 |
30-Aug-2012 |
zont |
- Remove accounting of locked memory from vsunlock(9) that I missed in r239818.
Approved by: kib (mentor)
|
239818 |
29-Aug-2012 |
zont |
- Don't take an account of locked memory for current process in vslock(9).
There are two consumers of vslock(9): sysctl code and drm driver. These consumers are using locked memory as transient memory, it doesn't belong to a process's memory.
Suggested by: avg Reviewed by: alc Approved by: kib (mentor) MFC after: 2 weeks
|
239723 |
27-Aug-2012 |
pluknet |
Fix typo in previous change: print half the theoretical maximum as the maximum recommended amount.
Reported by: <site freebsd at orientalsensation com> Reviewed by: des
|
239710 |
26-Aug-2012 |
glebius |
Fix function name in keg_cachespread_init() assert.
|
239327 |
16-Aug-2012 |
des |
- When running out of swzone, instead of spewing an error message every tick until the situation is resolved (if ever), just print a single message when running out and another when space becomes available.
- When adding more swap, warn if the total amount exceeds half the theoretical maximum we can handle.
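The first change above is edge-triggered logging: report state transitions, not every failed attempt. A minimal sketch of the idea, with a counter standing in for the kernel's printf and all names illustrative rather than taken from the actual swap_pager code:

```c
#include <assert.h>

/* Edge-triggered warning: one message when the zone runs out, one more
 * when space becomes available again, instead of a message per tick. */
static int swzone_exhausted = 0;
static int messages_printed = 0;

static void
swap_zone_alloc_failed(void)
{
	if (!swzone_exhausted) {
		swzone_exhausted = 1;
		messages_printed++;	/* "swap zone exhausted" */
	}
}

static void
swap_zone_freed(void)
{
	if (swzone_exhausted) {
		swzone_exhausted = 0;
		messages_printed++;	/* "swap zone ok" */
	}
}
```

Repeated failures while already exhausted print nothing further, which is exactly what stops the per-tick spew.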
|
239250 |
14-Aug-2012 |
kib |
For the old mmap syscall, when executing on amd64 or ia64, enforce PROT_EXEC if prot is non-zero, the process is 32bit and the kern.elf32.i386_read_exec sysctl is enabled. This workaround is needed for old i386 a.out binaries, where the dynamic linker did not specify PROT_EXEC for the mapping of the text.
The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries, but I reused the existing knob which already has the needed semantic.
MFC after: 1 week
|
239247 |
14-Aug-2012 |
kib |
Adjust the r205536, by allowing a non-zero offset for anonymous mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD 1.1.5.1 can issue such requests.
Reported and tested by: Dan Plassche <dplassche@gmail.com> MFC after: 1 week
|
239246 |
14-Aug-2012 |
kib |
Do not leave invalid pages in the object after a short read on network file systems (not only NFS proper). Short reads cause pages other than the requested one, which were not filled by the read response, to stay invalid.
Change the vm_page_readahead_finish() interface to not take the error code, but instead to decide whether to free or to (de)activate the page only by its validity. As a result, invalid pages that were not requested are freed even if the read RPC indicated success.
Noted and reviewed by: alc MFC after: 1 week
|
239121 |
07-Aug-2012 |
alc |
Never sleep on busy pages in vm_pageout_launder(), always skip them. Long ago, sleeping on busy pages in vm_pageout_launder() made sense. The call to vm_pageout_flush() specified asynchronous I/O and sleeping on busy pages blocked vm_pageout_launder() until the flush had completed. However, in CVS revision 1.35 of vm/vm_contig.c, the call to vm_pageout_flush() was changed to request synchronous I/O, but the sleep on busy pages was not removed.
|
239065 |
05-Aug-2012 |
kib |
After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull in vm_param.h was removed. The other big dependency of vm_page.h on vm_param.h is the PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it.
Suggested and reviewed by: alc MFC after: 2 weeks
|
239040 |
04-Aug-2012 |
kib |
Reduce code duplication and exposure of direct access to struct vm_page oflags by providing the helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other than the requested one, for VOP_GETPAGES().
Reviewed by: alc MFC after: 1 week
|
238998 |
03-Aug-2012 |
alc |
Inline vm_page_aflags_clear() and vm_page_aflags_set().
Add comments stating that neither these functions nor the flags that they are used to manipulate are part of the KBI.
|
238915 |
30-Jul-2012 |
alc |
Eliminate an unneeded declaration. (I should have removed this as part of r227568.)
|
238791 |
26-Jul-2012 |
kib |
Do not requeue held page or page for which locking failed, just leave them alone.
Process the act_count updates for the held pages in the vm_pageout loop over the inactive queue, instead of refusing to do anything with such page.
Clarify the intent of the addl_page_shortage counter and change its use for pages which are not processed in the loop according to the description.
Reviewed by: alc MFC after: 2 weeks
|
238732 |
24-Jul-2012 |
alc |
Addendum to r238604. If the inactive queue scan isn't restarted, then the variable "addl_page_shortage_init" isn't needed.
X-MFC after: r238604
|
238604 |
18-Jul-2012 |
kib |
Do not restart the scan of the inactive queue when a non-inactive page is found. Rather, we should not find such pages on the inactive queue at all.
Requested and reviewed by: alc MFC after: 2 weeks
|
238561 |
18-Jul-2012 |
alc |
Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar code resides. Rename vm_contig_grow_cache() to vm_pageout_grow_cache().
Reviewed by: kib
|
238543 |
17-Jul-2012 |
alc |
Correct vm_page_alloc_contig()'s implementation of VM_ALLOC_NODUMP.
|
238536 |
16-Jul-2012 |
alc |
Various improvements to vm_contig_grow_cache(). Most notably, even when it can't sleep, it can still move clean pages from the inactive queue to the cache. Also, when a page is cached, there is no need to restart the scan. The "next" page pointer held by vm_contig_launder() is still valid. Finally, add a comment summarizing what vm_contig_grow_cache() does based upon the value of "tries".
MFC after: 3 weeks
|
238510 |
15-Jul-2012 |
alc |
Correct an off-by-one error in vm_reserv_alloc_contig() that resulted in the last reservation of a multi-reservation allocation not being initialized.
|
238502 |
15-Jul-2012 |
mdf |
Fix a bug with memguard(9) on 32-bit architectures without a VM_KMEM_MAX_SIZE.
The code was not taking into account the size of the kernel_map, which the kmem_map is allocated from, so it could produce a sub-map size too large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely and base the memguard map's size off the kernel_map's size, since this is always relevant and always smaller.
Found by: Justin Hibbits
|
238456 |
14-Jul-2012 |
alc |
If vm_contig_grow_cache() is allowed to sleep, then invoke the vm_lowmem handlers.
|
238452 |
14-Jul-2012 |
alc |
Move kmem_alloc_{attr,contig}() to vm/vm_kern.c, where similarly named functions reside. Correct the comment describing kmem_alloc_contig().
|
238359 |
11-Jul-2012 |
attilio |
Document the object type movements, related to swp_pager_copy(), in vm_object_collapse() and vm_object_split().
In collaboration with: alc MFC after: 3 days
|
238258 |
08-Jul-2012 |
kib |
Avoid vm page queues lock leak after r238212.
Reported and tested by: Michael Butler <imb protected-networks net> Reviewed by: alc Pointy hat to: kib MFC after: 20 days
|
238212 |
07-Jul-2012 |
kib |
Drop page queues mutex on each iteration of vm_pageout_scan over the inactive queue, unless busy page is found.
Dropping the mutex often should allow other lock acquires to proceed without waiting for the whole inactive scan to finish. On machines with a lot of physical memory the scan often needs to iterate a lot before it finishes or finds a page which requires laundering, causing high latency for other lock waiters.
Suggested and reviewed by: alc MFC after: 3 weeks
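The locking pattern described above can be modeled in miniature. This is a toy sketch, not the vm_pageout code: a stub lock that merely counts acquisitions stands in for the page queues mutex, and the point is the release/reacquire on every iteration so other waiters can interleave.

```c
#include <assert.h>

/* Stub lock: counts acquires so the test can observe the pattern. */
static int acquires = 0;
static void mtx_lock_stub(void)   { acquires++; }
static void mtx_unlock_stub(void) { }

/* Scan a queue of npages entries, dropping the mutex between
 * iterations (the r238212 behavior) instead of holding it across
 * the entire scan. */
static int
scan_queue(int npages)
{
	int processed = 0;

	mtx_lock_stub();
	for (int i = 0; i < npages; i++) {
		processed++;
		mtx_unlock_stub();	/* let other waiters in */
		mtx_lock_stub();
	}
	mtx_unlock_stub();
	return (processed);
}
```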
|
238206 |
07-Jul-2012 |
eadler |
Add missing sleep stat increase
PR: kern/168211 Submitted by: linimon Reviewed by: alc Approved by: cperciva MFC after: 3 days
|
238180 |
06-Jul-2012 |
kib |
Style.
Reviewed by: alc (previous version) MFC after: 1 week
|
238000 |
02-Jul-2012 |
jhb |
Honor db_pager_quit in 'show uma' and 'show malloc'.
MFC after: 1 month
|
237623 |
27-Jun-2012 |
alc |
Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.
|
237451 |
22-Jun-2012 |
attilio |
- Add a comment explaining the locking of the cached pages pool held by vm_objects. - Add flags for the per-object lock and free pages queue mutex lock. Use the newly added flags to mark the cache root within the vm_object structure.
Please note that other vm_object members should be marked with correct locking but they are left for other commits.
In collaboration with: alc
MFC after: 3 days
|
237346 |
20-Jun-2012 |
alc |
Selectively inline vm_page_dirty().
|
237334 |
20-Jun-2012 |
jhb |
Move the per-thread deferred user map entries list into a private list in vm_map_process_deferred() which is then iterated to release map entries. This avoids having a nested vm map unlock operation called from the loop body attempt to recurse into vm_map_process_deferred(). This can happen if the vm_map_remove() triggers the OOM killer.
Reviewed by: alc, kib MFC after: 1 week
|
237172 |
16-Jun-2012 |
attilio |
Do a more targeted check on the page cache and avoid checking the cache pointer directly in vnode_pager_setsize() by using the newly introduced vm_page_is_cached() function.
Reviewed by: alc MFC after: 2 weeks X-MFC: r234039,234064
|
237168 |
16-Jun-2012 |
alc |
The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer.
Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer.
As an added bonus, tidy up some nearby comments concerning page flags.
Reviewed by: kib MFC after: 6 weeks
|
236848 |
10-Jun-2012 |
kib |
Use the previous stack entry protection and max protection to correctly propagate the stack execution permissions when stack is grown down.
First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack protection for current ABI, so the new stack entry was typically marked executable always. Second, for non-main stack MAP_STACK mapping, the PROT_ flags should be used which were specified at the mmap(2) call time, and not sv_stackprot.
MFC after: 1 week
|
236417 |
01-Jun-2012 |
eadler |
Revert r236380
PR: kern/166780 Requested by: many Approved by: cperciva (implicit)
|
236380 |
01-Jun-2012 |
eadler |
Add sysctl to query amount of swap space free
PR: kern/166780 Submitted by: Radim Kolar <hsn@sendmail.cz> Approved by: cperciva MFC after: 1 week
|
235854 |
23-May-2012 |
emax |
Tweak the condition for disabling allocation from per-CPU buckets in a low memory situation. I've observed a situation where per-CPU allocations were disabled while there were enough free cached pages. Basically, cnt.v_free_count was sitting stable at a value lower than cnt.v_free_min, and that caused a massive performance drop.
Reviewed by: alc MFC after: 1 week
|
235850 |
23-May-2012 |
kib |
Calculate the count of per-process cow faults. Export the count to userspace using the obscure spare int field in struct kinfo_proc.
Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week
|
235829 |
23-May-2012 |
avg |
vm_pager_object_lookup: small performance optimization
do not needlessly lock an object if its handle doesn't match
Reviewed by: kib, alc MFC after: 1 week
|
235776 |
22-May-2012 |
andrew |
Fix booting on ARM.
In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past the end of vm_page_array was incorrect causing it to return NULL. This value is then used in vm_phys_add_page causing a data abort.
Reviewed by: alc, kib, imp Tested by: stas
|
235689 |
20-May-2012 |
nwhitehorn |
Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.
|
235603 |
18-May-2012 |
kib |
Do not double-reference the found vm object in cdev_pager_lookup(). vm_pager_object_lookup() already referenced the object.
Note that there are no in-tree consumers of cdev_pager_lookup(). The only known user of the function is the i915 gem driver, which is not yet imported. This should make the KPI change minor.
Submitted by: avg MFC after: 1 week
|
235375 |
12-May-2012 |
kib |
Add a new pager type, OBJT_MGTDEVICE. It provides the device pager which carries fictitious managed pages. In particular, the consumers of the new object type can remove all mappings of the device page with pmap_remove_all().
The range of physical addresses used for fake page allocation shall be registered with vm_phys_fictitious_reg_range() interface to allow the PHYS_TO_VM_PAGE() to work in pmap.
Most likely, only i386 and amd64 pmaps can handle fictitious managed pages right now.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235372 |
12-May-2012 |
kib |
Add a facility to register a range of physical addresses to be used for allocation of fictitious pages, for which PHYS_TO_VM_PAGE() returns a proper fictitious vm_page_t. The range should be de-registered after the consumer has stopped using it.
De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate over registered ranges.
A hash container might be developed instead of the range registration interface, and fake pages could be put automatically into the hash, where PHYS_TO_VM_PAGE() could look them up later. This should be considered before the MFC of the commit is done.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235366 |
12-May-2012 |
kib |
Split the code from vm_page_getfake() to initialize the fake page struct vm_page into new interface vm_page_initfake(). Handle the case of fake page re-initialization with changed memattr.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235365 |
12-May-2012 |
kib |
Assert that the page passed to vm_page_putfake() is unmanaged.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235362 |
12-May-2012 |
kib |
Assert that fictitious or unmanaged pages do not appear on active/inactive lists.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235359 |
12-May-2012 |
kib |
Commit the change forgotten in r235356.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235356 |
12-May-2012 |
kib |
Make vm_page_array_size long. Remove redundant zero initialization for vm_page_array_size and nearby variables.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
235230 |
10-May-2012 |
alc |
Give vm_fault()'s sequential access optimization a makeover.
There are two aspects to the sequential access optimization: (1) read ahead of pages that are expected to be accessed in the near future and (2) unmap and cache behind of pages that are not expected to be accessed again. This revision changes both aspects.
The read ahead optimization is now more effective. It starts with the same initial read window as before, but arithmetically grows the window on sequential page faults. This can yield increased read bandwidth. For example, on one of my machines, a program using mmap() to read a file that is several times larger than the machine's physical memory takes about 17% less time to complete.
The unmap and cache behind optimization is now more selectively applied. The read ahead window must grow to its maximum size before unmap and cache behind is performed. This significantly reduces the number of times that pages are unmapped and cached only to be reactivated a short time later.
The unmap and cache behind optimization now clears each page's referenced flag. Previously, in the case of dirty pages, if the containing file was still mapped at the time that the page daemon examined the dirty pages, they would be reactivated.
From a stylistic standpoint, this revision also cleanly separates the implementation of the read ahead and unmap/cache behind optimizations.
Glanced at: kib MFC after: 2 weeks
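The arithmetic growth of the read-ahead window can be sketched as follows. The constants and function names here are illustrative stand-ins, not the actual identifiers in vm/vm_fault.c; the point is the shape of the policy: grow by a fixed step on sequential faults, cap at a maximum, and only consider unmap/cache behind once the cap is reached.

```c
#include <assert.h>

#define FAULT_READ_INIT 8	/* initial window, in pages (illustrative) */
#define FAULT_READ_MAX  64	/* cap; cache-behind starts only here */

/* Widen the read-ahead window arithmetically on a sequential fault. */
static int
grow_read_ahead(int window)
{
	int next = window + FAULT_READ_INIT;	/* arithmetic, not geometric */

	return (next > FAULT_READ_MAX ? FAULT_READ_MAX : next);
}

/* Unmap and cache behind only once the window is saturated, which is
 * what keeps briefly-idle pages from being unmapped prematurely. */
static int
should_cache_behind(int window)
{
	return (window == FAULT_READ_MAX);
}
```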
|
234576 |
22-Apr-2012 |
nwhitehorn |
Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.
Suggested by: kib MFC after: 4 days
|
234556 |
21-Apr-2012 |
kib |
When MAP_STACK mapping is created, the map entry is created only to cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent call to vm_map_wire() to wire the whole stack region fails due to VM_MAP_WIRE_NOHOLES flag.
Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack.
Reported and tested by: Sushanth Rai <sushanth_rai yahoo com> Reviewed by: alc MFC after: 1 week
|
234554 |
21-Apr-2012 |
alc |
As documented in vm_page.h, updates to the vm_page's flags no longer require the page queues lock.
MFC after: 1 week
|
234064 |
09-Apr-2012 |
attilio |
- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem.
Reviewed by: alc MFC after: 2 weeks X-MFC: r234039
|
234039 |
08-Apr-2012 |
alc |
Fix mincore(2) so that it reports PG_CACHED pages as resident.
MFC after: 2 weeks
|
234038 |
08-Apr-2012 |
alc |
If a page belonging to a reservation is cached, then mark the reservation so that it will be freed to the cache pool rather than the default pool. Otherwise, the cached pages within the reservation may be recycled sooner than necessary.
Reported by: Andrey Zonov
|
233960 |
06-Apr-2012 |
attilio |
Staticize vm_page_cache_remove().
Reviewed by: alc
|
233949 |
06-Apr-2012 |
nwhitehorn |
Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction caches, by invalidating kernel icaches only when needed and not flushing user caches for shared pages.
Suggested by: kib MFC after: 2 weeks
|
233925 |
05-Apr-2012 |
jhb |
Add new ktrace records for the start and end of VM faults. This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points.
Reviewed by: kib MFC after: 2 weeks
|
233627 |
28-Mar-2012 |
mckusick |
Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'.
Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected.
This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too.
Reviewed by: kib
|
233291 |
22-Mar-2012 |
alc |
Handle spurious page faults that may occur in no-fault sections of the kernel.
When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel.
In collaboration with: kib Tested by: gibbs (an earlier version)
I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem.
MFC after: 1 week
|
233194 |
19-Mar-2012 |
jhb |
Bah, just revert my earlier change entirely. (Missed alc's request to do this earlier.)
Requested by: alc
|
233191 |
19-Mar-2012 |
jhb |
Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger than 4GB. Specifically, the inlined version of 'ptoa' of the 'int' count of pages overflowed on 64-bit platforms. While here, change vm_object_madvise() to accept two vm_pindex_t parameters (start and end) rather than a (start, count) tuple to match other VM APIs as suggested by alc@.
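The overflow can be reproduced in a few lines. This sketch is illustrative: the narrow version models the 32-bit truncation with uint32_t (keeping the wraparound well-defined rather than undefined signed overflow), and the wide version widens the count before shifting, as the commit does by switching to vm_pindex_t-sized parameters.

```c
#include <stdint.h>
#include <assert.h>

#define PAGE_SHIFT 12	/* 4 KB pages, for illustration */

/* The buggy pattern: computing ptoa() on an 'int' page count keeps the
 * intermediate result in 32 bits, so mappings of 4 GB and beyond wrap. */
static int64_t
ptoa_narrow(int npages)
{
	return ((uint32_t)npages << PAGE_SHIFT);	/* truncates mod 2^32 */
}

/* The fix: widen the count before shifting. */
static int64_t
ptoa_wide(int64_t npages)
{
	return (npages << PAGE_SHIFT);
}
```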
|
233190 |
19-Mar-2012 |
jhb |
Alter the previous commit to use vm_size_t instead of vm_pindex_t. vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t, but a page index instead of a byte offset.
|
233100 |
17-Mar-2012 |
kib |
In vm_object_page_clean(), do not clear the OBJ_MIGHTBEDIRTY object flag if the filesystem performed a short write and we are skipping the page due to this.
Propagate write errors from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode.
While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean.
PR: kern/165927 Reviewed by: alc MFC after: 2 weeks
|
232984 |
14-Mar-2012 |
jhb |
Pedantic nit: use vm_pindex_t instead of long for a count of pages.
|
232701 |
08-Mar-2012 |
jhb |
Add KTR_VFS traces to track modifications to a vnode's writecount.
|
232399 |
02-Mar-2012 |
alc |
Eliminate stale incorrect ARGSUSED comments.
Submitted by: bde
|
232288 |
29-Feb-2012 |
alc |
Simplify kmem_alloc() by eliminating code that existed on account of external pagers in Mach. FreeBSD doesn't implement external pagers. Moreover, it doesn't page out the kernel object, so the reasons for having this code no longer hold.
Reviewed by: kib MFC after: 6 weeks
|
232166 |
25-Feb-2012 |
alc |
Simplify vm_mmap()'s control flow.
Add a comment describing what vm_mmap_to_errno() does.
Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071
|
232160 |
25-Feb-2012 |
alc |
Simplify vmspace_fork()'s control flow by copying immutable data before the vm map locks are acquired. Also, eliminate redundant initialization of the new vm map's timestamp.
Reviewed by: kib MFC after: 3 weeks
|
232103 |
24-Feb-2012 |
kib |
Place the if() at the right location, to activate the v_writecount accounting for shared writeable mappings for all filesystems, not only for the bypass layers.
Submitted by: alc Pointy hat to: kib MFC after: 20 days
|
232071 |
23-Feb-2012 |
kib |
Account the writeable shared mappings backed by file in the vnode v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero.
Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object.
Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be.
Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks
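The transition rule described above (v_writecount tracks whether writemappings is non-zero) can be sketched in userspace. The struct and helper names below are illustrative stand-ins, not the kernel's actual types:

```c
#include <assert.h>

/* Hypothetical model of the r232071 bookkeeping: the vnode's
 * v_writecount is incremented when the object's writemappings counter
 * leaves zero, and decremented when it returns to zero. */
struct vnode_model { int v_writecount; };
struct object_model { long writemappings; struct vnode_model *vp; };

static void
writemappings_adjust(struct object_model *obj, long delta)
{
	long old = obj->writemappings;

	obj->writemappings += delta;
	if (old == 0 && obj->writemappings != 0)
		obj->vp->v_writecount++;	/* first writeable mapping */
	else if (old != 0 && obj->writemappings == 0)
		obj->vp->v_writecount--;	/* last writeable mapping gone */
}
```

With this invariant, "any writeable shared mapping exists" is answerable from v_writecount alone, which is what lets the mount demotion and ETXTBUSY checks work.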
|
232002 |
22-Feb-2012 |
kib |
Remove wrong comment.
Discussed with: alc MFC after: 3 days
|
231819 |
16-Feb-2012 |
alc |
When vm_mmap() is used to map a vm object into a kernel vm_map, it makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process.
Reviewed by: kib
|
231526 |
11-Feb-2012 |
kib |
Close a race due to dropping of the map lock between creating a map entry for a shared mapping and marking the entry for inheritance. Another thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private.
Noted and reviewed by: alc MFC after: 1 week
|
231378 |
10-Feb-2012 |
ed |
Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly.
MFC after: 2 weeks
|
230877 |
01-Feb-2012 |
mav |
Fix NULL dereference panic on attempt to turn off (on system shutdown) disconnected swap device.
This is a quick and imperfect solution, as the swap device will still be open and GEOM will not be able to destroy it. The proper solution would be to automatically turn off and close the disconnected swap device, but with the existing code that would cause a panic if there is at least one page on the device, even if it is an unimportant page of a user-level process. It needs some work.
Reviewed by: kib@ MFC after: 1 week
|
230623 |
27-Jan-2012 |
kmacy |
Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64. Excluding other allocations, including UMA ones, now entails only the addition of a single flag to kmem_alloc or uma zone create.
Reviewed by: alc, avg MFC after: 2 weeks
|
230247 |
17-Jan-2012 |
nwhitehorn |
Revert r212360 now that PowerPC can handle large sparse arguments to pmap_remove() (changed in r228412).
MFC after: 2 weeks
|
229934 |
10-Jan-2012 |
kib |
Change the type of the paging_in_progress refcounter from u_short to u_int. With the auto-sized buffer cache on modern machines, UFS metadata can generate more than 65535 pages belonging to the buffers undergoing i/o, overflowing the counter.
Reported and tested by: jimharris Reviewed by: alc MFC after: 1 week
|
229495 |
04-Jan-2012 |
kib |
Do not restart the scan in vm_object_page_clean() on the object generation change if the requested mode is async. The object generation is only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async mode it is enough to write each dirty page, not to guarantee that all pages are clean after vm_object_page_clean() returns.
Diagnosed by: truckman Tested by: flo Reviewed by: alc, truckman MFC after: 2 weeks
|
228936 |
28-Dec-2011 |
alc |
Optimize vm_object_split()'s handling of reservations.
|
228838 |
23-Dec-2011 |
kib |
Optimize the common case of msyncing the whole file mapping with the MS_SYNC flag. The system must guarantee that all writes are finished before the syscall returns. Schedule the writes in async mode, which is much faster and allows the clustering to occur. Wait for writes using VOP_FSYNC(), since we are syncing the whole file mapping.
Potentially, the restriction can be relaxed by not requiring that the mapping cover the whole file, as is done by other OSes.
Reported and tested by: az Reviewed by: alc MFC after: 2 weeks
|
228567 |
16-Dec-2011 |
kib |
Move kstack_cache_entry into the private header, and make the stack cache list header accessible outside vm_glue.c.
MFC after: 1 week
|
228498 |
14-Dec-2011 |
eadler |
- The previous commit (r228449) accidentally moved the vm.stats.vm.* sysctls to vm.stats.sys. Move them back.
Noticed by: pho Reviewed by: bde (earlier version) Approved by: bz MFC after: 1 week Pointy hat to: me
|
228449 |
13-Dec-2011 |
eadler |
Document a large number of currently undocumented sysctls. While here fix some style(9) issues and reduce redundancy.
PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week
|
228432 |
12-Dec-2011 |
kib |
Fix printf.
Submitted by: az MFC after: 1 week
|
228287 |
05-Dec-2011 |
alc |
Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage.
Eliminate some duplication of code in vm_reserv_alloc_page().
Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions.
Tested by: marius (sparc64)
|
228156 |
30-Nov-2011 |
kib |
Rename vm_page_set_valid() to vm_page_set_valid_range(). vm_page_set_valid() is the most reasonable name for the m->valid accessor.
Reviewed by: attilio, alc
|
228133 |
29-Nov-2011 |
kib |
Hide the internals of vm_page_lock(9) from the loadable modules. Since the address of the vm_page lock mutex depends on the kernel options, it is easy for a module to get out of sync with the kernel.
No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless proper KPI is developed to serve the needs.
Reviewed by: attilio, alc MFC after: 3 weeks
|
227788 |
21-Nov-2011 |
attilio |
Introduce the same mutex-wise fix in r227758 for sx locks.
The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_
Now vm_map locking is fully converted and can avoid knowing specifics about locking procedures. Reviewed by: kib MFC after: 1 month
|
227758 |
20-Nov-2011 |
attilio |
Introduce macro stubs in the mutex implementation that will always be defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one.
The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_
Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specifications in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer)
- All the other locking interfaces may require a similar cleanup, where the most notable case is sx, which will allow a further cleanup of the vm_map locking facilities
- The patch should be fully compatible with older branches, thus an MFC is planned (in fact, it uses all the underlying mechanisms already present).
Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month
|
227606 |
17-Nov-2011 |
alc |
Eliminate end-of-line white space.
|
227568 |
16-Nov-2011 |
alc |
Refactor the code that performs physically contiguous memory allocation, yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks.
From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things.
Reviewed by: kib Discussed with: jhb
|
227530 |
15-Nov-2011 |
kib |
Update the device pager interface, while keeping the compatibility layer for old KPI and KBI. New interface should be used together with d_mmap_single cdevsw method.
Device pager can be allocated with the cdev_pager_allocate(9) function, which takes struct cdev_pager_ops, containing constructor/destructor and page fault handler methods supplied by driver.
Constructor and destructor, called at the pager allocation and deallocation time, allow the driver to handle per-object private data.
The pager handler is called to handle a page fault on the vm map entry backed by the driver pager. The driver shall return either the vm_page_t which should be mapped, or an error code (which no longer causes a kernel panic). The page handler interface has a placeholder to specify the access mode causing the fault, but currently PROT_READ is always passed there.
Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
|
227529 |
15-Nov-2011 |
kib |
Remove the condition that is always true.
Submitted by: alc MFC after: 1 week
|
227309 |
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
227127 |
06-Nov-2011 |
alc |
Wake up the page daemon in vm_page_alloc_freelist() if it couldn't allocate the requested page because too few pages are cached or free.
Document the VM_ALLOC_COUNT() option to vm_page_alloc() and vm_page_alloc_freelist().
Make style changes to vm_page_alloc() and vm_page_alloc_freelist(), such as using a variable name that more closely corresponds to the comments.
|
227103 |
05-Nov-2011 |
kib |
Remove redundant definitions. The chunk was missed from r227102.
MFC after: 2 weeks
|
227102 |
05-Nov-2011 |
kib |
Provide typedefs for the type of bit mask for the page bits. Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independent VM layer.
Reviewed by: alc MFC after: 2 weeks
|
227072 |
04-Nov-2011 |
alc |
Simplify the implementation of the failure case in kmem_alloc_attr().
|
227070 |
04-Nov-2011 |
jhb |
Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files.
Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provides hints about data access patterns and is used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor.
The second type of hints is used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible.
To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files.
Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
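A minimal userland sketch of the first type of hint: advising the kernel that a file will be read sequentially. The helper name is made up; posix_fadvise(2) itself is the interface this commit adds:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: hint sequential access for a whole file. */
static int
advise_sequential(const char *path)
{
	int fd, error;

	fd = open(path, O_RDONLY | O_CREAT, 0600);
	if (fd == -1)
		return (-1);
	/* A len of 0 means "from offset to the end of the file". */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
	(void)close(fd);
	return (error);	/* 0 on success, an errno value otherwise */
}
```

Unlike madvise(2), the hint travels with the file descriptor, so the kernel can apply it in the read path rather than against a memory region.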
|
227012 |
02-Nov-2011 |
alc |
Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist() and use these new options in the mips pmap.
Wake up the page daemon in vm_page_alloc_freelist() if the number of free and cached pages becomes too low.
Tidy up vm_page_alloc_init(). In particular, add a comment about an important restriction on its use.
Tested by: jchandra@
|
226928 |
30-Oct-2011 |
alc |
Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at eliminating duplicated code in the various pmap implementations.
Micro-optimize vm_phys_free_pages().
Introduce vm_phys_free_contig(). It is a fast routine for freeing an arbitrary number of physically contiguous pages. In particular, it doesn't require the number of pages to be a power of two.
Use "u_long" instead of "unsigned long".
Bruce Evans (bde@) has convinced me that the "boundary" parameters to kmem_alloc_contig(), vm_phys_alloc_contig(), and vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not "u_long". Make this change.
|
226891 |
28-Oct-2011 |
alc |
Use "u_long" instead of "unsigned long".
|
226848 |
27-Oct-2011 |
alc |
Tidy up the comment at the head of vm_page_alloc, and mention that the returned page has the flag VPO_BUSY set.
|
226843 |
27-Oct-2011 |
alc |
Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
|
226824 |
27-Oct-2011 |
alc |
contigmalloc(9) and contigfree(9) are now implemented in terms of other more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.
|
226740 |
25-Oct-2011 |
alc |
Speed up vm_page_cache() and vm_page_remove() by checking for a few common cases that can be handled in constant time. The insight being that a page's parent in the vm object's tree is very often its predecessor or successor in the vm object's ordered memq.
Tested by: jhb MFC after: 10 days
|
226642 |
22-Oct-2011 |
attilio |
VM_NRESERVLEVEL is used in this file but opt_vm.h is not included, thus the stub switch won't be correctly handled. Include opt_vm.h.
Submitted by: jeff MFC after: 3 days
|
226388 |
15-Oct-2011 |
kib |
Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it.
Reviewed by: marcel
|
226366 |
14-Oct-2011 |
jhb |
Fix a typo in a comment.
|
226343 |
13-Oct-2011 |
marcel |
In sys_obreak() and when compiling for amd64 or ia64, when the process is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x depends on being able to execute from the heap on i386.
|
226313 |
12-Oct-2011 |
glebius |
Make memguard(9) capable of guarding uma(9) allocations.
|
225856 |
29-Sep-2011 |
kib |
Style nit.
Submitted by: jhb MFC after: 2 weeks
|
225843 |
28-Sep-2011 |
kib |
Fix grammar.
Submitted by: bf MFC after: 2 weeks
|
225840 |
28-Sep-2011 |
kib |
Use the trick of performing the atomic operation on the contained aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). Remove the vm page queue lock around the vm_page_dirty() call in vm_fault_hold(), the sole purpose of which was to protect dirty on architectures which do not provide short or byte-wide atomics.
Reviewed by: alc, attilio Tested by: flo (sparc64) MFC after: 2 weeks
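The trick can be demonstrated in userland: when only word-sized atomics exist, a byte is updated by compare-and-swapping the aligned 32-bit word that contains it. This sketch assumes a little-endian layout; the kernel version lives in vm_page_clear_dirty_mask(), not this hypothetical helper:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Clear "mask" bits in the byte at p, using only a 32-bit CAS on the
 * aligned word containing that byte (little-endian assumed). */
static void
clear_byte_bits_via_word(uint8_t *p, uint8_t mask)
{
	uintptr_t addr = (uintptr_t)p;
	_Atomic uint32_t *word = (_Atomic uint32_t *)(addr & ~(uintptr_t)3);
	uint32_t wmask = (uint32_t)mask << ((addr & 3) * 8);
	uint32_t old = atomic_load(word);

	/* On CAS failure "old" is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak(word, &old, old & ~wmask))
		;
}
```

The neighboring bytes in the word are carried through the CAS unchanged, which is what makes the per-byte lock unnecessary.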
|
225838 |
28-Sep-2011 |
kib |
Use the explicitly-sized types for the dirty and valid masks.
Requested by: attilio Reviewed by: alc MFC after: 2 weeks
|
225617 |
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
225418 |
06-Sep-2011 |
kib |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into a new atomic flags field. Updates to the atomic flags are performed using atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify the aflags.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
225089 |
22-Aug-2011 |
kib |
Update some comments in swap_pager.c.
Reviewed and most wording by: alc MFC after: 1 week Approved by: re (bz)
|
225076 |
22-Aug-2011 |
kib |
Apply the limit to avoid the overflows in the radix tree subr_blist.c after the conversion of the swap device size to page size units, not before. That lifts the limit on the usable swap partition size from 32GB to 256GB, which is less depressing for modern systems.
Submitted by: Alexander V. Chernikov <melifaro ipfw ru> Reviewed by: alc Approved by: re (bz) MFC after: 2 weeks
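The size change follows from simple arithmetic. Assuming the radix tree can address roughly 2^26 blocks, a 512-byte DEV_BSIZE, and 4KB pages (these constants are assumptions for illustration, not taken from the source), applying the same block-count limit in page units instead of device-block units multiplies the ceiling by 8:

```c
#include <stdint.h>

/* The addressable block count is fixed; the byte ceiling depends on
 * the unit size the limit is applied in. */
#define BLIST_MAX_BLOCKS (1ULL << 26)

static uint64_t
max_swap_bytes(uint64_t unit_size)
{
	return (BLIST_MAX_BLOCKS * unit_size);
}
```

With 512-byte units the cap is 2^26 * 512 = 32GB; with 4096-byte page units it becomes 2^26 * 4096 = 256GB.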
|
224778 |
11-Aug-2011 |
rwatson |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
224746 |
09-Aug-2011 |
kib |
- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of the vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use just VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
|
224689 |
07-Aug-2011 |
alc |
Fix an error in kmem_alloc_attr(). Unless "tries" is updated, kmem_alloc_attr() could get stuck in a loop.
Approved by: re (kib) MFC after: 3 days
|
224582 |
01-Aug-2011 |
kib |
Implement the linprocfs swaps file, providing information about the configured swap devices in the Linux-compatible format.
Based on the submission by: Robert Millan <rmh debian org> PR: kern/159281 Reviewed by: bde Approved by: re (kensmith) MFC after: 2 weeks
|
224522 |
30-Jul-2011 |
kib |
Fix a race in the device pager allocation. If another thread won and allocated the device pager for the given handle, then the object fictitious pages list and the object membership in the global object list still need to be initialized. Otherwise, dev_pager_dealloc() will traverse uninitialized pointers.
Reported and tested by: pho Reviewed by: jhb Approved by: re (kensmith) MFC after: 1 week
|
223914 |
10-Jul-2011 |
kib |
Extract the code to translate VM error into errno, into an exported function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2).
Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week
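The translation is a small switch over VM-layer return codes. The sketch below is an approximation of what vm_mmap_to_errno() does, with illustrative stand-ins for the kernel's KERN_* constants; the exact set of cases in the real function may differ:

```c
#include <errno.h>

/* Illustrative stand-ins for the kernel's KERN_* return codes. */
enum { KERN_SUCCESS, KERN_INVALID_ADDRESS, KERN_PROTECTION_FAILURE,
       KERN_NO_SPACE };

/* Collapse VM-layer return codes into the errno values mmap(2)
 * documents, so drivers report errors consistently with mmap(2). */
static int
vm_error_to_errno(int rv)
{
	switch (rv) {
	case KERN_SUCCESS:
		return (0);
	case KERN_INVALID_ADDRESS:
	case KERN_NO_SPACE:
		return (ENOMEM);
	case KERN_PROTECTION_FAILURE:
		return (EACCES);
	default:
		return (EINVAL);
	}
}
```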
|
223913 |
10-Jul-2011 |
kib |
Style.
MFC after: 3 days
|
223889 |
09-Jul-2011 |
kib |
Add a facility to disable processing page faults. When activated, uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault.
Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version)
|
223825 |
06-Jul-2011 |
trasz |
All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
|
223823 |
06-Jul-2011 |
attilio |
Handle a race between device_pager and devsw in a more graceful manner: return an error code rather than panic the kernel.
Sponsored by: Sandvine Incorporated Reviewed by: kib Tested by: pho MFC after: 2 weeks
|
223729 |
02-Jul-2011 |
alc |
Initialize marker pages as held rather than fictitious/wired. Marking the page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER.
Reviewed by: kib
|
223677 |
29-Jun-2011 |
alc |
Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages.
This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this change.
Reviewed by: kib
|
223464 |
23-Jun-2011 |
alc |
Revert to using the page queues lock in vm_page_clear_dirty_mask() on MIPS. (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)
|
223307 |
19-Jun-2011 |
alc |
Precisely document the synchronization rules for the page's dirty field. (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.)
Eliminate the use of the page queues lock for atomically performing read- modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types.
Document the fact that 32KB pages aren't really supported.
Reviewed by: attilio, kib
|
222992 |
11-Jun-2011 |
kib |
Assert that the page is VPO_BUSY or the page owner object is locked in vm_page_undirty(). The assert is not precise because the VPO_BUSY owner is not tracked, so the assertion does not catch the case when VPO_BUSY is owned by another thread.
Reviewed by: alc
|
222991 |
11-Jun-2011 |
kib |
Fix a bug in r222586. Lock the page owner object around the modification of the m->dirty.
Reported and tested by: nwhitehorn Reviewed by: alc
|
222586 |
01-Jun-2011 |
kib |
In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean().
VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost.
The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes.
Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week
|
222184 |
22-May-2011 |
alc |
Correct an error in r222163. Unless UMA_MD_SMALL_ALLOC is defined, startup_alloc() must be used until uma_startup2() is called.
Reported by: jh
|
222163 |
21-May-2011 |
alc |
1. Prior to r214782, UMA did not support multipage allocations before uma_startup2() was called. Thus, setting the variable "booted" to true in uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because any allocations made after uma_startup() but before uma_startup2() could be satisfied by uma_small_alloc(). Now, however, some multipage allocations are necessary before uma_startup2() just to allocate zone structures on machines with a large number of processors. Thus, a Boolean can no longer effectively describe the state of the UMA allocator. Instead, make "booted" have three values to describe how far initialization has progressed. This allows multipage allocations to continue using startup_alloc() until uma_startup2(), but single-page allocations may begin using uma_small_alloc() after uma_startup().
2. With the aforementioned change, only a modest increase in boot pages is necessary to boot UMA on a large number of processors.
3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between r182028 and r204128.
Reviewed by: attilio [1], nwhitehorn [3] Tested by: sbruno
|
222137 |
20-May-2011 |
alc |
Fix spelling errors.
|
222132 |
20-May-2011 |
alc |
Eliminate a redundant #include. ("vm/vm_param.h" already includes "machine/vmparam.h".)
|
221855 |
13-May-2011 |
mdf |
Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware.
Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment).
Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit.
Requested by: alc MFC after: 1 week MFC with: r221853
|
221853 |
13-May-2011 |
mdf |
Use a globally visible region of zeros for both /dev/zero and the md device. There are likely other kernel uses of "blob of zeros" that can be converted.
Reviewed by: alc MFC after: 1 week
|
221714 |
09-May-2011 |
mlaier |
Another long standing vm bug found at Isilon: Fix a race between vm_object_collapse and vm_fault.
Reviewed by: alc@ MFC after: 3 days
|
221096 |
26-Apr-2011 |
obrien |
Reap old SPL comments.
Reviewed by: alc
|
220977 |
23-Apr-2011 |
kib |
Fix two bugs in r218670.
Hold the vnode around the region where object lock is dropped, until vnode lock is acquired.
Do not drop the vnode reference for a case when the object was deallocated during unlock. Note that in this case, VV_TEXT is cleared by vnode_pager_dealloc().
Reported and tested by: pho Reviewed by: alc MFC after: 3 days
|
220390 |
06-Apr-2011 |
jhb |
Fix several places to ignore processes that are not yet fully constructed.
MFC after: 1 week
|
220387 |
06-Apr-2011 |
trasz |
In vm_daemon(), do not skip processes stopped with SIGSTOP.
|
220386 |
06-Apr-2011 |
trasz |
Add RACCT_RSS.
Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
220373 |
05-Apr-2011 |
trasz |
Add accounting for most of the memory-related resources.
Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
220001 |
25-Mar-2011 |
kib |
Handle the corner case in vm_fault_quick_hold_pages().
If the supplied length is zero and the user address is invalid, the function might return -1, due to the truncation and rounding of the address. The callers interpret this situation as EFAULT. Instead of handling the zero length in the callers, filter it in vm_fault_quick_hold_pages().
Sponsored by: The FreeBSD Foundation Reviewed by: alc
|
219968 |
24-Mar-2011 |
jhb |
Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.
MFC after: 1 week
|
219819 |
21-Mar-2011 |
jeff |
- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
|
219727 |
18-Mar-2011 |
trasz |
In vm_daemon(), when iterating over all processes in the system, skip those which are not yet fully initialized (i.e. ones with p_state == PRS_NEW). Without it, we could panic in _thread_lock_flags().
Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that require similar fix.
Reported by: pho, keramida Discussed with: kib
|
219476 |
11-Mar-2011 |
alc |
Eliminate duplication of the fake page code and zone by the device and sg pagers.
Reviewed by: jhb
|
219124 |
01-Mar-2011 |
brucec |
Change the return type of vmspace_swap_count to a long to match the other vmspace_*_count functions.
MFC after: 3 days
|
218989 |
24-Feb-2011 |
pluknet |
Remove the sysctl vm.max_proc_mmap, used to protect from KVA space exhaustion. As pointed out by Alan Cox, it no longer serves its purpose with the modern UMA allocator, compared to the old one used in the 4.x days.
The removal of the sysctl eliminates the max_proc_mmap type overflow leading to the broken mmap(2) seen with large amounts of physical memory on arches with effectively unbounded KVA space (such as amd64). It was found that slightly less than 256GB of physmem was enough to trigger the overflow.
Reviewed by: alc, kib Approved by: avg (mentor) MFC after: 2 months
|
218966 |
23-Feb-2011 |
brucec |
Calculate and return the count in vmspace_swap_count as a vm_offset_t instead of an int to avoid overflow.
While here, clean up some style(9) issues.
PR: kern/152200 Reviewed by: kib MFC after: 2 weeks
|
218773 |
17-Feb-2011 |
alc |
Remove pmap fields that are either unused or not fully implemented.
Discussed with: kib
|
218701 |
15-Feb-2011 |
kib |
Since r218070 reenabled the call to vm_map_simplify_entry() from vm_map_insert(), the kmem_back() assumption about the newly inserted entry might be broken due to the interference of two factors. In the low memory condition, when vm_page_alloc() returns NULL, the supplied map is unlocked. If another thread performs kmem_malloc() meanwhile, and its map entry is placed right next to our thread's map entry in the map, both entries' wire counts are still 0 and the entries are coalesced by vm_map_simplify_entry().
Mark the new entry with MAP_ENTRY_IN_TRANSITION to prevent coalescing. Fix some style issues, and tighten the assertions to account for the MAP_ENTRY_IN_TRANSITION state.
Reported and tested by: pho Reviewed by: alc
|
218670 |
13-Feb-2011 |
kib |
Lock the vnode around clearing of VV_TEXT flag. Remove mp_fixme() note mentioning that vnode lock is needed.
Reviewed by: alc Tested by: pho MFC after: 1 week
|
218592 |
12-Feb-2011 |
jmallett |
Use CPU_FOREACH rather than expecting CPUs 0 through mp_ncpus-1 to be present. Don't micro-optimize the uniprocessor case; use the same loop there.
Submitted by: Bhanu Prakash Reviewed by: kib, jhb
|
218589 |
12-Feb-2011 |
alc |
Retire VFS_BIO_DEBUG. Convert those checks that were still valid into KASSERT()s and eliminate the rest.
Replace excessive printf()s and a panic() in bufdone_finish() with a KASSERT() in vm_page_io_finish().
Reviewed by: kib
|
218345 |
05-Feb-2011 |
alc |
Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range.
Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers.
Reviewed by: kib MFC after: 3 weeks
|
218304 |
04-Feb-2011 |
alc |
Since the last parameter to vm_object_shadow() is a vm_size_t and not a vm_pindex_t, it makes no sense for its callers to perform atop(). Let vm_object_shadow() do that instead.
|
218113 |
31-Jan-2011 |
alc |
Release the free page queues lock earlier in vm_page_alloc().
Discussed with: kib@
|
218070 |
29-Jan-2011 |
alc |
Reenable the call to vm_map_simplify_entry() from vm_map_insert() for non- MAP_STACK_* entries. (See r71983 and r74235.)
In some cases, performing this call to vm_map_simplify_entry() halves the number of vm map entries used by the Sun JDK.
|
217916 |
27-Jan-2011 |
mdf |
Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
|
217688 |
21-Jan-2011 |
pluknet |
Make the MSGBUF_SIZE kernel option a loader tunable, kern.msgbufsize.
Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
|
217529 |
18-Jan-2011 |
alc |
Move the definition of M_VMPGDATA to the swap pager, where the only remaining uses are.
|
217508 |
17-Jan-2011 |
alc |
Explicitly initialize the page's queue field to PQ_NONE instead of relying on PQ_NONE being zero.
Redefine PQ_NONE and PQ_COUNT so that a page queue isn't allocated for PQ_NONE.
Reviewed by: kib@
|
217482 |
16-Jan-2011 |
alc |
Sort function prototypes.
|
217479 |
16-Jan-2011 |
alc |
Update a lock annotation on the page structure.
|
217478 |
16-Jan-2011 |
alc |
Shift responsibility for synchronizing access to the page's act_count field to the object's lock.
Reviewed by: kib@
|
217477 |
16-Jan-2011 |
alc |
Clean up the start of vm_page_alloc(). In particular, eliminate an assertion that is no longer required. Long ago, calls to vm_page_alloc() from an interrupt handler had to specify VM_ALLOC_INTERRUPT so that vm_page_alloc() would not attempt to reclaim a PQ_CACHE page from another vm object. Today, with the synchronization on a vm object's collection of PQ_CACHE pages, this is no longer an issue. In fact, VM_ALLOC_INTERRUPT now reclaims PQ_CACHE pages just like VM_ALLOC_{NORMAL,SYSTEM}.
MFC after: 3 weeks
|
217463 |
15-Jan-2011 |
kib |
For consistency, use kernel_object instead of &kernel_object_store when initializing the object mutex. Do the same for kmem_object.
Discussed with: alc MFC after: 1 week
|
217453 |
15-Jan-2011 |
alc |
For some time now, the kernel and kmem objects have been ordinary OBJT_PHYS objects. Thus, there is no need for handling them specially in vm_fault(). In fact, this special case handling would have led to an assertion failure just before the call to pmap_enter().
Reviewed by: kib@ MFC after: 6 weeks
|
217265 |
11-Jan-2011 |
jhb |
Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes.
Reviewed by: bde
|
217192 |
09-Jan-2011 |
kib |
Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out.
Comments wording and reviewed by: alc
|
217177 |
09-Jan-2011 |
alc |
Eliminate a redundant alignment directive on the page locks array.
|
217171 |
08-Jan-2011 |
alc |
Eliminate the counting of vm_page_pa_tryrelock calls. We really don't need it anymore. Moreover, its implementation had a type mismatch, a long is not necessarily an uint64_t. (This mismatch was hidden by casting.) Move the remaining two counters up a level in the sysctl hierarchy. There is no reason for them to be under the vm.pmap node.
Reviewed by: kib
|
216899 |
03-Jan-2011 |
alc |
Release the page lock early in vm_pageout_clean(). There is no reason to hold this lock until the end of the function.
With the aforementioned change to vm_pageout_clean(), page locks don't need to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions.
Reviewed by: kib
|
216874 |
01-Jan-2011 |
alc |
Make a couple refinements to r216799 and r216810. In particular, revise a comment and move it to its proper place.
Reviewed by: kib
|
216873 |
01-Jan-2011 |
brucec |
There can be more than 0x20000000 swap meta blocks allocated if a swap-backed md(4) device is used. Don't panic when deallocating such a device if swap has been used.
PR: kern/133170 Discussed with: kib MFC after: 3 days
|
216810 |
29-Dec-2010 |
kib |
Remove the OBJ_CLEANING flag. vfs_setdirty_locked_object() is the only consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY was cleared early in vm_object_page_clean(), before the cleaning pass was done. This is no longer true after r216799.
Moreover, since OBJ_CLEANING is a flag and not a counter, it could be reset prematurely when parallel vm_object_page_clean() calls are performed.
Reviewed by: alc (as a part of the bigger patch) MFC after: 1 month (after r216799 is merged)
|
216807 |
29-Dec-2010 |
alc |
There is no point in vm_contig_launder{,_page}() flushing held pages, instead skip over them. As long as a page is held, it can't be reclaimed by contigmalloc(M_WAITOK). Moreover, a held page may be undergoing modification, e.g., vmapbuf(), so even if the hold were released before the completion of contigmalloc(), the page might have to be flushed again.
MFC after: 3 weeks
|
216799 |
29-Dec-2010 |
kib |
Move the increment of vm object generation count into vm_object_set_writeable_dirty().
Fix an issue where a restart of the scan in vm_object_page_clean() did not remove write permissions for newly added pages, or for an already scanned page whose mapping became writeable due to a fault. Merge the two loops in vm_object_page_clean(), performing the removal of write permission and the cleaning in the same loop. The restart of the loop then correctly downgrades writeable mappings.
Fix an issue where a second caller to msync() might return before the first caller had completed flushing the pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not before.
Calls to pmap_is_modified() are not needed after pmap_remove_write() there.
Proposed, reviewed and tested by: alc MFC after: 1 week
|
216772 |
28-Dec-2010 |
alc |
Correct a typo in vm_fault_quick_hold_pages().
Reported by: Bartosz Stec
|
216731 |
27-Dec-2010 |
alc |
Move vm_object_print()'s prototype to the expected place.
|
216701 |
26-Dec-2010 |
alc |
Retire vm_fault_quick(). It's no longer used.
Reviewed by: kib@
|
216699 |
25-Dec-2010 |
alc |
Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel.
In collaboration with: kib@
|
216604 |
20-Dec-2010 |
alc |
Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race condition in proc_rwmem() and to (2) simplify the implementation of the cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem() the requested read or write could fail because the targeted page could be reclaimed between the calls to vm_fault() and vm_page_hold().
In collaboration with: kib@ MFC after: 6 weeks
|
216511 |
17-Dec-2010 |
alc |
Implement and use a single optimized function for unholding a set of pages.
Reviewed by: kib@
|
216425 |
14-Dec-2010 |
alc |
Change memguard_fudge() so that it can handle km_max being zero. Not every platform defines VM_KMEM_SIZE_MAX, and on those platforms km_max will be zero.
Reviewed by: mdf Tested by: marius
|
216335 |
09-Dec-2010 |
mlaier |
Fix a long-standing (from the original 4.4BSD-Lite sources) race between vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page missing" panics. While faulting in pages for a map entry that is being wired down, mark the containing map as busy. In vmspace_fork, wait until the map is unbusy before trying to copy the entries.
Reviewed by: kib MFC after: 5 days Sponsored by: Isilon Systems, Inc.
|
216319 |
09-Dec-2010 |
jchandra |
Revert the vm/vm_page.c change in r216317.
This adds back the changes from r216141, which were reverted by the above check-in.
|
216317 |
09-Dec-2010 |
jchandra |
swi_vm() for mips.
|
216186 |
04-Dec-2010 |
trasz |
Fix comment indentation.
|
216141 |
03-Dec-2010 |
imp |
To make minidumps work properly on mips for memory that's direct mapped and entered via vm_page_setup, keep track of it like we do for amd64.
# A separate commit will be made to move this to a capability-based ifdef # rather than arch-based ifdef.
Submitted by: alc@ MFC after: 1 week
|
216128 |
02-Dec-2010 |
trasz |
Replace pointer to "struct uidinfo" with pointer to "struct ucred" in "struct vm_object". This is required to make it possible to account for per-jail swap usage.
Reviewed by: kib@ Tested by: pho@ Sponsored by: FreeBSD Foundation
|
216090 |
01-Dec-2010 |
alc |
Correct an error in the allocation of the vm_page_dump array in vm_page_startup(). Specifically, the dump_avail array should be used instead of the phys_avail array to calculate the size of vm_page_dump. For example, the pages for the message buffer are allocated prior to vm_page_startup() by subtracting them from the last entry in the phys_avail array, but the first thing that vm_page_startup() does after creating the vm_page_dump array is to set the bits corresponding to the message buffer pages in that array. However, these bits might not actually exist in the array, because the size of the array is determined by the current value in the last entry of the phys_avail array. In general, the only reason why this doesn't always result in an out-of-bounds array access is that the size of the vm_page_dump array is rounded up to the next page boundary. This change eliminates that dependence on rounding (and luck).
MFC after: 6 weeks
|
215973 |
28-Nov-2010 |
jchandra |
Fix an issue noted by alc while reviewing r215938: the current implementation of vm_page_alloc_freelist() does not handle order > 0 correctly. Remove the order parameter from the function and use it only for order 0 pages.
Submitted by: alc
|
215796 |
24-Nov-2010 |
kib |
After the sleep caused by encountering a busy page, relookup the page.
Submitted and reviewed by: alc Reported and tested by: pho MFC after: 5 days
|
215610 |
21-Nov-2010 |
kib |
Eliminate the mab, maf arrays and related variables.
The change also fixes an off-by-one error in the calculation of mreq.
Suggested and reviewed by: alc Tested by: pho MFC after: 5 days
|
215597 |
20-Nov-2010 |
alc |
Optimize vm_object_terminate().
Reviewed by: kib MFC after: 1 week
|
215574 |
20-Nov-2010 |
kib |
The runlen returned from vm_pageout_flush() might legitimately be zero, when the mreq page has status VM_PAGER_AGAIN.
MFC after: 5 days
|
215538 |
19-Nov-2010 |
alc |
Reduce the amount of detail printed by vm_page_free_toq() when it panics.
Reviewed by: kib
|
215508 |
19-Nov-2010 |
mlaier |
Off by one page in vm_reserv_reclaim_contig(): Also reclaim reservations with only a single free page if that satisfies the requested size.
MFC after: 3 days Reviewed by: alc
|
215471 |
18-Nov-2010 |
kib |
vm_pageout_flush() might cache pages that have finished being written to the backing storage. Such pages might then be reused, racing with the assert in vm_object_page_collect_flush() that verified that dirty pages from the run (most likely, pages with VM_PAGER_AGAIN status) are still write-protected. In fact, the page indexes for the pages that were removed from the object page list should be ignored by vm_object_page_clean().
Return the length of the successfully written run from vm_pageout_flush(), that is, the count of pages between the requested page and the first page after it with status VM_PAGER_AGAIN. Supply the requested page index in the array to vm_pageout_flush(). Use the returned run length to advance the index of the next page to clean in vm_object_page_clean().
Reported by: avg Reviewed by: alc MFC after: 1 week
|
215469 |
18-Nov-2010 |
kib |
Only increment the object generation count when inserting a page into the object page list. The only use of the object generation count now is to restart the scan in vm_object_page_clean(), which makes sense to do on page addition. Neither page removals nor manipulations of the shadow chain affect the dirtiness of the object.
Suggested and reviewed by: alc MFC after: 1 week
|
215321 |
14-Nov-2010 |
kib |
Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix.
Reported and tested by: swell.k gmail com MFC after: 1 week
|
215309 |
14-Nov-2010 |
kib |
Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after: 1 week
|
215307 |
14-Nov-2010 |
kib |
Implement a (soft) stack guard page for auto-growing stack mappings. The unmapped page separates the tip of the stack and a possible adjacent segment, making some exploits of stack overflows harder. The stack-growing code refuses to expand the segment to the last page of the reserved region when the sysctl security.bsd.stack_guard_page is set to 1. The default value for the sysctl and the accompanying tunable is 0.
Please note that mmap(MAP_FIXED) can still place a mapping right up to the stack, making a continuous region.
Reviewed by: alc MFC after: 1 week
|
215093 |
10-Nov-2010 |
alc |
Enable reservation-based physical memory allocation. Even without the creation of large page mappings in the pmap, it can provide modest performance benefits. In particular, for a "buildworld" on a 2x 1GHz Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system time by 12.6%.
Tested by: marius@
|
214953 |
07-Nov-2010 |
alc |
In case the stack size reaches its limit and its growth must be restricted, ensure that grow_amount is a multiple of the page size. Otherwise, the kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and produce a core file with a missing stack on FreeBSD 7.x.
Diagnosed and reported by: jilles Reviewed by: kib MFC after: 1 week
|
214903 |
07-Nov-2010 |
gonzo |
- Add minidump support for FreeBSD/mips
|
214782 |
04-Nov-2010 |
jhb |
Update startup_alloc() to support multi-page allocations and allow internal zones whose objects are larger than a page to use startup_alloc(). This allows allocation of zone objects during early boot on machines with a large number of CPUs since the resulting zone objects are larger than a page.
Submitted by: trema Reviewed by: attilio MFC after: 1 week
|
214564 |
30-Oct-2010 |
alc |
Correct some format strings used by sysctls.
MFC after: 1 week
|
214144 |
21-Oct-2010 |
jhb |
- Make 'vm_refcnt' volatile so that compilers won't be tempted to treat its value as a loop invariant. Currently this is a no-op because 'atomic_cmpset_int()' clobbers all memory on current architectures. - Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop a reference in vmspace_free().
Reviewed by: alc MFC after: 1 month
|
214095 |
20-Oct-2010 |
avg |
PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments
Reviewed by: alc MFC after: 4 days
|
214062 |
19-Oct-2010 |
mdf |
uma_zfree(zone, NULL) should do nothing, to match free(9).
Noticed by: Ron Steinke <rsteinke at isilon dot com> MFC after: 3 days
|
213911 |
16-Oct-2010 |
lstewart |
Change uma_zone_set_max to return the effective value of "nitems" after rounding. The same value can also be obtained with uma_zone_get_max, but this change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb
|
213910 |
16-Oct-2010 |
lstewart |
- Simplify implementation of uma_zone_get_max. - Add uma_zone_get_cur which returns the current approximate occupancy of a zone. This is useful for providing stats via sysctl amongst other things.
Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb MFC after: 2 weeks
|
213408 |
04-Oct-2010 |
alc |
If vm_map_find() is asked to allocate a superpage-aligned region of virtual addresses that is greater than a superpage in size but not a multiple of the superpage size, then vm_map_find() does not always expand the kernel pmap to support the last few small pages being allocated. These failures are not commonplace, so this was first noticed by someone porting FreeBSD to a new architecture. Previously, we grew the kernel page table in vm_map_findspace() when we found the first available virtual address. This works most of the time because we always grow the kernel pmap or page table by an amount that is a multiple of the superpage size. Now, instead, we defer the call to pmap_growkernel() until we are committed to a range of virtual addresses in vm_map_insert(). In general, there is another reason to prefer calling pmap_growkernel() in vm_map_insert(). It makes it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the kernel map.
Reported by: Svatopluk Kraus Reviewed by: kib@ MFC after: 3 weeks
|
212931 |
20-Sep-2010 |
mdf |
Replace an XXX comment with the appropriate code.
Submitted by: alc
|
212873 |
19-Sep-2010 |
alc |
Allow a POSIX shared memory object that is opened for read but not for write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write.
(This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.)
PR: 150260 MFC after: 3 weeks
|
212868 |
19-Sep-2010 |
alc |
Make refinements to r212824. In particular, don't make vm_map_unlock_nodefer() part of the synchronization interface for maps.
Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing how they should be used. In particular, describe the deferred deallocations issue with vm_map_unlock_and_wait().
Redo the implementation of vm_map_unlock_and_wait() so that it passes along the caller's file and line information, just like the other map locking primitives.
Reviewed by: kib X-MFC after: r212824
|
212824 |
18-Sep-2010 |
kib |
Adopt the deferring of object deallocation for the deleted map entries on map unlock to the lock downgrade and later read unlock operation.
System map entries cannot be backed by OBJT_VNODE objects, no need to defer deallocation for them. Map entries from user maps do not require the owner map for deallocation, and can be accumulated in the thread-local list for freeing when a user map is unlocked.
Move the collection of entries for deferred reclamation into vm_map_delete(). Create helper vm_map_process_deferred(), that is called from locations where processing is feasible. Do not process deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is held.
Reviewed by: alc, rstone (previous versions) Tested by: pho MFC after: 2 weeks
|
212750 |
16-Sep-2010 |
mdf |
Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk (original patch)
|
212572 |
13-Sep-2010 |
mdf |
Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex.
Requested by: nwhitehorn
|
212370 |
09-Sep-2010 |
mdf |
Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk
|
212360 |
09-Sep-2010 |
nwhitehorn |
On architectures with non-tree-based page tables like PowerPC, every page in a range must be checked when calling pmap_remove(). Calling pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range of the map could result in attempting to demap an extraordinary number of pages (> 10^15), so iterate through each map entry and unmap each of them individually.
MFC after: 6 weeks
|
212282 |
07-Sep-2010 |
rstone |
Fix a typo in r212281. uintptr -> uintptr_t
Pointy hat to: rstone
Approved by: emaste (mentor) MFC after: 2 weeks
|
212281 |
07-Sep-2010 |
rstone |
In munmap() downgrade the vm_map_lock to a read lock before taking a read lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock.
Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode.
Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks
|
212174 |
03-Sep-2010 |
avg |
vm_page.c: include opt_msgbuf.h for MSGBUF_SIZE use in vm_page_startup
vm_page_startup uses the MSGBUF_SIZE value for adding msgbuf pages to the minidump. If opt_msgbuf.h is not included and MSGBUF_SIZE is overridden in the kernel config, then not all msgbuf pages will be dumped. Most importantly, struct msgbuf itself will not be included. Thus the dump would look corrupted or incomplete to tools like kgdb, dmesg, etc. that try to access struct msgbuf as one of the first things they do when working on a crash dump.
MFC after: 5 days
|
212063 |
31-Aug-2010 |
mdf |
Have memguard(9) crash with an easier-to-debug message on double-free.
Reviewed by: zml MFC after: 3 weeks
|
212058 |
31-Aug-2010 |
mdf |
The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue.
Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks
|
211937 |
28-Aug-2010 |
alc |
Add the MAP_PREFAULT_READ option to mmap(2).
Reviewed by: jhb, kib
|
211396 |
16-Aug-2010 |
andre |
Add uma_zone_get_max() to obtain the effective limit after a call to uma_zone_set_max().
The UMA zone limit is not set exactly to the supplied value but rounded up to completely fill the backing store increment (normally a page). This can lead to surprising situations where the number of elements allocated from UMA is higher than the supplied limit value. The new get function reads back the effective value so that the supplied limit can be adjusted to the real limit.
Reviewed by: jeffr MFC after: 1 week
|
211229 |
12-Aug-2010 |
mdf |
Fix compile. It seemed better to have memguard.c include opt_vm.h in case future compile-time knobs were added that it wants to use. Also add include guards and forward declarations to vm/memguard.h.
Approved by: zml (mentor) MFC after: 1 month
|
211194 |
11-Aug-2010 |
mdf |
Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow:
- randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded
Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month
|
210923 |
06-Aug-2010 |
kib |
Add a new make_dev_p(9) flag, MAKEDEV_ETERNAL, to inform devfs that the created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes.
In collaboration with: pho MFC after: 1 month
|
210550 |
27-Jul-2010 |
jhb |
Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfilling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory.
Reviewed by: alc
|
210548 |
27-Jul-2010 |
trasz |
Fix commented out resource limit check in mlockall(2). It's still racy, but at least less misleading.
|
210545 |
27-Jul-2010 |
alc |
Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place.
Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string.
Reviewed by: kib
|
210475 |
25-Jul-2010 |
alc |
Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space.
Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string.
Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.)
Reviewed by: kib
|
210327 |
21-Jul-2010 |
jchandra |
Redo the page table page allocation on MIPS, as suggested by alc@.
The UMA zone based allocation is replaced by a scheme that creates a new free page list for the KSEG0 region, and a new function in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA-based page table page allocation code: dropping the page queue and pmap locks before the call to uma_zfree, and re-acquiring them afterwards, introduced a race condition (noted by alc@).
The changes are : - Revert the earlier changes in MIPS pmap.c that added UMA zone for page table pages. - Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that is not directly mapped (in 32bit kernel). Normal page allocations will first try the HIGHMEM freelist and then the default(direct mapped) freelist. - Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int order, int req)' to vm/vm_page.c to allocate a page from a specified freelist. The MIPS page table pages will be allocated using this function from the freelist containing direct mapped pages. - Move the page initialization code from vm_phys_alloc_contig() to a new function vm_page_alloc_init(), and use this function to initialize pages in vm_page_alloc_freelist() too. - Split the function vm_phys_alloc_pages(int pool, int order) to create vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
|
209861 |
09-Jul-2010 |
alc |
Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan().
This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one.
Reviewed by: kib
|
209792 |
08-Jul-2010 |
kib |
Make VM_ALLOC_RETRY flag mandatory for vm_page_grab(). Assert that the flag is always provided, and unconditionally retry after sleep for the busy page or failed allocation.
The intent is to remove VM_ALLOC_RETRY eventually.
Proposed and reviewed by: alc
|
209713 |
05-Jul-2010 |
kib |
Add the ability for the allocflag argument of the vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending vmio buffer can be replaced by a call to vm_page_grab().
Suggested and reviewed by: alc MFC after: 2 weeks
|
209702 |
04-Jul-2010 |
kib |
Several cleanups for r209686: - remove unused defines; - remove the unused curgeneration argument for vm_object_page_collect_flush(); - always assert that vm_object_page_clean() is called for OBJT_VNODE; - move vm_page_find_least() into the for() statement's initial clause.
Submitted by: alc
|
209686 |
04-Jul-2010 |
kib |
Reimplement vm_object_page_clean(), using the fact that vm object memq is ordered by page index. This greatly simplifies the implementation, since we no longer need to mark the pages with VPO_CLEANCHK to denote the progress. It is enough to remember the current position by index before dropping the object lock.
Remove VPO_CLEANCHK and VM_PAGER_IGNORE_CLEANCHK as unused. Garbage-collect vm.msync_flush_flags sysctl.
Suggested and reviewed by: alc Tested by: pho
|
209685 |
04-Jul-2010 |
kib |
Introduce a helper function, vm_page_find_least(). Use it in several places that previously open-coded the same logic.
Reviewed by: alc Tested by: pho MFC after: 1 week
|
209669 |
03-Jul-2010 |
alc |
Improve the comment and man page for vm_page_alloc(). Specifically, document one of the optional flags; clarify which of the flags are optional (and which are not), and remove mention of a restriction on the reclamation of cached pages that no longer holds since version 7.
MFC after: 1 week
|
209651 |
02-Jul-2010 |
alc |
Push down the acquisition of the page queues lock into vm_pageout_page_stats(). In particular, avoid acquiring the page queues lock unless iterating over the active queue.
|
209650 |
02-Jul-2010 |
alc |
Use vm_page_prev() instead of vm_page_lookup() in the implementation of vm_fault()'s automatic delete-behind heuristic. vm_page_prev() is typically faster.
|
209647 |
02-Jul-2010 |
alc |
With the demise of page coloring, the page queue macros no longer serve any useful purpose. Eliminate them.
Reviewed by: kib
|
209610 |
30-Jun-2010 |
alc |
Simplify entry to vm_pageout_clean(). Expect the page to be locked. Previously, the caller unlocked the page, and vm_pageout_clean() immediately reacquired the page lock. Also, assert rather than test that the page is neither busy nor held. Since vm_pageout_clean() is called with the object and page locked, the page can't have changed state since the caller verified that the page is neither busy nor held.
|
209407 |
21-Jun-2010 |
alc |
Introduce vm_page_next() and vm_page_prev(), and use them in vm_pageout_clean(). When iterating over a range of pages, these functions can be cheaper than vm_page_lookup() because their implementation takes advantage of the vm_object's memq being ordered.
Reviewed by: kib@ MFC after: 3 weeks
|
209215 |
15-Jun-2010 |
sbruno |
Add a new column to the output of vmstat -z to indicate the number of times the system was forced to sleep when requesting a new allocation.
Expand the debugger hook, db_show_uma, to display these results as well.
This has proven to be very useful in out of memory situations when it is not known why systems have become sluggish or fail in odd ways.
Reviewed by: rwatson alc Approved by: scottl (mentor) peter Obtained from: Yahoo Inc.
|
209173 |
14-Jun-2010 |
alc |
Eliminate checks for a page having a NULL object in vm_pageout_scan() and vm_pageout_page_stats(). These checks were recently introduced by the first page locking commit, r207410, but they are not needed. At the same time, eliminate some redundant accesses to the page's object field. (These accesses should have been eliminated by r207410.)
Make the assertion in vm_page_flag_set() stricter. Specifically, only managed pages should have PG_WRITEABLE set.
Add a comment documenting an assertion to vm_page_flag_clear().
It has long been the case that fictitious pages have their wire count permanently set to one. Add comments to vm_page_wire() and vm_page_unwire() documenting this. Add assertions to these functions as well.
Update the comment describing vm_page_unwire(). Much of the old comment had little to do with vm_page_unwire(), but a lot to do with _vm_page_deactivate(). Move relevant parts of the old comment to _vm_page_deactivate().
Only pages that belong to an object can be paged out. Therefore, it is pointless for vm_page_unwire() to acquire the page queues lock and enqueue such pages in one of the paging queues. Generally speaking, such pages are immediately freed after the call to vm_page_unwire(). Previously, it was the call to vm_page_free() that reacquired the page queues lock and removed these pages from the paging queues. Now, we will never acquire the page queues lock for this case. (It is also worth noting that since both vm_page_unwire() and vm_page_free() occurred with the page locked, the page daemon never saw the page with its object field set to NULL.)
Change the panic with vm_page_unwire() to provide a more precise message.
Reviewed by: kib@
|
209059 |
11-Jun-2010 |
jhb |
Update several places that iterate over CPUs to use CPU_FOREACH().
|
208990 |
10-Jun-2010 |
alc |
Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases.
PowerPC/AIM:
moea*_page_exists_quick() and moea*_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(), simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
|
208794 |
04-Jun-2010 |
jchandra |
Make vm_contig_grow_cache() extern, and use it when vm_phys_alloc_contig() fails to allocate MIPS page table pages. The current usage of VM_WAIT in case of vm_phys_alloc_contig() failure is not correct, because:
"There is no guarantee that any of the available free (or cached) pages after the VM_WAIT will fall within the range of suitable physical addresses. Every time this function sleeps and a single page is freed (or cached) by someone else, this function will be reawakened. With a little bad luck, you could spin indefinitely."
We also add low and high parameters to vm_contig_grow_cache() and vm_contig_launder() so that we restrict vm_contig_launder() to the range of pages we are interested in.
Reported by: alc
Reviewed by: alc Approved by: rrs (mentor)
|
208791 |
03-Jun-2010 |
kib |
Do not leak the vm page lock in vm_contig_launder(); vm_pageout_page_lock() always returns with the page locked.
Submitted by: alc Pointy hat to: kib
|
208772 |
03-Jun-2010 |
kib |
Add assertion and comment in vm_page_flag_set() describing the expectations when the PG_WRITEABLE flag is set.
Reviewed by: alc
|
208764 |
03-Jun-2010 |
alc |
Maintain the pretense that we support 32KB pages for the sake of the ia64 LINT build.
|
208745 |
02-Jun-2010 |
alc |
Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.
|
208645 |
29-May-2010 |
alc |
When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this.
Tidy up the variable declarations at the start of pmap_enter().
|
208574 |
26-May-2010 |
alc |
Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions.
Reviewed by: kib
|
208524 |
25-May-2010 |
alc |
Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed.
Submitted by: kib
|
208504 |
24-May-2010 |
alc |
Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
Reviewed by: kib (an earlier version)
|
208340 |
20-May-2010 |
kib |
When waiting for the busy page, do not unlock the object unless unlock cannot be avoided.
Reviewed by: alc MFC after: 1 week
|
208264 |
18-May-2010 |
alc |
The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it.
Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here.
Reviewed by: kib
|
208175 |
16-May-2010 |
alc |
On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures.
On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
|
208164 |
16-May-2010 |
alc |
Correct an error of omission in r202897: Now that amd64 uses the direct map to access the message buffer, we must explicitly request that the underlying physical pages are included in a crash dump.
Reported by: Benjamin Kaduk
|
208159 |
16-May-2010 |
alc |
Add a comment about the proper use of vm_object_page_remove().
MFC after: 1 week
|
207905 |
11-May-2010 |
alc |
Update synchronization annotations for struct vm_page. Add a comment explaining how the setting of PG_WRITEABLE is synchronized.
|
207846 |
10-May-2010 |
kib |
Continue cleaning the queue instead of moving to the next queue or bailing out if acquiring the page lock caused the page's position in the queue to change.
Pointed out by: alc
|
207823 |
09-May-2010 |
alc |
Push down the acquisition of the page queues lock into vm_pageq_remove(). (This eliminates a surprising number of page queues lock acquisitions by vm_fault() because the page's queue is PQ_NONE and thus the page queues lock is not needed to remove the page from a queue.)
|
207822 |
09-May-2010 |
alc |
Call vm_page_deactivate() rather than vm_page_dontneed() in swp_pager_force_pagein(). By dirtying the page, swp_pager_force_pagein() forces vm_page_dontneed() to insert the page at the head of the inactive queue, just like vm_page_deactivate() does. Moreover, because the page was invalid, it can't have been mapped, and thus the other effect of vm_page_dontneed(), clearing the page's reference bits, has no effect. In summary, there is no reason to call vm_page_dontneed() since its effect will be identical to calling the simpler vm_page_deactivate().
|
207806 |
09-May-2010 |
alc |
Remove the page queues lock around a call to vm_page_activate(). Make the page dirty before adding it to the active queue.
|
207798 |
08-May-2010 |
alc |
Minimize the scope of the page queues lock in vm_fault().
|
207796 |
08-May-2010 |
alc |
Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mappings(), pmap_remove_all(), and pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
|
207759 |
07-May-2010 |
jkim |
Fix a typo in the previous commit.
|
207752 |
07-May-2010 |
kib |
One more use for vm_pageout_init_marker().
Reviewed by: alc
|
207747 |
07-May-2010 |
alc |
Eliminate unnecessary page queues locking.
|
207746 |
07-May-2010 |
alc |
Push down the page queues lock into vm_page_activate().
|
207740 |
07-May-2010 |
alc |
Update the synchronization requirements for the page usage count.
|
207739 |
07-May-2010 |
alc |
Eliminate acquisitions of the page queues lock that are no longer needed.
Switch to a per-processor counter for the number of pages freed during process termination.
|
207738 |
07-May-2010 |
alc |
Push down the page queues lock into vm_page_deactivate(). Eliminate an incorrect comment.
|
207728 |
06-May-2010 |
alc |
Eliminate page queues locking around most calls to vm_page_free().
|
207706 |
06-May-2010 |
alc |
Update a comment to say that access to a page's wire count is now synchronized by the page lock.
|
207702 |
06-May-2010 |
alc |
Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues.
Update vm_page_unhold() to reflect the above change.
|
207694 |
06-May-2010 |
kib |
Add a helper function vm_pageout_page_lock(), similar to tegge's vm_pageout_fallback_object_lock(), to obtain the page lock while holding the page queue lock, and still maintain the page's position in a queue.
Use the helper to lock the page in the pageout daemon and contig launder iterators instead of skipping the page if its lock is contested. Skipping locked pages easily causes the pagedaemon or launder to make no progress with page cleaning.
Proposed and reviewed by: alc
|
207669 |
05-May-2010 |
alc |
Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.)
This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename().
Discussed with: kib
|
207644 |
05-May-2010 |
alc |
Push down the acquisition of the page queues lock into vm_page_unwire().
Update the comment describing which lock should be held on entry to vm_page_wire().
Reviewed by: kib
|
207617 |
04-May-2010 |
alc |
Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into vm_page_wire().
Reviewed by: kib
|
207601 |
04-May-2010 |
alc |
Add lock assertions.
|
207580 |
03-May-2010 |
kib |
Handle the busy status of the page in the way expected for pager_getpage(): flush the requested page, unbusy the other pages, and do not clear m->busy.
Reviewed by: alc MFC after: 1 week
|
207577 |
03-May-2010 |
alc |
Acquire the page lock around vm_page_wire() in vm_page_grab().
Assert that the page lock is held in vm_page_wire().
|
207576 |
03-May-2010 |
alc |
It makes more sense for the object-based backend allocator to use OBJT_PHYS objects instead of OBJT_DEFAULT objects because we never reclaim or pageout the allocated pages. Moreover, they are mapped with pmap_qenter(), which creates unmanaged mappings.
Reviewed by: kib
|
207552 |
03-May-2010 |
alc |
The pages allocated by kmem_alloc_attr() and kmem_malloc() are unmanaged. Consequently, neither the page lock nor the page queues lock is needed to unwire and free them.
|
207551 |
03-May-2010 |
alc |
Assert that the page queues lock is held in vm_page_remove() and vm_page_unwire() only if the page is managed, i.e., pageable.
|
207544 |
02-May-2010 |
alc |
Add page lock assertions where we access the page's hold_count.
|
207541 |
02-May-2010 |
alc |
Eliminate an assignment that was made redundant by r207410.
|
207540 |
02-May-2010 |
alc |
Defer the acquisition of the page and page queues locks in vm_pageout_object_deactivate_pages().
|
207539 |
02-May-2010 |
alc |
Simplify vm_fault(). The introduction of the new page lock renders a bit of cleverness by vm_fault() to avoid repeatedly releasing and reacquiring the page queues lock pointless.
Reviewed by: kib, kmacy
|
207531 |
02-May-2010 |
alc |
Correct an error in r207410: Remove an unlock of a lock that is no longer held.
|
207530 |
02-May-2010 |
alc |
It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.
|
207519 |
02-May-2010 |
alc |
This change addresses the race condition that was introduced by the previous revision, r207450, to this file. Specifically, between dropping the page queues lock in vm_contig_launder() and reacquiring it in vm_contig_launder_page(), the page may be removed from the active or inactive queue. It could be wired, freed, cached, etc., none of which vm_contig_launder_page() is prepared for.
Reviewed by: kib, kmacy
|
207487 |
02-May-2010 |
alc |
Correct an error of omission in r206819. If VMFS_TLB_ALIGNED_SPACE is specified to vm_map_find(), then retry the vm_map_findspace() if vm_map_insert() fails because the aligned space is already partly used.
Reported by: Neel Natu
|
207460 |
01-May-2010 |
kmacy |
Update locking comment above vm_page:
- re-assign page queue lock "Q"
- assign page lock "P"
- update several uncommented fields
- observe that hold_count is now protected by the page lock "P"
|
207452 |
30-Apr-2010 |
kmacy |
push up dropping of the page queue lock to avoid holding it in vm_pageout_flush
|
207451 |
30-Apr-2010 |
kmacy |
don't call vm_pageout_flush with the page queue mutex held
Reported by: Michael Butler
|
207450 |
30-Apr-2010 |
kmacy |
- acquire the page lock in vm_contig_launder_page before checking page fields
- release page queue lock before calling vm_pageout_flush
|
207448 |
30-Apr-2010 |
kmacy |
- don't check hold_count without the page lock held
- don't leak the page lock if m->object is NULL (assuming that that check will in fact even be valid when m->object is protected by the page lock)
|
207438 |
30-Apr-2010 |
kib |
Unlock page lock instead of recursively locking it.
|
207412 |
30-Apr-2010 |
kmacy |
don't allow unsynchronized free in vm_page_unhold
|
207410 |
30-Apr-2010 |
kmacy |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
|
207374 |
29-Apr-2010 |
alc |
Simplify the inner loop of vm_pageout_object_deactivate_pages(). Rather than checking each page for PG_UNMANAGED, check the vm object's type. Only OBJT_PHYS can have unmanaged pages. Eliminate a pointless counter. The vm object is locked, that lock is never released by the inner loop, and the set of pages contained by the vm object is not changed by the inner loop. Therefore, the counter serves no purpose.
|
207365 |
29-Apr-2010 |
kib |
When doing kstack swapin, read as much pages in one run as possible.
Suggested and reviewed by: alc (previous version) Tested by: pho MFC after: 2 weeks
|
207364 |
29-Apr-2010 |
kib |
In swap pager, do not free the non-requested pages from the run if they are wired. Kstack pages are wired, this change prepares swap pager for handling of long runs of kstack pages.
Noted and reviewed by: alc Tested by: pho MFC after: 2 weeks
|
207308 |
28-Apr-2010 |
alc |
Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since the page table entry's accessed bit is either preset by the immediately preceding call to pmap_enter() or by hardware (or software) upon return from vm_fault() when the faulting access is restarted.
|
207306 |
28-Apr-2010 |
alc |
Change vm_object_madvise() so that it checks whether the page is invalid or unmanaged before acquiring the page queues lock. Neither of these tests require that lock. Moreover, a better way of testing if the page is unmanaged is to test the type of vm object. This avoids a pointless vm_page_lookup().
MFC after: 3 weeks
|
207155 |
24-Apr-2010 |
alc |
Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
|
206885 |
20-Apr-2010 |
alc |
Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to an object whose reference count is zero, then that page cannot possibly be mapped.
|
206823 |
19-Apr-2010 |
alc |
vm_thread_swapout() can safely dirty the page before rather than after acquiring the page queues lock.
|
206819 |
18-Apr-2010 |
jmallett |
o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the address space for an address as aligned by the new pmap_align_tlb() function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like kmem_alloc_nofault() but allows the caller to specify which find-space option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on an odd boundary within the TLB, so that they are suitable for insertion as wired entries and do not have to share a TLB entry with another mapping, assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper alignment of the kstack address.
Reviewed by: [1] alc X-MFC-after: Making sure alc has not come up with a better interface.
|
206814 |
18-Apr-2010 |
alc |
Remove a nonsensical test from vm_pageout_clean(). A page can't be in the inactive queue and have a non-zero wire count.
Reviewed by: kib MFC after: 3 weeks
|
206801 |
18-Apr-2010 |
alc |
There is no justification for vm_object_split() setting PG_REFERENCED on a page that it is going to sleep on. Eliminate it.
MFC after: 3 weeks
|
206770 |
17-Apr-2010 |
alc |
In vm_object_madvise() setting PG_REFERENCED on a page before sleeping on that page only makes sense if the advice is MADV_WILLNEED. In that case, the intention is to activate the page, so discouraging the page daemon from reclaiming the page makes sense. In contrast, in the other cases, MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage the page daemon from reclaiming the page by setting PG_REFERENCED.
Wrap a nearby line.
Discussed with: kib MFC after: 3 weeks
|
206768 |
17-Apr-2010 |
alc |
In vm_object_backing_scan(), setting PG_REFERENCED on a page before sleeping on that page is nonsensical. Doing so reduces the likelihood that the page daemon will reclaim the page before the thread waiting in vm_object_backing_scan() is reawakened. However, it does not guarantee that the page is not reclaimed, so vm_object_backing_scan() restarts after reawakening. More importantly, this muddles the meaning of PG_REFERENCED. There is no reason to believe that the caller of vm_object_backing_scan() is going to use (i.e., access) the contents of the page. There is especially no reason to believe that an access is more likely because vm_object_backing_scan() had to sleep on the page.
Discussed with: kib MFC after: 3 weeks
|
206761 |
17-Apr-2010 |
alc |
Setting PG_REFERENCED on the requested page in swap_pager_getpages() is either redundant or harmful, depending on the caller. For example, when called by vm_fault(), it is redundant. However, when called by vm_thread_swapin(), it is harmful. Specifically, if the thread is later swapped out, having PG_REFERENCED set on its stack pages leads the page daemon to reactivate these stack pages and delay their reclamation.
Reviewed by: kib MFC after: 3 weeks
|
206545 |
13-Apr-2010 |
alc |
Simplify vm_thread_swapin().
|
206483 |
11-Apr-2010 |
alc |
Initialize the virtual memory-related resource limits in a single place. Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.)
Make vm_thread_swapin() and vm_thread_swapout() static.
Submitted by: bde (an earlier version) Reviewed by: kib
|
206409 |
09-Apr-2010 |
alc |
Introduce the function kmem_alloc_attr(), which allocates kernel virtual memory with the specified physical attributes. In particular, like kmem_alloc_contig(), the caller can specify the physical address range from which the physical pages are allocated and the memory attributes (i.e., cache behavior) for these physical pages. However, in contrast to kmem_alloc_contig() or contigmalloc(), the physical pages that are allocated by kmem_alloc_attr() are not necessarily physically contiguous. This function is needed by DRM and VirtualBox.
Correct an error in the prototype for kmem_malloc(). The third argument had the wrong type.
Tested by: rnoland MFC after: 3 days
|
206360 |
07-Apr-2010 |
joel |
Start copyright notice with /*-
|
206264 |
06-Apr-2010 |
kib |
When OOM searches for a process to kill, ignore the processes already killed by OOM. When a killed process waits for a page allocation, try to satisfy the request as fast as possible.
This removes the often-encountered deadlock, where OOM continuously selects the same victim process, which sleeps uninterruptibly waiting for a page. The killed process may still sleep if the page cannot be obtained immediately, but testing has shown that the system has a much higher chance to survive an OOM situation with the patch.
In collaboration with: pho Reviewed by: alc MFC after: 4 weeks
|
206174 |
05-Apr-2010 |
alc |
vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as it is never called on an OBJT_DEVICE object. (This change should have been included in r195840.)
Reported by: dougb@, avg@ MFC after: 3 days
|
206142 |
03-Apr-2010 |
alc |
Make _vm_map_init() the one place where the vm map's pmap field is initialized.
Reviewed by: kib
|
206140 |
03-Apr-2010 |
alc |
Re-enable the call to pmap_release() by vmspace_dofree(). The accounting problem that is described in the comment has been addressed.
Submitted by: kib Tested by: pho (a few months ago) MFC after: 6 weeks
|
205536 |
23-Mar-2010 |
jhb |
Reject attempts to create a MAP_ANON mapping with a non-zero offset.
PR: kern/71258 Submitted by: Alexander Best MFC after: 2 weeks
|
205487 |
22-Mar-2010 |
kmacy |
- enable alignment on amd64 only
- only align pcpu caches and the volatile portion of uma_zone
|
205298 |
18-Mar-2010 |
kmacy |
turn r205266 into a no-op until the problem can be properly diagnosed
|
205266 |
17-Mar-2010 |
kmacy |
Cache line align various structures and move volatile counters to not share a cache line with (mostly) immutable state
Reviewed by: jeff@ MFC after: 7 days
|
204415 |
27-Feb-2010 |
kib |
Update the comment for vm_page_alloc(9), listing all acceptable flags [1]. Note that the function does not sleep, but it can block.
Submitted by: Giovanni Trematerra <giovanni.trematerra gmail com> [1] MFC after: 3 days
|
204205 |
22-Feb-2010 |
kib |
Remove write-only variable.
MFC after: 3 days
|
204181 |
21-Feb-2010 |
alc |
Align the start of the clean submap to a superpage boundary. Although no superpage mappings are created within the clean submap, aligning the start of the clean submap helps to prevent interference with kmem_alloc()'s use of superpages.
|
203175 |
29-Jan-2010 |
kib |
The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags; the cow variable uses a different namespace.
Reported by: Jonathan Anderson <jonathan.anderson cl cam ac uk> MFC after: 3 days
|
202529 |
17-Jan-2010 |
kib |
When a vnode-backed vm object is referenced, it increments the vnode reference count, and decrements it on dereference. If the referenced object is deallocated, the object type is reset to OBJT_DEAD; consequently, all vnode references owned by object references are never released. vunref() the vnode an appropriate number of times in the vm object deallocation code for OBJT_VNODE to prevent the leak.
Add an assertion to vm_pageout() to make sure that we never take a reference on the vnode and then fail to execute the code that releases it.
In collaboration with: pho Reviewed by: alc MFC after: 3 weeks
|
201223 |
29-Dec-2009 |
rnoland |
Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.
This replaces d_mmap() with the d_mmap2() implementation and also changes the type of offset to vm_ooffset_t.
Purge d_mmap2().
All driver modules will need to be rebuilt since D_VERSION is also bumped.
Reviewed by: jhb@ MFC after: Not in this lifetime...
|
201145 |
28-Dec-2009 |
antoine |
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used.
PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month
|
200770 |
21-Dec-2009 |
kib |
The VI_OBJDIRTY vnode flag mirrors the state of the OBJ_MIGHTBEDIRTY vm object flag. Besides providing redundant information, the need to update both vnode and object flags causes additional acquisitions of the vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.
Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects.
Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks
|
200129 |
05-Dec-2009 |
antoine |
Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros.
MFC after: 1 month
|
199870 |
28-Nov-2009 |
alc |
Properly synchronize the previous change.
|
199869 |
27-Nov-2009 |
alc |
Support the new VM_PROT_COPY option on wired pages. The effect of which is that a debugger can now set a breakpoint in a program that uses mlock(2) on its text segment or mlockall(2) on its entire address space.
|
199868 |
27-Nov-2009 |
alc |
Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault().
Discussed with: kib
|
199819 |
26-Nov-2009 |
alc |
Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected but they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy. However, here is where things become confused. It is the debugger, not the process being debugged that requires write access to the copied page. Nonetheless, the copied page is being mapped into the process with write access enabled. In other words, once the debugger sets a breakpoint within a text page, the program can write to its private copy of that text page. Whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access.
Reviewed by: kib MFC after: 4 weeks
|
199490 |
18-Nov-2009 |
alc |
Simplify both the invocation and the implementation of vm_fault() for wiring pages.
(Note: Claims made in the comments about the handling of breakpoints in wired pages have been false for roughly a decade. This and another bug involving breakpoints will be fixed in coming changes.)
Reviewed by: kib
|
198870 |
04-Nov-2009 |
alc |
Eliminate an unnecessary #include. (This #include should have been removed in r188331 when vnode_pager_lock() was eliminated.)
|
198855 |
03-Nov-2009 |
alc |
Eliminate a bit of hackery from vm_fault(). The operations that this hackery sought to prevent are now properly supported by vm_map_protect(). (See r198505.)
Reviewed by: kib
|
198854 |
03-Nov-2009 |
attilio |
Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvement aims to avoid further cache misses in scheduler-specific functions that need to keep track of average thread running time, and further locking in places that set this flag.
Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
|
198812 |
02-Nov-2009 |
alc |
Avoid pointless calls to pmap_protect().
Reviewed by: kib
|
198811 |
02-Nov-2009 |
ivoras |
Add sysctl documentation strings. The descriptions are derived from tuning(7). One of the descriptions references tuning(7) because it is too complex to adequately describe here (it is not a simple boolean sysctl), and users should be pointed to it.
Reviewed by: alc, kib Approved by: gnn (mentor)
|
198721 |
31-Oct-2009 |
alc |
Correct an error in vm_fault_copy_entry() that has existed since the first version of this file. When a process forks, any wired pages are immediately copied because copy-on-write is not supported for wired pages. In other words, the child process is given its own private copy of each wired page from its parent's address space. Unfortunately, to date, these copied pages have been mapped into the child's address space with the wrong permissions, typically VM_PROT_ALL. This change corrects the permissions.
Reviewed by: kib
|
198505 |
27-Oct-2009 |
kib |
When the protection of a wired read-only mapping is changed to read-write, install a new shadow object behind the map entry and copy the pages from the underlying objects into it. This makes the mprotect(2) call actually perform the requested operation instead of silently doing nothing and returning success, which caused SIGSEGV on later write access to the mapping.
Reuse vm_fault_copy_entry() to do the copying, modifying it to behave correctly when src_entry == dst_entry.
Reviewed by: alc MFC after: 3 weeks
|
198476 |
26-Oct-2009 |
alc |
Simplify the inner loop of vm_fault_copy_entry().
Reviewed by: kib
|
198472 |
25-Oct-2009 |
alc |
Eliminate an unnecessary check from vm_fault_prefault().
|
198341 |
21-Oct-2009 |
marcel |
o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache() that translates the vm_map_t argument to pmap_t. o Introduce pmap_sync_icache() in all PMAP implementations. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This ensures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation, which was missing necessary locking while trying to deal with I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent *after* writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent *before* any writes happen. The difference is key when the I-cache prefetches.
|
198201 |
18-Oct-2009 |
kib |
Remove a spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA). Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when the per-uid limit is actually exceeded.
Both changes aim at calling priv_check(9) only for the cases when privilege is actually exercised by the process.
Reported and tested by: rwatson Reviewed by: alc MFC after: 3 days
|
197750 |
04-Oct-2009 |
alc |
Align and pad the page queue and free page queue locks so that the linker can't possibly place them together within the same cache line.
MFC after: 3 weeks
|
197712 |
02-Oct-2009 |
bz |
Back out the functional parts from r197537. After r197711, affecting all user mappings, mmap no longer needs special treatment.
|
197661 |
01-Oct-2009 |
kib |
Move the annotation for vm_map_startup() immediately before the function.
MFC after: 3 days
|
197537 |
27-Sep-2009 |
simon |
Do not allow mmap with the MAP_FIXED argument to map at address zero. This is done to make it harder to exploit kernel NULL pointer security vulnerabilities. While this of course does not fix vulnerabilities, it does mitigate their impact.
Note that this may break some applications, most likely emulators or similar, which for one reason or another require mapping memory at zero.
This restriction can be disabled with the security.bsd.mmap_zero sysctl variable.
Discussed with: rwatson, bz Tested by: bz (Wine), simon (VirtualBox) Submitted by: jhb
|
197348 |
20-Sep-2009 |
kib |
Old (a.out) rtld attempts to mmap a zero-length region, e.g. when the bss of the linked object is zero-length. More old code assumes that an mmap of zero length returns success.
For a.out and pre-8 ELF binaries, allow the mmap of zero length.
Reported by: tegge Reviewed by: tegge, alc, jhb MFC after: 3 days
|
196730 |
01-Sep-2009 |
kib |
Reintroduce the r196640, after fixing the problem with my testing.
Remove the altkstacks; instead, instantiate threads with a kernel stack allocated with the right size from the start. For a thread that has its kernel stack cached, verify that the requested stack size is equal to the actual one, and reallocate the stack if the sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
Also, r173361 removed the caching of kernel stacks for a non-first thread in the process. Introduce a separate kernel stack cache that keeps a limited number of preallocated kernel stacks to lower the latency of thread allocation. Add a vm_lowmem handler to prune the cache under low-memory conditions. This way, systems with a reasonable number of threads get lower thread-creation latency, while still not exhausting a significant portion of KVA on unused kstacks.
Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarios) MFC after: 1 week
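The kernel stack cache described above is in essence a small LIFO of preallocated stacks with a low-memory drain hook. A minimal userspace sketch (all names here are illustrative, not the kernel's):

```c
#include <stddef.h>

#define KSTACK_CACHE_MAX	4	/* assumed cache capacity */

static void *kstack_cache[KSTACK_CACHE_MAX];
static int kstack_cache_cnt;

/* Return a stack to the cache; 0 means the cache is full and the
 * caller must really free the stack. */
int
kstack_cache_free(void *ks)
{
	if (kstack_cache_cnt == KSTACK_CACHE_MAX)
		return (0);
	kstack_cache[kstack_cache_cnt++] = ks;
	return (1);
}

/* Try the cache first on allocation; NULL means a cache miss and the
 * caller allocates a fresh stack of the right size. */
void *
kstack_cache_alloc(void)
{
	if (kstack_cache_cnt == 0)
		return (NULL);
	return (kstack_cache[--kstack_cache_cnt]);
}

/* Analogue of the vm_lowmem handler: drop all cached stacks and
 * report how many were pruned. */
int
kstack_cache_drain(void)
{
	int n = kstack_cache_cnt;

	kstack_cache_cnt = 0;
	return (n);
}
```

The LIFO order means the most recently freed (and thus most likely still cache-warm) stack is reused first.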
|
196648 |
29-Aug-2009 |
kib |
Revert r196640 and r196644 for now.
|
196640 |
29-Aug-2009 |
kib |
Remove the altkstacks; instead, instantiate threads with a kernel stack allocated with the right size from the start. For a thread that has its kernel stack cached, verify that the requested stack size is equal to the actual one, and reallocate the stack if the sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
Also, r173361 removed the caching of kernel stacks for a non-first thread in the process. Introduce a separate kernel stack cache that keeps a limited number of preallocated kernel stacks to lower the latency of thread allocation. Add a vm_lowmem handler to prune the cache under low-memory conditions. This way, systems with a reasonable number of threads get lower thread-creation latency, while still not exhausting a significant portion of KVA on unused kstacks.
Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week
|
196637 |
29-Aug-2009 |
jhb |
Mark the fake pages constructed by the OBJT_SG pager valid. This was accidentally lost at one point during the PAT development. Without this fix vm_pager_get_pages() was zeroing each of the pages.
Submitted by: czander @ NVidia MFC after: 3 days
|
196615 |
28-Aug-2009 |
jhb |
Extend the device pager to support different memory attributes on different pages in an object. - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page. - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8.
Submitted by: alc MFC after: 1 month
|
195844 |
24-Jul-2009 |
jhb |
Remove debugging that crept in with previous commit.
Reported by: nwhitehorn Approved by: re (kib)
|
195840 |
24-Jul-2009 |
jhb |
Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
|
195774 |
19-Jul-2009 |
alc |
Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev().
The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless.
Add pmap_page_set_memattr() to the Xen pmap.
Approved by: re (kib)
|
195749 |
18-Jul-2009 |
alc |
An addendum to r195649, "Add support to the virtual memory system for configuring machine-dependent memory attributes...":
Don't set the memory attribute for a "real" page that is allocated to a device object in vm_page_alloc(). It is a pointless act, because the device pager replaces this "real" page with a "fake" page and sets the memory attribute on that "fake" page.
Eliminate pointless code from pmap_cache_bits() on amd64.
Employ the "Self Snoop" feature supported by some x86 processors to avoid cache flushes in the pmap.
Approved by: re (kib)
|
195693 |
14-Jul-2009 |
jhb |
- Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX. - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense.
Submitted by: Alexander Best alexbestms at math.uni-muenster.de Reviewed by: alc Approved by: re (kib) MFC after: 1 week
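The two length rules above can be sketched as a pure check (a hypothetical helper, not the actual kernel code): zero length is rejected per POSIX, while lengths above SSIZE_MAX pass because the parameter is an unsigned size_t and "negative" values do not exist.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Sketch of the mmap length validation: only a zero length is
 * rejected; very large unsigned lengths are left for later checks.
 */
int
check_mmap_len(size_t len)
{
	if (len == 0)
		return (EINVAL);
	return (0);
}
```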
|
195649 |
12-Jul-2009 |
alc |
Add support to the virtual memory system for configuring machine- dependent memory attributes:
Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior.
Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages.
Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager:
kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386.
vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes.
Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping.
Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7.
In collaboration with: jhb
Approved by: re (kib)
|
195635 |
12-Jul-2009 |
kib |
When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters a non-readable and non-executable map entry, the entry is skipped from wiring and the loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was not set for the map entry, its wired_count is later erroneously decremented, and vm_map_delete(9) for such a map entry gets stuck in "vmmaps".
Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop.
Reported by: John Marshall <john.marshall riverwillow com au> Approved by: re (kensmith)
|
195329 |
03-Jul-2009 |
kib |
When forking a vm space that has wired map entries, do not forget to charge the objects created by vm_fault_copy_entry. The object charge was set, but the reserve was not incremented.
Reported by: Greg Rivers <gcr+freebsd-current tharned org> Reviewed by: alc (previous version) Approved by: re (kensmith)
|
195131 |
28-Jun-2009 |
kib |
Eliminate code duplication by calling vm_object_destroy() from vm_object_collapse().
Requested and reviewed by: alc Approved by: re (kensmith)
|
195033 |
26-Jun-2009 |
alc |
This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this change adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes.
In collaboration with: jhb
|
194990 |
25-Jun-2009 |
kib |
Change the type of the uio_resid member of struct uio from int to ssize_t. Note that this does not actually enable full-range i/o requests for 64-bit architectures; it is done now to update the KBI only.
Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)
|
194814 |
24-Jun-2009 |
kib |
Initialize the uip to silence a gcc warning that appears in some build environments.
Reported by: alc, bf1783 at googlemail com
|
194806 |
24-Jun-2009 |
alc |
The bits set in a page's dirty mask are a subset of the bits set in its valid mask. Consequently, there is no need to perform a bit-wise and of the page's dirty and valid masks in order to determine which parts of a page are dirty and valid.
Eliminate an unnecessary #include.
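The invariant above can be illustrated with the bit-per-block mask style used for a page's valid and dirty state: because dirty is a subset of valid, (dirty & valid) is already equal to dirty, making the AND redundant. A minimal sketch (assumed 16-bit masks for illustration):

```c
#include <stdint.h>

/*
 * Returns 1 when the dirty mask is a subset of the valid mask, in
 * which case (dirty & valid) == dirty and the bit-wise AND adds
 * nothing.
 */
int
dirty_subset_redundant(uint16_t valid, uint16_t dirty)
{
	return ((dirty & ~valid) == 0 && (dirty & valid) == dirty);
}
```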
|
194766 |
23-Jun-2009 |
kib |
Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid.
The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup.
The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped.
The interface of vm_pager_allocate(9) is extended by adding struct ucred *, which is used to charge the appropriate uid when allocation is performed by the kernel, e.g. md(4).
Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced.
In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
|
194642 |
22-Jun-2009 |
alc |
Validate the page in one place, dev_pager_getpages(), rather than doing it in two places, dev_pager_getfake() and dev_pager_updatefake().
Compare a pointer to "NULL" rather than "0".
|
194607 |
21-Jun-2009 |
alc |
Implement a mechanism within vm_phys_alloc_contig() to defer all necessary calls to vdrop() until after the free page queues lock is released. This eliminates repeatedly releasing and reacquiring the free page queues lock each time the last cached page is reclaimed from a vnode-backed object.
|
194562 |
21-Jun-2009 |
alc |
Strive for greater consistency among the places that implement real, fictitious, and contiguous page allocation. Eliminate unnecessary reinitialization of a page's fields.
|
194459 |
18-Jun-2009 |
thompsa |
Track the kernel mapping of a physical page with a new entry in the vm_page structure. When the page is shared, the kernel mapping becomes a special type of managed page to force the cache off the page mappings. This is needed to avoid stale entries on all ARM VIVT caches, and on VIPT caches with the cache color issue.
Submitted by: Mark Tinguely Reviewed by: alc Tested by: Grzegorz Bernacki, thompsa
|
194429 |
18-Jun-2009 |
alc |
Add support for UMA_SLAB_KERNEL to page_free(). (While I'm here remove an unnecessary newline character from the end of two panic messages.)
|
194393 |
17-Jun-2009 |
alc |
Eliminate unnecessary forward declarations.
|
194376 |
17-Jun-2009 |
alc |
Refactor contigmalloc() into two functions: a simple front-end that deals with the malloc tag and calls a new back-end, kmem_alloc_contig(), that allocates the pages and maps them.
The motivations for this change are two-fold: (1) A cache mode parameter will be added to kmem_alloc_contig(). In other words, kmem_alloc_contig() will be extended to support the allocation of memory with caller-specified caching. (2) The UMA allocation function that is used by the two jumbo frames zones can use kmem_alloc_contig() in place of contigmalloc() and thereby avoid having free jumbo frames held by the zone counted as live malloc()ed memory.
|
194337 |
17-Jun-2009 |
alc |
Pass the size of the mapping to contigmapping() as a "vm_size_t" rather than a "vm_pindex_t". A "vm_size_t" is more convenient for it to use.
|
194331 |
17-Jun-2009 |
alc |
Make the maintenance of a page's valid bits by contigmalloc() more like kmem_alloc() and kmem_malloc(). Specifically, defer the setting of the page's valid bits until contigmapping() when the mapping is known to be successful.
|
194209 |
14-Jun-2009 |
alc |
Long, long ago in r27464 special case code for mapping device-backed memory with 4MB pages was added to pmap_object_init_pt(). This code assumes that the pages of an OBJT_DEVICE object are always physically contiguous. Unfortunately, this is not always the case. For example, jhb@ informs me that the recently introduced /dev/ksyms driver creates an OBJT_DEVICE object that violates this assumption. Thus, this revision modifies pmap_object_init_pt() to abort the mapping if the OBJT_DEVICE object's pages are not physically contiguous. This revision also changes some inconsistent, if not buggy, behavior. For example, the i386 version aborts if the first 4MB virtual page that would be mapped is already valid. However, it incorrectly replaces any subsequent 4MB virtual page mappings that it encounters, potentially leaking a page table page. The amd64 version has a bug of my own creation. It potentially busies the wrong page and always busies an insufficient number of pages if it blocks allocating a page table page.
To my knowledge, there have been no reports of these bugs, hence, their persistence. I suspect that the existing restrictions that pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would choose to map, for example, that the first page must be aligned on a 2 or 4MB physical boundary and that the size of the mapping must be a multiple of the large page size, were enough to avoid triggering the bug for drivers like ksyms. However, one side effect of testing the OBJT_DEVICE object's pages for physical contiguity is that a dubious difference between pmap_object_init_pt() and the standard path for mapping device pages, i.e., vm_fault(), has been eliminated. Previously, pmap_object_init_pt() would only instantiate the first PG_FICTITIOUS page being mapped because it never examined the rest. Now, however, pmap_object_init_pt() uses the new function vm_object_populate() to instantiate them all (in order to support testing their physical contiguity). These pages need to be instantiated for the mechanism that I have prototyped for automatically maintaining the consistency of the PAT settings across multiple mappings, particularly, amd64's direct mapping, to work. (Translation: This change is also being made to support jhb@'s work on the Nvidia feature requests.)
Discussed with: jhb@
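The contiguity test described above amounts to checking that the pages' physical addresses form a single run. A minimal sketch (hypothetical helper, operating on an array of physical addresses rather than vm_page structures):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096	/* assumed base page size */

/*
 * Returns 1 when each page's physical address immediately follows the
 * previous one, i.e. the pages could be covered by one large mapping.
 */
int
pages_physically_contiguous(const uintptr_t *pa, size_t npages)
{
	size_t i;

	for (i = 1; i < npages; i++)
		if (pa[i] != pa[i - 1] + PAGE_SIZE)
			return (0);
	return (1);
}
```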
|
194126 |
13-Jun-2009 |
alc |
Eliminate an unnecessary clearing of a page's dirty bits in phys_pager_getpages().
|
193842 |
09-Jun-2009 |
alc |
Eliminate an unnecessary restriction on the vm object type from vm_map_pmap_enter(). The immediate effect of this change is that automatic prefaulting by mmap() for small mappings is performed on POSIX shared memory objects just the same as it is on ordinary files.
|
193643 |
07-Jun-2009 |
alc |
Eliminate unnecessary obfuscation when testing a page's valid bits.
|
193594 |
06-Jun-2009 |
alc |
Eliminate an unneeded forward declaration. (This should have been removed in revision 1.42.)
|
193593 |
06-Jun-2009 |
alc |
If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)
|
193522 |
05-Jun-2009 |
alc |
vm_thread_swapin() needn't validate any pages. The pages are already validated by vm_pager_get_pages().
|
193521 |
05-Jun-2009 |
alc |
Simplify contigfree().
|
193511 |
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
193303 |
02-Jun-2009 |
alc |
Correct a boundary case error in the management of a page's dirty bits by shm_dotruncate() and vnode_pager_setsize(). Specifically, if the length of a shared memory object or a file is truncated such that the length modulo the page size is between 1 and 511, then all of the page's dirty bits were cleared. Now, a dirty bit is cleared only if the corresponding block is truncated in its entirety.
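The corrected rule above can be sketched with the bit-per-512-byte-block dirty mask: on truncation, only the bits for blocks that lie entirely beyond the new end are cleared, while a partially truncated block keeps its dirty bit. A minimal model (assumed 4096-byte page, 8-bit mask; 'base' is the new length modulo the page size):

```c
#include <stdint.h>

#define DEV_BSIZE	512
#define PAGE_SIZE	4096

/*
 * Clear the dirty bits only for the blocks wholly past the new end.
 * With 1 <= base <= 511, block 0 is partially truncated and keeps its
 * dirty bit (the old code cleared all of them).
 */
uint8_t
truncate_dirty(uint8_t dirty, int base)
{
	int first_gone;

	/* Index of the first block that lies entirely beyond 'base'. */
	first_gone = (base + DEV_BSIZE - 1) / DEV_BSIZE;
	return (dirty & (uint8_t)((1u << first_gone) - 1));
}
```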
|
193275 |
01-Jun-2009 |
jhb |
Add an extension to the character device interface that allows character device drivers to use arbitrary VM objects to satisfy individual mmap() requests. - A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is added to cdevsw. This function is called for each mmap() request. If it returns ENODEV, then the mmap() request will fall back to using the device's device pager object and d_mmap(). Otherwise, the method can return a VM object to satisfy this entire mmap() request via *object. It can also modify the starting offset into this object via *foff. This allows device drivers to use the file offset as a cookie to identify specific VM objects. - vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when mapping V_CHR vnodes. This avoids duplicating all the cdev mmap handling code and simplifies some of vm_mmap_vnode(). - D_VERSION has been bumped to D_VERSION_02. Older device drivers using D_VERSION_01 are still supported.
MFC after: 1 month
|
193126 |
30-May-2009 |
alc |
Eliminate a stale comment and the two remaining uses of the "register" keyword in this file.
|
193124 |
30-May-2009 |
alc |
Add assertions in two places where a page's valid or dirty bits are changed.
|
192968 |
28-May-2009 |
alc |
Change vm_object_page_remove() such that it clears the page's dirty bits when it invalidates the page.
Suggested by: tegge
|
192962 |
28-May-2009 |
alc |
Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically, rather than unconditionally making partially dirty pages fully dirty, only make partially dirty pages fully dirty if the pmap says that the page has been modified.
(This change is also a small optimization. It eliminates an unnecessary call to pmap_is_modified() on pages that are mapped read-only.)
Suggested by: tegge
|
192360 |
19-May-2009 |
kmacy |
- back out direct map hack - it is no longer needed
|
192261 |
17-May-2009 |
alc |
Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan(). If the page belongs to an object with a reference count of zero, then it can't have any managed mappings on which to clear a reference bit.
|
192207 |
16-May-2009 |
kmacy |
apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map
|
192134 |
15-May-2009 |
alc |
Eliminate unnecessary clearing of the page's dirty mask from various getpages functions.
Eliminate a stale comment.
|
192034 |
13-May-2009 |
alc |
Eliminate page queues locking from bufdone_finish() through the following changes:
Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge
Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held.
Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared.
Introduce vm_page_set_valid().
Reviewed by: tegge
|
192010 |
12-May-2009 |
alc |
Eliminate gratuitous clearing of the page's dirty mask.
|
191935 |
09-May-2009 |
alc |
Fix a race involving vnode_pager_input_smlfs(). Specifically, in the case that vnode_pager_input_smlfs() zeroes the page, it should not mark the page as valid until after the page is zeroed. Otherwise, the page could be mapped for read access (e.g., by vm_map_pmap_enter()) before the page is zeroed. Reviewed by: tegge
Eliminate gratuitous clearing of the page's dirty mask by vnode_pager_input_smlfs(). Instead, assert that the page is clean. Reviewed by: tegge
Eliminate some blank lines.
Eliminate pointless calls to pmap_clear_modify() and vm_page_undirty() from vnode_pager_input_old(). The page is not mapped. Therefore, it cannot have any page table entries that are modified.
Eliminate an incorrect comment from vnode_pager_generic_getpages().
|
191874 |
07-May-2009 |
alc |
Eliminate an incorrect comment.
|
191778 |
04-May-2009 |
alc |
Eliminate vnode_pager_input_smlfs()'s pointless call to pmap_clear_modify(). The page can't possibly have any modified page table entries because it isn't even mapped.
|
191626 |
28-Apr-2009 |
kib |
Use the acquired reference to the vmspace instead of directly dereferencing p->p_vmspace in a place where it was missed in r191277.
Noted by: pluknet gmail com
|
191625 |
28-Apr-2009 |
kib |
Fix typo.
|
191543 |
26-Apr-2009 |
alc |
Eliminate an errant comment.
Discussed with: tegge
|
191531 |
26-Apr-2009 |
alc |
Eliminate an archaic band-aid. The immediately preceding comment already explains why the band-aid is unnecessary.
Suggested by: tegge
|
191478 |
25-Apr-2009 |
alc |
Eliminate unnecessary calls to pmap_clear_modify(). Specifically, calling pmap_clear_modify() on a page is pointless if that page is not mapped or it is only mapped for read access. Instead, assert that the page is not mapped or not mapped for write access as appropriate.
Eliminate unnecessary clearing of a page's dirty mask. Instead, assert that the page's dirty mask is clear.
|
191439 |
23-Apr-2009 |
kib |
Do not call vm_page_lookup() from the ddb routine, namely from the "show vmopag" implementation. The vm_page_lookup() code modifies the splay tree of the object's pages, and asserts that the object lock is taken. The first issue could cause kernel data corruption, and the second one instantly panics an INVARIANTS-enabled kernel.
Take advantage of the fact that object->memq is ordered by page index, and iterate over memq to calculate the runs.
While there, make the code slightly more style-compliant by moving variables declarations to the right place.
Discussed with: jhb, alc Reviewed by: alc MFC after: 2 weeks
|
191277 |
19-Apr-2009 |
kib |
In both the pageout oom handler and vm_daemon, acquire a reference to the vmspace of the examined process instead of directly accessing its vmspace, which may change. Also, as an optimization, check for the P_INEXEC flag before examining the process.
Reported and tested by: pho (previous version) Reviewed by: alc MFC after: 3 weeks
|
191263 |
19-Apr-2009 |
alc |
Calling pmap_clear_modify() after calling pmap_remove_write() is pointless. The latter function already clears the modified status from each of the page's mappings.
|
191256 |
19-Apr-2009 |
alc |
Allow valid pages to be mapped for read access when they have a non-zero busy count. Only mappings that allow write access should be prevented by a non-zero busy count.
(The prohibition on mapping pages for read access when they have a non-zero busy count originated in revision 1.202 of i386/i386/pmap.c when this code was a part of the pmap.)
Reviewed by: tegge
|
190949 |
11-Apr-2009 |
alc |
Remove execute permission from the memory allocated by sbrk().
Pre-announced on: -arch (3/31/09) Discussed with: rwatson Tested by: marius (sparc64)
|
190912 |
11-Apr-2009 |
alc |
Previously, when vm_page_free_toq() was performed on a page belonging to a reservation, unless all of the reservation's pages were free, the reservation was moved to the head of the partially-populated reservations queue, where it would be the next reservation to be broken in case the free page queues were emptied. Now, instead, I am moving it to the tail. Very likely this reservation is in the process of being freed in its entirety, so placing it at the tail of the queue makes it more likely that the underlying physical memory will be returned to the free page queues as one contiguous chunk. If a reservation must be broken, it will, instead, be the longest unchanged reservation, which is arguably the reservation that is least likely to ever achieve promotion or be freed in its entirety.
MFC after: 6 weeks
|
190886 |
10-Apr-2009 |
kib |
When vm_map_wire(9) is allowed to skip holes in the wired region, also skip mappings that have neither read nor execute rights, in particular the PROT_NONE entries. This makes mlockall(2) work for a process address space that contains such mappings.
Since protection mode of the entry may change between setting MAP_ENTRY_IN_TRANSITION and final pass over the region that records the wire status of the entries, allocate new map entry flag MAP_ENTRY_WIRE_SKIPPED to mark the skipped PROT_NONE entries.
Reported and tested by: Hans Ottevanger <fbsdhackers beasties demon nl> Reviewed by: alc MFC after: 3 weeks
|
190705 |
04-Apr-2009 |
alc |
Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today.
VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller-specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.)
Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.
|
190604 |
01-Apr-2009 |
alc |
Eliminate dead code.
Reviewed by: jhb
|
189595 |
09-Mar-2009 |
jhb |
Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for an nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow.
Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that.
MFC after: 1 month
|
189024 |
25-Feb-2009 |
alc |
Prior to r188331 a map entry's last read offset was only updated by a hard fault. In r188331 this update was relocated because of synchronization changes to a place where it would occur on both hard and soft faults. This change again restricts the update to hard faults.
|
189015 |
24-Feb-2009 |
kib |
Revert the addition of the freelist argument for the vm_map_delete() function, done in r188334. Instead, collect the entries that shall be freed, in the deferred_freelist member of the map. Automatically purge the deferred freelist when map is unlocked.
Tested by: pho Reviewed by: alc
|
189014 |
24-Feb-2009 |
kib |
Add the assertion macros for the map locks. Use them in several map manipulation functions.
Tested by: pho Reviewed by: alc
|
189012 |
24-Feb-2009 |
kib |
Update the comment after the r188334.
Reviewed by: alc
|
189004 |
24-Feb-2009 |
rdivacky |
Change the functions to ANSI style in those cases where the K&R form breaks the promotion-to-int rule. See ISO C Standard: SS6.7.5.3:15.
Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current
|
188967 |
23-Feb-2009 |
rwatson |
Put debug.vm_lowmem sysctl under DIAGNOSTIC.
Submitted by: sam MFC after: 3 days
|
188964 |
23-Feb-2009 |
rwatson |
Add a debugging sysctl, debug.vm_lowmem, that when assigned a value of 1 will trigger a pass through the VM's low-memory handlers, such as protocol and UMA drain routines. This makes it easier to exercise these otherwise rarely-invoked code paths.
MFC after: 3 days
|
188900 |
21-Feb-2009 |
alc |
Reduce the scope of the page queues lock in vm_object_page_remove().
MFC after: 1 week
|
188859 |
20-Feb-2009 |
alc |
Eliminate stale comments.
|
188386 |
09-Feb-2009 |
kib |
Comment out the assertion from r188321. It is not valid for nfs.
Reported by: alc
|
188383 |
09-Feb-2009 |
alc |
Avoid some cases of unnecessary page queues locking by vm_fault's delete-behind heuristic.
|
188348 |
08-Feb-2009 |
alc |
Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a redundant assertion in vm_fault().
Reviewed by: kib
|
188337 |
08-Feb-2009 |
kib |
Remove no longer valid comment.
Submitted by: alc
|
188335 |
08-Feb-2009 |
kib |
Improve comments, correct English.
Submitted by: alc
|
188334 |
08-Feb-2009 |
kib |
Do not call vm_object_deallocate() from vm_map_delete(), because we hold the map lock there and might need the vnode lock for OBJT_VNODE objects. Postpone object deallocation until the caller of vm_map_delete() drops the map lock. Link the map entries to be freed into a freelist that is released by the new helper function vm_map_entry_free_freelist().
Reviewed by: tegge, alc Tested by: pho
|
188333 |
08-Feb-2009 |
kib |
In vm_map_sync(), do not call vm_object_sync() while holding the map lock. Reference the object, drop the map lock, and then call vm_object_sync(). The object sync might require the vnode lock for OBJT_VNODE objects.
Reviewed by: tegge Tested by: pho
|
188331 |
08-Feb-2009 |
kib |
Do not sleep for the vnode lock while holding the map lock in vm_fault. Try to acquire the vnode lock for an OBJT_VNODE object after the map lock is dropped. Because we have the busy page(s) in the object, sleeping there would result in a deadlock with vnode resize. Try to get the lock without sleeping, and, if the attempt fails, drop the state, lock the vnode, and restart the fault handler from the start with the vnode already locked.
Because the vnode_pager_lock() function is inlined in vm_fault(), axe it.
Based on suggestion by: alc Reviewed by: tegge, alc Tested by: pho
|
188325 |
08-Feb-2009 |
kib |
Add comments to vm_map_simplify_entry() and vmspace_fork(), describing why several calls to vm_object_deallocate() with the map locked do not result in the acquisition of the vnode lock after the map lock.
Suggested and reviewed by: tegge
|
188323 |
08-Feb-2009 |
kib |
Lock the new map in vmspace_fork(). The newly allocated map should not be accessible outside vmspace_fork() yet, but locking it would satisfy the protocol of the vm_map_entry_link() and other functions called from vmspace_fork().
Use a trylock that supposedly cannot fail, to silence the WITNESS warning about nested acquisition of an sx lock with the same name.
Suggested and reviewed by: tegge
|
188321 |
08-Feb-2009 |
kib |
Assert that vnode is exclusively locked when its vm object is resized.
Reviewed by: tegge
|
188320 |
08-Feb-2009 |
kib |
Do not leak the MAP_ENTRY_IN_TRANSITION flag when copying a map entry on fork. Otherwise, the copied entry cannot be removed in the child map.
Reviewed by: tegge MFC after: 2 weeks
|
188319 |
08-Feb-2009 |
kib |
Style.
|
187681 |
25-Jan-2009 |
jeff |
- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new API for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines.
Sponsored by: Nokia
|
187658 |
23-Jan-2009 |
jhb |
- Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done inside the SYSCTL() macros and thus does not need to be done for all of the nodes scattered across the source tree. - Mark the name-cache related sysctl's (including debug.hashstat.*) MPSAFE. - Mark vm.loadavg MPSAFE. - Remove GIANT_REQUIRED from vmtotal() (everything in this routine already has sufficient locking) and mark vm.vmtotal MPSAFE. - Mark the vm.stats.(sys|vm).* sysctls MPSAFE.
|
187527 |
21-Jan-2009 |
jhb |
Now that vfs_markatime() no longer requires an exclusive lock due to the VOP_MARKATIME() changes, use a shared vnode lock for mmap().
Submitted by: ups
|
186719 |
03-Jan-2009 |
kib |
Extend the struct vm_page wire_count to u_int to avoid overflow of the counter, which may happen when too many sendfile(2) calls are being executed with this vnode [1].
To keep the size of the struct vm_page and the offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force a fallback to the normal copy code for zero-copy sockets. [2]
Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks
|
186665 |
01-Jan-2009 |
alc |
Resurrect shared map locks allowing greater concurrency during some map operations, such as page faults.
An earlier version of this change was ...
Reviewed by: kib Tested by: pho MFC after: 6 weeks
|
186633 |
31-Dec-2008 |
alc |
Update or eliminate some stale comments.
|
186618 |
30-Dec-2008 |
alc |
Avoid an unnecessary memory dereference in vm_map_entry_splay().
|
186616 |
30-Dec-2008 |
alc |
Style change to vm_map_lookup(): Eliminate a macro of dubious value.
|
186609 |
30-Dec-2008 |
alc |
Move the implementation of the vm map's fast path on address lookup from vm_map_lookup{,_locked}() to vm_map_lookup_entry(). Having the fast path in vm_map_lookup{,_locked}() limits its benefits to page faults. Moving it to vm_map_lookup_entry() extends its benefits to other operations on the vm map.
|
186374 |
21-Dec-2008 |
rnoland |
Fix printing of KASSERT message missed in r163604.
Approved by: kib
|
185012 |
16-Nov-2008 |
kib |
Instead of forcing vn_start_write() to reset mp back to NULL for failed calls with a non-NULL vp, explicitly clear mp after failure.
Tested by: stass Reviewed by: tegge PR: 123768 MFC after: 1 week
|
184728 |
06-Nov-2008 |
raj |
Support kernel crash mini dumps on ARM architecture.
Obtained from: Juniper Networks, Semihalf
|
184546 |
02-Nov-2008 |
keramida |
Various comment nits, and typos.
|
184168 |
22-Oct-2008 |
rwatson |
Update mmap() comment: no more block devices, so no more block device cache coherency questions.
MFC after: 3 days
|
183754 |
10-Oct-2008 |
attilio |
Remove the useless struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified: - bufobj_invalbuf() - bufsync()
as well as the BO_SYNC() "virtual method" of the buffer object set. The main consumers of the bufobj functions are affected by this change too; in particular, the functions whose KPI changed are: - vinvalbuf() - g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
As a side note, please consider the 'curthread' argument passed to VOP_SYNC() (in bufsync()) as just temporary; it will be axed out ASAP
Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
183474 |
29-Sep-2008 |
kib |
Move the code for doing out-of-memory handling from vm_pageout_scan() into the separate function vm_pageout_oom(). Supply a parameter for vm_pageout_oom() describing the reason for the call.
Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone is exhausted.
Reviewed by: alc Tested by: pho, jhb MFC after: 2 weeks
|
183389 |
26-Sep-2008 |
emaste |
Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.
|
183383 |
26-Sep-2008 |
kib |
Save the previous content of td_fpop before storing the current file descriptor into it. Make sure that td_fpop is NULL when calling d_mmap from dev_pager_getpages().
The change guards against the td_fpop field being non-NULL with private state for another device, and against sudden clearing of td_fpop. This could occur when either a driver method calls another driver through a file descriptor operation, or a page fault happens while a driver is writing to memory backed by another driver.
Noted by: rwatson Tested by: rnoland MFC after: 3 days
|
183236 |
21-Sep-2008 |
alc |
Prevent an integer overflow in vm_pageout_page_stats() on machines with a large number of physical pages.
PR: 126158 Submitted by: Dmitry Tejblum MFC after: 3 days
|
183216 |
20-Sep-2008 |
kib |
Allow the d_mmap driver methods to use cdevpriv KPI during verification phase of establishing mapping.
Discussed with: rwatson, jhb, rnoland Tested by: rnoland MFC after: 3 days
|
182371 |
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR, as the passed thread was always curthread and totally useless.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
182047 |
23-Aug-2008 |
antoine |
Remove unused variable nosleepwithlocks.
PR: 126609 Submitted by: Mateusz Guzik MFC after: 1 month X-MFC: to stable/7 only, this variable is still used in stable/6
|
182028 |
23-Aug-2008 |
nwhitehorn |
Allow the MD UMA allocator to use VM routines like kmem_*(). Existing code requires the MD allocator to be available early in the boot process, before the VM is fully available. This adds a new VM option (UMA_MD_SMALL_ALLOC_NEEDS_VM) that allows an MD UMA small allocator to become available at the same time as the default UMA allocator.
Approved by: marcel (mentor)
|
181887 |
20-Aug-2008 |
julian |
A bunch of formatting fixes brought to light by, or created by, the Vimage commit a few days ago.
|
181811 |
17-Aug-2008 |
kmacy |
Work around differences in page allocation for initial page tables on xen
MFC after: 1 month
|
181693 |
13-Aug-2008 |
emaste |
Fix REDZONE(9) on amd64 and perhaps other 64 bit targets -- ensure the space that redzone adds to the allocation for storing its metadata is at least as large as the metadata that it will store there.
Submitted by: Nima Misaghian
|
181334 |
05-Aug-2008 |
jhb |
If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()).
With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock.
Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal().
Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
|
181239 |
03-Aug-2008 |
trhodes |
Fill in a few sysctl descriptions.
Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc
|
181024 |
30-Jul-2008 |
jhb |
One more whitespace nit.
|
181020 |
30-Jul-2008 |
jhb |
A few more whitespace fixes.
|
181019 |
30-Jul-2008 |
jhb |
If the kernel has run out of metadata for swap, then explicitly panic() instead of emitting a warning before deadlocking.
MFC after: 1 month
|
181004 |
30-Jul-2008 |
kib |
The behaviour of the lockmgr, going back at least to 4.4BSD-Lite2, was to downgrade the exclusive lock to a shared one when the exclusive lock owner requested a shared lock. The new lockmgr panics instead.
The vnode_pager_lock function requests a shared lock on the vnode backing the OBJT_VNODE object, and can be called when the current thread already holds an exclusive lock on the vnode. For instance, this happens when handling a page fault from the VOP_WRITE() uiomove that writes to the file, with the faulted-in page fetched from the vm object backed by the same file. We then get the situation described above.
Verify whether the vnode is already exclusively locked by the curthread and request recursed exclusive vnode lock instead of shared, if true.
Reported by: gallatin Discussed with: attilio
|
180598 |
18-Jul-2008 |
alc |
Eliminate stale comments from kmem_malloc().
|
180446 |
11-Jul-2008 |
kib |
Use VM_ALLOC_INTERRUPT for the page requests when allocating memory for the bio for a swapout write. This allows the page allocator to drain the free page list deeper. As a result, a deadlock where the pageout daemon sleeps waiting for a bio to be allocated for swapout is no longer reproducible in practice.
Alan said that M_USE_RESERVE shall be resurrected and used there, but until that is implemented, M_NOWAIT does exactly what is needed.
Tested by: pho, kris Reviewed by: alc No objections from: phk MFC after: 2 weeks (RELENG_7 only)
|
180308 |
05-Jul-2008 |
alc |
Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang
Make several variables related to kmem map auto-sizing static. Found by: CScout
|
179923 |
22-Jun-2008 |
alc |
Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, the memory allocated during bootstrap, before the call to kmem_init(), starts at KERNBASE, which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.
|
179921 |
21-Jun-2008 |
alc |
KERNBASE is not necessarily an address within the kernel map, e.g., PowerPC/AIM. Consequently, it should not be used to determine the maximum number of kernel map entries. Instead, use VM_MIN_KERNEL_ADDRESS, which marks the start of the kernel map on all architectures.
Tested by: marcel@ (PowerPC/AIM)
|
179765 |
12-Jun-2008 |
ups |
Fix vm object creation locking to allow SHARED vnode locking for vnode_create_vobject. (Not currently used)
Noticed by: kib@
|
179623 |
06-Jun-2008 |
alc |
Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE) work. (Moreover, I don't believe that they have ever worked as intended.) The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page within the range given to madvise(). This function moves the page to the inactive queue. Specifically, if the page is clean, it is moved to the head of the inactive queue where it is first in line for processing by the page daemon. On the other hand, if it is dirty, it is placed at the tail. Let's further examine the case in which the page is clean. Recall that the page is at the head of the line for processing by the page daemon. The expectation of vm_page_dontneed()'s author was that the page would be transferred from the inactive queue to the cache queue by the page daemon. (Once the page is in the cache queue, it is, in effect, free, that is, it can be reallocated to a new vm object by vm_page_alloc() if it isn't reactivated quickly enough by a user of the old vm object.) The trouble is that nowhere in the execution of either MADV_DONTNEED or MADV_FREE is either the machine-independent reference flag (PG_REFERENCED) or the reference bit in any page table entry (PTE) mapping the page cleared. Consequently, the immediate reaction of the page daemon is to reactivate the page because it is referenced. In effect, the madvise() was for naught. The case in which the page was dirty is not too different. Instead of being laundered, the page is reactivated.
Note: The essential difference between MADV_DONTNEED and MADV_FREE is that MADV_FREE clears a page's dirty field. So, MADV_FREE is always executing the clean case above.
This revision changes vm_page_dontneed() to clear both the machine- independent reference flag (PG_REFERENCED) and the reference bit in all PTEs mapping the page.
MFC after: 6 weeks
|
179296 |
24-May-2008 |
alc |
To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SUSv2 or our own man page. Moreover, neither Linux nor Solaris imposes this requirement. This revision removes the requirement.
Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks
|
179159 |
20-May-2008 |
ups |
Allow VM object creation in ufs_lookup (if vfs.vmiodirenable is set). Directory IO without a VM object will store data in 'malloced' buffers, severely limiting caching of the data. Without this change, VM objects for directories are only created on an open() of the directory. TODO: Inline a test for whether the VM object already exists to avoid locking/function call overhead.
Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo
|
179081 |
18-May-2008 |
alc |
Retire pmap_addr_hint(). It is no longer used.
|
179076 |
17-May-2008 |
alc |
In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned.
This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().
|
179074 |
17-May-2008 |
alc |
Preset a device object's alignment ("pg_color") based upon the physical address of the device's memory. This enables pmap_align_superpage() to propose a virtual address for mapping the device memory that permits the use of superpage mappings.
|
179019 |
15-May-2008 |
alc |
Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the system may panic because there is no reservation structure corresponding to the physical address of the device memory.
Reported by: Giorgos Keramidas
|
178935 |
10-May-2008 |
alc |
Provide the new argument to kmem_suballoc().
|
178933 |
10-May-2008 |
alc |
Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap.
Request superpage alignment for the kmem_map.
Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.)
Remove a stale comment from kmem_malloc().
|
178928 |
10-May-2008 |
alc |
Generalize vm_map_find(9)'s parameter "find_space". Specifically, add support for VMFS_ALIGNED_SPACE, which requests the allocation of an address range best suited to superpages. The old options TRUE and FALSE are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no immediate need to update all of vm_map_find(9)'s callers.
While I'm here, correct a misstatement about vm_map_find(9)'s return values in the man page.
|
178875 |
09-May-2008 |
alc |
Introduce pmap_align_superpage(). It increases the starting virtual address of the given mapping if a different alignment might result in more superpage mappings.
|
178792 |
05-May-2008 |
kmacy |
add malloc flag to blist so that it can be used in ithread context
Reviewed by: alc, bsdimp
|
178637 |
28-Apr-2008 |
alc |
Eliminate pointless casts from kmem_suballoc().
|
178630 |
28-Apr-2008 |
alc |
vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.
|
178272 |
17-Apr-2008 |
jeff |
- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset, walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption, ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc.
Sponsored by: Nokia
|
177956 |
06-Apr-2008 |
alc |
Introduce vm_reserv_reclaim_contig(). This function is used by contigmalloc(9) as a last resort to steal pages from an inactive, partially-used superpage reservation.
Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and refactor it so that a separate subroutine is responsible for breaking the selected reservation. This subroutine is also used by vm_reserv_reclaim_contig().
|
177932 |
05-Apr-2008 |
alc |
Eliminate an unnecessary test from vm_phys_unfree_page().
|
177922 |
04-Apr-2008 |
alc |
Update a comment to vm_map_pmap_enter().
|
177921 |
04-Apr-2008 |
alc |
Reintroduce UMA_SLAB_KMAP; however, change its spelling to UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object.
Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks
|
177762 |
30-Mar-2008 |
alc |
Eliminate an unnecessary printf() from kmem_suballoc(). The subsequent panic() can be extended to convey the same information.
|
177704 |
29-Mar-2008 |
jeff |
- Use vm_object_reference_locked() directly from vm_object_reference(). This is intended to get rid of vget() consumers who don't wish to acquire a lock. This is functionally the same as calling vref(). vm_object_reference_locked() already uses vref.
Discussed with: alc
|
177458 |
20-Mar-2008 |
kib |
Do not dereference cdev->si_cdevsw; use dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done.
PR: kern/119422 MFC after: 1 week
|
177414 |
19-Mar-2008 |
alc |
Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.
|
177368 |
19-Mar-2008 |
jeff |
- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
|
177342 |
18-Mar-2008 |
alc |
Almost seven years ago, vm/vm_page.c was split into three parts: vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c has withered to the point that it contains only four short functions, two of which are only used by vm/vm_page.c. Since I can't foresee any reason for vm/vm_pageq.c to grow, it is time to fold the remaining contents of vm/vm_pageq.c back into vm/vm_page.c.
Add some comments. Rename one of the functions, vm_pageq_enqueue(), that is now static within vm/vm_page.c to vm_page_enqueue(). Eliminate PQ_MAXCOUNT as it no longer serves any purpose.
|
177261 |
16-Mar-2008 |
alc |
Simplify the inner loop of vm_fault()'s delete-behind heuristic. Instead of checking each page for PG_UNMANAGED, perform a one-time check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only belong to OBJT_PHYS objects.)
|
177253 |
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
177091 |
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
177085 |
12-Mar-2008 |
jeff |
- Pass the priority argument from *sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_* to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter.
Reviewed by: jhb, peter
|
176967 |
09-Mar-2008 |
alc |
Eliminate an unnecessary test from vm_fault's delete-behind heuristic. Specifically, since the delete-behind heuristic is never applied to a device-backed object, there is no point in checking whether each of the object's pages is fictitious. (Only device-backed objects have fictitious pages.)
|
176717 |
01-Mar-2008 |
marcel |
Make the vm_pmap field of struct vmspace the last field in the structure. This allows per-CPU variations of struct pmap on a single architecture without affecting the machine-independent fields. As such, the PMAP variations don't affect the ABI. They become part of it.
|
176596 |
26-Feb-2008 |
alc |
Correct a long-standing error in vm_object_page_remove(). Specifically, pmap_remove_all() must not be called on fictitious pages. To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem because the fictitious pages appear to have an empty pv list. Submitted by: Kostik Belousov
Rewrite the comments describing vm_object_page_remove() to better describe what it does. Add an assertion. Reviewed by: Kostik Belousov
MFC after: 1 week
|
176526 |
24-Feb-2008 |
alc |
Correct a long-standing error in vm_object_deallocate(). Specifically, only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should ever have OBJ_ONEMAPPING set. However, vm_object_deallocate() was setting it on device (OBJT_DEVICE) objects. As a result, vm_object_page_remove() could be called on a device object and if that occurred pmap_remove_all() would be called on the device object's pages. However, a device object's pages are fictitious, and fictitious pages do not have an initialized pv list (struct md_page).
To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem. Now, however, the conversion of rotting diagnostics to invariants in the amd64 and i386 pmaps has revealed the problem. Specifically, assertion failures have occurred during the initialization phase of the X server on some hardware.
MFC after: 1 week Discussed with: Kostik Belousov Reported by: Michiel Boland
|
175294 |
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with a 'thread' argument that is always curthread. Remove the useless extra argument and pass curthread explicitly to lower-layer functions, when necessary.
The KPI is broken by this change, which may affect several ports, so a version bump and manpage updates will be committed separately.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
175210 |
10-Jan-2008 |
pjd |
When one tries to allocate memory with the M_WAITOK flag and we are short of address space in the kmem map, invoke the vm_lowmem event in a loop and wait a bit for subsystems to reclaim some memory, which in turn will reclaim address space as well.
Note, this is a work-around.
Reviewed by: alc Approved by: alc MFC after: 3 days
|
175202 |
10-Jan-2008 |
attilio |
vn_lock() is currently only used with 'curthread' passed as the argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This change makes the code cleaner and, in particular, removes an annoying dependency, helping the upcoming lockmgr() cleanup. The KPI is, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, it is worth mentioning that upcoming commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
175164 |
08-Jan-2008 |
jhb |
Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for a shared memory file descriptor is garbage collected when it is no longer referenced by any open file descriptor or the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
|
175157 |
08-Jan-2008 |
csjp |
When MAC is enabled in the kernel, fix a panic triggered by a locking assertion hit in swapoff_one() when we unmount a swap partition. We should be using curthread where we used thread0 before. This change also replaces the thread argument with a credential argument, as the MAC framework only requires the cred.
It should be noted that this allows the machine to be rebooted without panicking with "cannot differ from curthread or NULL" when MAC is enabled.
Submitted by: rwatson Reviewed by: attilio MFC after: 2 weeks
|
175079 |
04-Jan-2008 |
kib |
In the vm_map_stack(), check for the specified stack region wraparound.
Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days
|
175067 |
03-Jan-2008 |
alc |
Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion.
Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
|
175055 |
02-Jan-2008 |
alc |
Defer setting either PG_CACHED or PG_FREE until after the free page queues lock is acquired. Otherwise, the state of a reservation's pages' flags and its population count can be inconsistent. That could result in a page being freed twice.
Reported by: kris
|
175041 |
01-Jan-2008 |
alc |
Correct a style error that was introduced in revision 1.77.
|
174982 |
29-Dec-2007 |
alc |
Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date.
Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)
|
174940 |
27-Dec-2007 |
alc |
Add a list of reservations to the vm object structure.
Recycle the vm object's "pg_color" field to represent the color of the first virtual page address at which the object is mapped instead of the color of the object's first physical page. Since an object may not be mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color" is valid.
|
174939 |
27-Dec-2007 |
alc |
Add the superpage reservation type.
|
174825 |
21-Dec-2007 |
alc |
Update the comment describing vm_phys_unfree_page().
|
174821 |
20-Dec-2007 |
alc |
Modify vm_phys_unfree_page() so that it no longer requires the given page to be in the free lists. Instead, it now returns TRUE if it removed the page from the free lists and FALSE if the page was not in the free lists.
This change is required to support superpage reservations. Specifically, once reservations are introduced, a cached page can either be in the free lists or a reservation.
|
174799 |
19-Dec-2007 |
alc |
Correct one half of a loop continuation condition in vm_phys_unfree_page(). At present, this error is inconsequential; the other half of the loop continuation condition is sufficient to achieve correct execution.
|
174769 |
19-Dec-2007 |
alc |
Eliminate redundant code from vm_page_startup().
|
174543 |
11-Dec-2007 |
alc |
Simplify vm_page_free_toq().
|
174142 |
02-Dec-2007 |
alc |
Correct a comment.
|
174137 |
01-Dec-2007 |
rwatson |
Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneously with their use. This will allow them to be used outside of debugging contexts.
Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator.
Update man page.
|
173918 |
25-Nov-2007 |
alc |
Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered.
MFC after: 6 weeks
|
173901 |
25-Nov-2007 |
alc |
Tidy up: Add comments. Eliminate the pointless malloc_type_allocated(..., 0) calls that occur when contigmalloc() has failed. Eliminate the acquisition and release of the page queues lock from vm_page_release_contig(). Rename contigmalloc2() to contigmapping(), reflecting what it does.
|
173853 |
23-Nov-2007 |
alc |
Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired.
Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks
|
173846 |
22-Nov-2007 |
alc |
Remove an unnecessary call to pmap_remove_all() and the associated "XXX" comments from vnode_pager_setsize(). This call was introduced in revision 1.140 to address a problem that no longer exists. Specifically, pmap_zero_page_area() has replaced a (possibly) problematic implementation of page zeroing that was based on vm_pager_map(), bzero(), and vm_pager_unmap().
|
173836 |
21-Nov-2007 |
alc |
When reactivating a cached page, reset the page's pool to the default pool. (Not doing this before was a performance pessimization but not a cause for panic.)
|
173708 |
17-Nov-2007 |
alc |
Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroyed these wired, managed mappings but did not reduce the pages' wire count. Consequently, when the file was unmapped, the pages were not unwired because the wired mapping had been destroyed. Moreover, when the vm object was finally destroyed, the pages were leaked because they were still wired. The fix is to reduce the pages' wire count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove().
Reviewed by: tegge MFC after: 6 weeks
|
173429 |
07-Nov-2007 |
pjd |
Change unused 'user_wait' argument to 'timo' argument, which will be used to specify timeout for msleep(9).
Discussed with: alc Reviewed by: alc
|
173361 |
05-Nov-2007 |
kib |
Fix the panic("vm_thread_new: kstack allocation failed") and a silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when kmem_alloc_nofault() failed to allocate address space. Both functions now return an error instead of panicking or dereferencing NULL.
As a consequence, vmspace_exec() and vmspace_unshare() now return an errno int. A struct vmspace argument was added to vm_forkproc() to avoid dealing with a failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in thread_alloc(), which itself may return NULL. Also, allocation of the first process thread is performed in fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup(), called from fork1(), and proc_linkup0(), which is used to set up the kernel process (formerly known as the swapper).
In collaboration with: Peter Holm Reviewed by: jhb
|
173357 |
05-Nov-2007 |
kib |
The intent of freeing the (zeroed) page in vm_page_cache() for a default object, rather than caching it, was to have vm_pager_has_page(object, pindex, ...) == FALSE imply that there is no cached page in the object at pindex. This makes it possible to avoid explicit checks for cached pages in vm_object_backing_scan().
For now, we need the same bandaid for the swap object; otherwise both vm_page_lookup() and the pager can report that there is no page at an offset while the page is stored in the cache. Also, this fixes another instance of the KASSERT("object type is incompatible") failure in vm_page_cache_transfer().
Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days
|
173292 |
02-Nov-2007 |
maxim |
o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages().
Submitted by: Mark Tinguely
|
173180 |
30-Oct-2007 |
remko |
Correct a copy-and-paste'o in phys_pager.c; we are talking about phys here, not about devices.
PR: 93755 Approved by: imp (mentor, implicit when re-assigning the ticket to me).
|
173049 |
27-Oct-2007 |
alc |
Change vm_page_cache_transfer() such that it does not transfer pages that would have an offset beyond the end of the target object. Such pages should remain in the source object.
MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm
|
172930 |
24-Oct-2007 |
rwatson |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
mac_<object>_<method/action> mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
172875 |
22-Oct-2007 |
alc |
Correct an error of omission in the reimplementation of the page cache: vnode_pager_setsize() must handle the case where a file is truncated to a non-page-size-aligned boundary and there is a cached page underlying the new end of file.
Reported by: kris, tegge Tested by: kris MFC after: 3 days
|
172863 |
22-Oct-2007 |
alc |
Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed since revision 1.1. Specifically, neither traversal of the vm map checks whether the end of the vm map has been reached. Consequently, the first traversal can wrap around and bogusly return an error.
This error has gone unnoticed for so long because no one had ever before tried msync(2)ing a region above the stack.
Reported by: peter MFC after: 1 week
|
172836 |
20-Oct-2007 |
julian |
Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add real kthread_create() and friends that actually make threads. It turns out that most of these calls end up being moved back to the thread version when it is added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
|
172780 |
18-Oct-2007 |
alc |
The previous revision, updating vm_object_page_remove() for the new page cache, did not account for the case where the vm object has nothing but cached pages.
Reported by: kris, tegge Reviewed by: tegge MFC after: 3 days
|
172779 |
18-Oct-2007 |
peter |
Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.
|
172700 |
16-Oct-2007 |
ru |
Fix CTL_VM_NAMES.
|
172545 |
11-Oct-2007 |
jhb |
Allow recursion on the 'zones' internal UMA zone.
Submitted by: thompsa MFC after: 1 week Approved by: re (kensmith) Discussed with: jeff
|
172475 |
08-Oct-2007 |
kib |
Do not dereference NULL pointer.
Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)
|
172472 |
08-Oct-2007 |
alc |
In the rare case that vm_page_cache() actually frees the given page, it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborhood, make a related panic message a little more useful.
Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov
|
172466 |
07-Oct-2007 |
alc |
Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed.
Approved by: re (kensmith) PR: 116794
|
172341 |
27-Sep-2007 |
alc |
Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall within the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear.
Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages.
Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)
|
172322 |
25-Sep-2007 |
alc |
Correct an error in the previous revision, specifically, vm_object_madvise() should request that the reactivated, cached page not be busied.
Reported by: Rink Springer Approved by: re (kensmith)
|
172317 |
25-Sep-2007 |
alc |
Change the management of cached pages (PQ_CACHE) in two fundamental ways:
(1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed.
Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)
|
172268 |
21-Sep-2007 |
jeff |
- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal.
Approved by: re
|
172207 |
17-Sep-2007 |
jeff |
- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler-related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in-source tools to use the new and more correct location of P_INMEM.
Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
|
172188 |
15-Sep-2007 |
alc |
Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate.
Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week
|
171902 |
20-Aug-2007 |
kib |
Do not drop the vm_map lock between doing vm_map_remove() and vm_map_insert(). To this end, introduce vm_map_fixed(), which does so for the MAP_FIXED case.
Dropping the lock allowed a parallel thread to occupy the freed space.
Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
|
171889 |
18-Aug-2007 |
kib |
Remove comment that is no longer quite true.
Noted by: alc Approved by: re (kensmith)
|
171887 |
18-Aug-2007 |
kib |
Fix the phys_pager in the way similar to the rev. 1.83 of the sys/vm/device_pager.c:
Protect the creation of the phys pager with a non-NULL handle with the phys_pager_mtx. Lookup of a phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before the vm object lock in the lock order. Dispose of the phys_pager_alloc_lock and the tsleep calls, together with the acquisition of Giant, since phys_pager_mtx now covers the same block.
Reviewed by: alc Approved by: re (kensmith)
|
171779 |
07-Aug-2007 |
kib |
Protect the creation of the device pager with the dev_pager_mtx. Lookup of a device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before the vm object lock in the lock order. Dispose of the dev_pager_sx lock, since dev_pager_mtx now covers the same block.
Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)
|
171737 |
05-Aug-2007 |
alc |
Consider a scenario in which one processor, call it Pt, is performing vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa.
Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically.
Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks
|
171725 |
05-Aug-2007 |
kib |
Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. prep_cdevsw() already installs the shims that acquire/drop Giant for the methods of a driver that specified the D_NEEDGIANT flag.
Reviewed by: alc Approved by: re (kensmith)
|
171633 |
27-Jul-2007 |
alc |
Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat".
Approved by: re (rwatson)
|
171599 |
26-Jul-2007 |
pjd |
When we do an open, we should lock the vnode exclusively. This fixes a few races: - the fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more...
Discussed with: kib, ups Approved by: re (rwatson)
|
171514 |
20-Jul-2007 |
alc |
Two changes to vm_fault_additional_pages():
1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan.
2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case.
Approved by: re (hrs) MFC after: 3 weeks
|
171451 |
14-Jul-2007 |
alc |
Eliminate two unused functions: vm_phys_alloc_pages() and vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to vm_phys_alloc_pages() and vm_phys_free_pages_locked() to vm_phys_free_pages(). Add comments regarding the need for the free page queues lock to be held by callers to these functions. No functional changes.
Approved by: re (hrs)
|
171445 |
14-Jul-2007 |
alc |
Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun".
Approved by: re (hrs)
|
171420 |
13-Jul-2007 |
alc |
Update a comment describing the page queues.
Approved by: re (hrs)
|
171417 |
12-Jul-2007 |
alc |
Eliminate dead code.
Approved by: re (hrs)
|
171347 |
10-Jul-2007 |
alc |
Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given page is wired, preventing it from being recycled. However, when transmission of the page completes, the page is unwired and returned to the page queues. At that point, the page is not in any special state that prevents it from being recycled. Consequently, vm_page_cowfault() should verify that the page is still held by the same vm object before retrying the replacement of the page. Note: The containing object is, however, safe from being recycled by virtue of having a non-zero paging-in-progress count.
While I'm here, add some assertions and comments.
Approved by: re (rwatson) MFC After: 3 weeks
|
171310 |
08-Jul-2007 |
alc |
Eliminate the special case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing the same exact behavior by vm_fault_additional_pages() as the special case handling.
Approved by: re (rwatson)
|
171288 |
06-Jul-2007 |
alc |
When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.)
Correct a comment in vm_fault_additional_pages().
Approved by: re (kensmith) MFC after: 1 week
|
171212 |
04-Jul-2007 |
peter |
Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
|
171150 |
02-Jul-2007 |
alc |
In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode.
In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock.
This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong if it examined a recycled vm object.
Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks
|
171048 |
26-Jun-2007 |
alc |
Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().
Approved by: re (kensmith) MFC after: 3 weeks
|
171019 |
24-Jun-2007 |
alc |
Eliminate GIANT_REQUIRED from swap_pager_putpages().
Approved by: re (mux) MFC after: 1 week
|
170905 |
18-Jun-2007 |
alc |
Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED.
Reviewed by: tegge
|
170865 |
17-Jun-2007 |
mjacob |
Don't declare inline a function which isn't.
|
170864 |
17-Jun-2007 |
mjacob |
Make sure 'object' is initialized to NULL; there is a possible case where control could fall through to it being used without being set. Put a break in the default case.
|
170863 |
17-Jun-2007 |
mjacob |
Initialize reqpage to zero.
|
170836 |
16-Jun-2007 |
alc |
If attempting to cache a "busy" page, panic instead of printing a diagnostic message and returning.
|
170818 |
16-Jun-2007 |
alc |
Update a comment.
|
170816 |
16-Jun-2007 |
alc |
Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map.
Approved by: re
|
170658 |
13-Jun-2007 |
alc |
Eliminate dead code: We have not performed pageouts on the kernel object in this millennium.
|
170529 |
11-Jun-2007 |
alc |
Conditionally acquire Giant in vm_contig_launder_page().
|
170517 |
10-Jun-2007 |
attilio |
Optimize vmmeter locking. In particular: - Add an explanatory table for locking of struct vmmeter members - Apply new rules to some of those members - Remove some unhelpful comments
Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)
|
170477 |
10-Jun-2007 |
alc |
Add a new physical memory allocator. However, do not yet connect it to the build.
This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map.
Approved by: re
|
170307 |
05-Jun-2007 |
jeff |
Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
170292 |
04-Jun-2007 |
attilio |
Do proper "locking" for the remaining vmmeter members. We no longer assume sched_lock protection for some of them and instead use the distributed-loads method for vmmeter (distributed across CPUs).
Reviewed by: alc, bde Approved by: jeff (mentor)
|
170291 |
04-Jun-2007 |
attilio |
Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussion/work in the coming days.
Reviewed by: alc, bde Approved by: jeff (mentor)
|
170174 |
01-Jun-2007 |
jeff |
- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
|
170170 |
31-May-2007 |
attilio |
Revert VMCNT_* operations introduction. Probably a general approach is not the best solution here, so we should solve the sched_lock protection problems separately.
Requested by: alc Approved by: jeff (mentor)
|
170152 |
31-May-2007 |
kib |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
170149 |
31-May-2007 |
attilio |
Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to perform the same actions as sx_xlock() and sx_slock(), but with the difference that they perform an interruptible sleep, so that the sleep can be interrupted by external events. In order to support these new features, some code restructuring is needed, but the external API won't be affected at all.
Note: use a "void" cast for "int"-returning functions in order to keep tools like Coverity from complaining.
Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)
|
169849 |
22-May-2007 |
alc |
Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.
|
169805 |
20-May-2007 |
jeff |
- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
Suggested by: julian@ Contributed by: attilio@
|
169667 |
18-May-2007 |
jeff |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
169431 |
09-May-2007 |
rwatson |
Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.
|
169291 |
05-May-2007 |
alc |
Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194
|
169048 |
26-Apr-2007 |
alc |
Remove some code from vmspace_fork() that became redundant after revision 1.334 modified _vm_map_init() to initialize the new vm map's flags to zero.
|
168979 |
23-Apr-2007 |
rwatson |
Audit pathnames looked up in swapon(2) and swapoff(2).
MFC after: 2 weeks Obtained from: TrustedBSD Project
|
168852 |
19-Apr-2007 |
alc |
Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest.
Submitted by: Michael Plass PR: 81301 MFC after: 1 week
|
168851 |
19-Apr-2007 |
alc |
Correct two comments.
Submitted by: Michael Plass
|
168581 |
10-Apr-2007 |
keramida |
Minor typo fix, noticed while I was going through *_pager.c files.
|
168395 |
05-Apr-2007 |
pjd |
When KVA is exhausted, try the vm_lowmem event for the last time before panicing. This helps a lot in ZFS stability.
|
168394 |
05-Apr-2007 |
pjd |
Fix a problem for file systems that don't implement VOP_BMAP() operation.
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg. EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before' and 'after' arguments, so we have some accidental values there. This basically was causing this condition to be met:
	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}
(we have some random values in rahead and rbehind variables)
I'm not entirely sure this is the right fix, maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails?
alc@ knows about this problem, maybe he will be able to come up with a better fix if this is not the right one.
|
167939 |
27-Mar-2007 |
alc |
Prevent a race between vm_object_collapse() and vm_object_split() from causing a crash.
Suppose that we have two objects, obj and backing_obj, where backing_obj is obj's backing object. Further, suppose that backing_obj has a reference count of two. One being the reference held by obj and the other by a map entry. Now, suppose that the map entry is deallocated and its reference removed by vm_object_deallocate(). vm_object_deallocate() recognizes that the only remaining reference is from a shadow object, obj, and calls vm_object_collapse() on obj. vm_object_collapse() executes
	if (backing_object->ref_count == 1) {
		/*
		 * If there is exactly one reference to the backing
		 * object, we can collapse it into the parent.
		 */
		vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT);
vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes
	if (op & OBSC_COLLAPSE_WAIT) {
		vm_object_set_flag(backing_object, OBJ_DEAD);
	}
Finally, suppose that either vm_object_backing_scan() or vm_object_collapse() sleeps releasing its locks. At this instant, another thread executes vm_object_split(). It crashes in vm_object_reference_locked() on the assertion that the object is not dead. If, however, assertions are not enabled, it crashes much later, after the object has been recycled, in vm_object_deallocate() because the shadow count and shadow list are inconsistent.
Reviewed by: tegge Reported by: jhb MFC after: 1 week
|
167880 |
25-Mar-2007 |
alc |
Two small changes to vm_map_pmap_enter():
1) Eliminate an unnecessary check for fictitious pages. Specifically, only device-backed objects contain fictitious pages and the object is not device-backed.
2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to prevent possible wrap around with extremely large maps and objects, respectively. Observed by: tegge (last summer)
|
167829 |
23-Mar-2007 |
alc |
vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.
|
167795 |
22-Mar-2007 |
alc |
Change the order of lock reacquisition in vm_object_split() in order to simplify the code slightly. Add a comment concerning lock ordering.
|
167243 |
05-Mar-2007 |
alc |
Use PCPU_LAZY_INC() to update page fault statistics.
|
167091 |
27-Feb-2007 |
jhb |
Use pause() in vm_object_deallocate() to yield the CPU to the lock holder rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended to awaken the swapper, not random threads blocked in vm_object_deallocate().
|
167086 |
27-Feb-2007 |
jhb |
Use pause() rather than tsleep() on stack variables and function pointers.
|
166964 |
25-Feb-2007 |
alc |
Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to a OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock.
Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc().
Remove vm_page_unmanage(). It is no longer used.
|
166882 |
22-Feb-2007 |
alc |
Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.
|
166808 |
18-Feb-2007 |
alc |
Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.
|
166805 |
17-Feb-2007 |
alc |
Remove a stale comment. Add punctuation to a nearby comment.
|
166736 |
15-Feb-2007 |
alc |
Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)
|
166699 |
14-Feb-2007 |
alc |
Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)
|
166654 |
11-Feb-2007 |
rwatson |
Add uma_set_align() interface, which will be called at most once during boot by MD code to indicate a detected alignment preference. Rather than cache alignment being encoded in UMA consumers by defining a global alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a special value (-1) that causes UMA to look at the registered alignment. If no preferred alignment has been selected by MD code, a default alignment of (16 - 1) will be used.
Currently, no hardware platforms specify alignment; architecture maintainers will need to modify MD startup code to specify an alignment if desired. This must occur before initialization of UMA so that all UMA zones pick up the requested alignment.
Reviewed by: jeff, alc Submitted by: attilio
|
166637 |
11-Feb-2007 |
alc |
Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.
|
166550 |
07-Feb-2007 |
jhb |
- Move 'struct swdevt' back into swap_pager.h and expose it to userland. - Restore support for fetching swap information from crash dumps via kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps.
Reviewed by: arch@, phk MFC after: 1 week
|
166544 |
07-Feb-2007 |
alc |
Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.
|
166508 |
05-Feb-2007 |
alc |
Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.
|
166213 |
25-Jan-2007 |
mohans |
Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones.
Reviewed by: andre@
|
166211 |
24-Jan-2007 |
mohans |
Fix for a bug where only one process (of multiple) blocked on maxpages on a zone is woken up, with the rest never being woken up as a result of the ZFLAG_FULL flag being cleared. Wake up all such blocked processes instead. This change introduces a thundering herd, but since this should be relatively infrequent, optimizing this (by introducing a count of blocked processes, for example) may be premature.
Reviewed by: ups@
|
166188 |
23-Jan-2007 |
jeff |
- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
|
166074 |
17-Jan-2007 |
delphij |
Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
|
165928 |
10-Jan-2007 |
rwatson |
Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when allocations were made using improper flags in interrupt context. Replace with a simple WITNESS warning call. This restores the invariant that M_WAITOK allocations will always succeed or die horribly trying, which is relied on by many UMA consumers.
MFC after: 3 weeks Discussed with: jhb
|
165854 |
07-Jan-2007 |
alc |
Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.
|
165809 |
05-Jan-2007 |
jhb |
- Add a new function uma_zone_exhausted() to see if a zone is full. - Add a printf in swp_pager_meta_build() to warn if the swapzone becomes exhausted so that there's at least a warning before a box that runs out of swapzone space before running out of swap space deadlocks.
MFC after: 1 week Reviewed by: alc
|
165309 |
17-Dec-2006 |
alc |
Optimize vm_object_split(). Specifically, make the number of iterations equal to the number of physical pages that are renamed to the new object rather than the new object's virtual size.
|
165278 |
16-Dec-2006 |
alc |
Simplify the computation of the new object's size in vm_object_split().
|
165007 |
08-Dec-2006 |
kmacy |
Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array
Reviewed by: alc
|
164936 |
06-Dec-2006 |
julian |
Threading cleanup: part 2 of several.
Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it.
Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu, and Dan Eischen using libthr and libpthread.
|
164446 |
20-Nov-2006 |
ru |
The clean_map has been made local to vm_init.c long ago.
|
164437 |
20-Nov-2006 |
ru |
Remove a redundant pointer-type variable.
|
164429 |
20-Nov-2006 |
ru |
When counting vm totals, skip unreferenced objects, including vnodes representing mounted file systems.
Reviewed by: alc MFC after: 3 days
|
164234 |
13-Nov-2006 |
alc |
There is no point in setting PG_REFERENCED on kmem_object pages because they are "unmanaged", i.e., non-pageable, pages.
Remove a stale comment.
|
164229 |
12-Nov-2006 |
alc |
Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)
|
164101 |
08-Nov-2006 |
alc |
I misplaced the assertion that was added to vm_page_startup() in the previous change. Correct its placement.
|
164100 |
08-Nov-2006 |
alc |
Simplify the construction of the free queues in vm_page_startup(). Add an assertion to test a hypothesis concerning other redundant computation in vm_page_startup().
|
164089 |
08-Nov-2006 |
alc |
Ensure that the page's oflags field is initialized by contigmalloc().
|
164033 |
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
163709 |
26-Oct-2006 |
jb |
Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).
Reviewed by: davidxu@
|
163702 |
26-Oct-2006 |
rwatson |
Better align output of "show uma" by moving from displaying the basic counters of allocs/frees/use for each zone to the same statistics shown by userspace "vmstat -z".
MFC after: 3 days
|
163622 |
23-Oct-2006 |
alc |
The page queues lock is no longer required by vm_page_wakeup().
|
163614 |
22-Oct-2006 |
alc |
The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.
|
163606 |
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
163604 |
22-Oct-2006 |
alc |
Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.
|
163594 |
21-Oct-2006 |
alc |
Eliminate unnecessary PG_BUSY tests. They originally served a purpose that is now handled by vm object locking.
|
163361 |
14-Oct-2006 |
alc |
Long ago, revision 1.22 of vm/vm_pager.h introduced a bug. Specifically, it introduced a check after the call to file system's get pages method that assumes that the get pages method does not change the array of pages that is passed to it. In the case of vnode_pager_generic_getpages(), this assumption has been incorrect. The contents of the array of pages may be shifted by vnode_pager_generic_getpages(). Likely, the problem has been hidden by vnode_pager_haspage() limiting the set of pages that are passed to vnode_pager_generic_getpages() such that a shift never occurs.
The fix implemented herein is to adjust the pointer to the array of pages rather than shifting the pages within the array.
MFC after: 3 weeks Fix suggested by: tegge
|
163359 |
14-Oct-2006 |
alc |
Change vnode_pager_addr() such that on returning it distinguishes between an error returned by VOP_BMAP() and a hole in the file.
Change the callers to vnode_pager_addr() such that they return VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page.
Reviewed by: tegge MFC after: 3 weeks
|
163259 |
12-Oct-2006 |
kmacy |
sun4v requires TSBs (translation storage buffers) to be contiguous and size-aligned, requiring heavy usage of vm_page_alloc_contig
This change makes vm_page_alloc_contig SMP safe
Approved by: scottl (acting as backup for mentor rwatson)
|
163210 |
10-Oct-2006 |
alc |
Distinguish between two distinct kinds of errors from VOP_BMAP() in vnode_pager_generic_getpages(): (1) that VOP_BMAP() is unsupported by the underlying file system and (2) an error in performing the VOP_BMAP(). Previously, vnode_pager_generic_getpages() assumed that all errors were of the first type. If, in fact, the error was of the second type, the likely outcome was for the process to become permanently blocked on a busy page.
MFC after: 3 weeks Reviewed by: tegge
|
163140 |
08-Oct-2006 |
alc |
Change vnode_pager_generic_getpages() so that it does not panic if the given file is sparse. Instead, it zeroes the requested page.
Reviewed by: tegge PR: kern/98116 MFC after: 3 days
|
162750 |
29-Sep-2006 |
kensmith |
Fix two minor style(9) nits in v1.313 which were noticed during an MFC review. alc@ will be MFCing V1.313 plus style fix to RELENG_6.
|
161968 |
03-Sep-2006 |
alc |
Make vm_page_release_contig() static.
|
161674 |
27-Aug-2006 |
alc |
Refactor vm_page_sleep_if_busy() so that the test for a busy page is inlined and a procedure call is made in the rare case, i.e., when it is necessary to sleep. In this case, inlining the test actually makes the kernel smaller.
|
161629 |
26-Aug-2006 |
alc |
Prevent a call to contigmalloc() that asks for more physical memory than the machine has from causing a panic.
Submitted by: Michael Plass PR: 101668 MFC after: 3 days
|
161597 |
25-Aug-2006 |
alc |
The return value from vm_pageq_add_new_page() is not used. Eliminate it.
|
161492 |
21-Aug-2006 |
alc |
Add _vm_stats and _vm_stats_misc to the sysctl declarations in sysctl.h and eliminate their declarations from various source files.
|
161489 |
21-Aug-2006 |
alc |
vm_page_zero_idle()'s return value serves no purpose. Eliminate it.
|
161486 |
21-Aug-2006 |
alc |
Page flags are reset on (re)allocation. There is no need to clear any flags except for PG_ZERO in vm_page_free_toq().
|
161257 |
13-Aug-2006 |
alc |
Reimplement the page's NOSYNC flag as an object-synchronized instead of a page queues-synchronized flag. Reduce the scope of the page queues lock in vm_fault() accordingly.
Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the scope of the page queues lock. Reviewed by: tegge Additionally, eliminate an unnecessary dereference in computing the argument that is passed to vm_object_set_writeable_dirty().
|
161213 |
11-Aug-2006 |
alc |
Ensure that the page's new field for object-synchronized flags is always initialized to zero.
Call vm_page_sleep_if_busy() instead of duplicating its implementation in vm_page_grab().
|
161143 |
10-Aug-2006 |
alc |
Change vm_page_cowfault() so that it doesn't allocate a pre-busied page.
|
161125 |
09-Aug-2006 |
alc |
Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page.
Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively.
Eliminate the assertion that the page queues lock is held in vm_page_io_finish().
Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().
|
161014 |
06-Aug-2006 |
alc |
Eliminate the acquisition and release of the page queues lock around a call to vm_page_sleep_if_busy().
|
161013 |
06-Aug-2006 |
alc |
Change vm_page_sleep_if_busy() so that it no longer requires the caller to hold the page queues lock.
|
161005 |
05-Aug-2006 |
alc |
Remove a stale comment.
|
160960 |
03-Aug-2006 |
alc |
When sleeping on a busy page, use the lock from the containing object rather than the global page queues lock.
|
160889 |
01-Aug-2006 |
alc |
Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change.
The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access.
Reviewed by: marcel@ (ia64)
|
160585 |
22-Jul-2006 |
alc |
Export the number of object bypasses and collapses through sysctl.
|
160561 |
21-Jul-2006 |
alc |
Retire debug.mpsafevm. None of the architectures supported in CVS require it any longer.
|
160540 |
21-Jul-2006 |
alc |
Eliminate OBJ_WRITEABLE. It hasn't been used in a long time.
|
160525 |
20-Jul-2006 |
alc |
Add pmap_clear_write() to the interface between the virtual memory system's machine-dependent and machine-independent layers. Once pmap_clear_write() is implemented on all of our supported architectures, I intend to replace all calls to pmap_page_protect() by calls to pmap_clear_write(). Why? Both the use and implementation of pmap_page_protect() in our virtual memory system has subtle errors, specifically, the management of execute permission is broken on some architectures. The "prot" argument to pmap_page_protect() should behave differently from the "prot" argument to other pmap functions. Instead of meaning, "give the specified access rights to all of the physical page's mappings," it means "don't take away the specified access rights from all of the physical page's mappings, but do take away the ones that aren't specified." However, owing to our i386 legacy, i.e., no support for no-execute rights, all but one invocation of pmap_page_protect() specifies VM_PROT_READ only, when the intent is, in fact, to remove only write permission. Consequently, a faithful implementation of pmap_page_protect(), e.g., ia64, would remove execute permission as well as write permission. On the other hand, some architectures that support execute permission have basically ignored whether or not VM_PROT_EXECUTE is passed to pmap_page_protect(), e.g., amd64 and sparc64. This change represents the first step in replacing pmap_page_protect() by the less subtle pmap_clear_write() that is already implemented on amd64, i386, and sparc64.
Discussed with: grehan@ and marcel@
|
160460 |
18-Jul-2006 |
rwatson |
Fix build of uma_core.c when DDB is not compiled into the kernel by making uma_zone_sumstat() ifdef DDB, as it's only used with DDB now.
Submitted by: Wolfram Fenske <Wolfram.Fenske at Student.Uni-Magdeburg.DE>
|
160421 |
17-Jul-2006 |
alc |
Ensure that vm_object_deallocate() doesn't dereference a stale object pointer: When vm_object_deallocate() sleeps because of a non-zero paging in progress count on either object or object's shadow, vm_object_deallocate() must ensure that object is still the shadow's backing object when it reawakens. In fact, object may have been deallocated while vm_object_deallocate() slept. If so, reacquiring the lock on object can lead to a deadlock.
Submitted by: ups@ MFC after: 3 weeks
|
160414 |
16-Jul-2006 |
rwatson |
Remove sysctl_vm_zone() and vm.zone sysctl from 7.x. As of 6.x, libmemstat(3) is used by vmstat (and friends) to produce more accurate and more detailed statistics information in a machine-readable way, and vmstat continues to provide the same text-based front-end.
This change should not be MFC'd.
|
160236 |
10-Jul-2006 |
alc |
Set debug.mpsafevm to true on PowerPC. (Now, by default, all architectures in CVS have debug.mpsafevm set to true.)
Tested by: grehan@
|
159880 |
23-Jun-2006 |
jhb |
Move the code to handle the vm.blacklist tunable up a layer into vm_page_startup(). As a result, we now only lookup the tunable once instead of looking it up once for every physical page of memory in the system. This cuts about a one-second delay out of boot on x86 systems. The delay is apparently much larger and more noticeable on sun4v.
Reported by: kmacy MFC after: 1 week
|
159837 |
21-Jun-2006 |
kib |
Make mincore(2) return ENOMEM when the requested range is not fully mapped.
Requested by: Bruno Haible <bruno at clisp org> Reviewed by: alc Approved by: pjd (mentor) MFC after: 1 month
|
159681 |
17-Jun-2006 |
alc |
Use ptoa(psize) instead of size to compute the end of the mapping in vm_map_pmap_enter().
|
159627 |
15-Jun-2006 |
ups |
Remove mpte optimization from pmap_enter_quick(). There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386.
Reviewed by: alc,jhb,peter,ps
|
159620 |
14-Jun-2006 |
alc |
Correct an error in the previous revision that could lead to a panic: Found mapped cache page. Specifically, if cnt.v_free_count dips below cnt.v_free_reserved after p_start has been set to a non-NULL value, then vm_map_pmap_enter() would break out of the loop and incorrectly call pmap_enter_object() for the remaining address range. To correct this error, this revision truncates the address range so that pmap_enter_object() will not map any cache pages.
In collaboration with: tegge@ Reported by: kris@
|
159475 |
10-Jun-2006 |
alc |
Enable debug.mpsafevm on arm by default.
Tested by: cognet@
|
159303 |
05-Jun-2006 |
alc |
Introduce the function pmap_enter_object(). It maps a sequence of resident pages from the same object. Use it in vm_map_pmap_enter() to reduce the locking overhead of premapping objects.
Reviewed by: tegge@
|
159121 |
31-May-2006 |
ps |
Fix minidumps to include pages allocated via pmap_map on amd64. These pages are allocated from the direct map, and were not previous tracked. This included the vm_page_array and the early UMA bootstrap pages.
Reviewed by: peter
|
159054 |
29-May-2006 |
tegge |
Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice.
Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier.
Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice.
Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
Reviewed by: alc
|
158803 |
21-May-2006 |
rwatson |
When allocating a bucket to hold a free'd item in UMA fails, don't report this as an allocation failure for the item type. The failure will be separately recorded with the bucket type. This may eliminate high mbuf allocation failure counts under some circumstances, which can be alarming in appearance, but not actually a problem in practice.
MFC after: 2 weeks Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>, OxY <oxy at field dot hu>, Gabor MICSKO <gmicskoa at szintezis dot hu>
|
158525 |
13-May-2006 |
alc |
Simplify the implementation of vm_fault_additional_pages() based upon the object's memq being ordered. Specifically, replace repeated calls to vm_page_lookup() by two simple constant-time operations.
Reviewed by: tegge
|
158387 |
10-May-2006 |
pjd |
Use better order here.
|
158020 |
25-Apr-2006 |
alc |
Add synchronization to vm_pageq_add_new_page() so that it can be called safely after kernel initialization. Remove GIANT_REQUIRED.
MFC after: 6 weeks
|
157920 |
21-Apr-2006 |
trhodes |
It seems that POSIX would rather have ENODEV returned in place of EINVAL when trying to mmap() an fd that isn't a normal file.
Reference: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html Submitted by: fanf
|
157908 |
21-Apr-2006 |
peter |
Introduce minidumps. Full physical memory crash dumps are still available via the debug.minidump sysctl and tunable.
Traditional dumps store all physical memory. This was once a good thing when machines had a maximum of 64M of ram and 1GB of kvm. These days, machines often have many gigabytes of ram and a smaller amount of kvm. libkvm+kgdb don't have a way to access physical ram that is not mapped into kvm at the time of the crash dump, so the extra ram being dumped is mostly wasted.
Minidumps invert the process. Instead of dumping physical memory in order to guarantee that all of kvm's backing is dumped, minidumps instead dump only memory that is actively mapped into kvm.
amd64 has a direct map region that things like UMA use. Obviously we cannot dump all of the direct map region because that is effectively an old style all-physical-memory dump. Instead, introduce a bitmap and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that allow certain critical direct map pages to be included in the dump. uma_machdep.c's allocator is the intended consumer.
Dumps are a custom format. At the very beginning of the file is a header, then a copy of the message buffer, then the bitmap of pages present in the dump, then the final level of the kvm page table trees (2MB mappings are expanded into a 4K page mappings), then the sparse physical pages according to the bitmap. libkvm can now conveniently access the kvm page table entries.
Booting my test 8GB machine, forcing it into ddb and forcing a dump leads to a 48MB minidump. While this is a best case, I expect minidumps to be in the 100MB-500MB range. Obviously, never larger than physical memory of course.
Minidumps are on by default. It would only be necessary to turn them off when debugging corrupt kernel page table management, as that corruption would mess up minidumps as well.
Both minidumps and regular dumps are supported on the same machine.
|
157815 |
17-Apr-2006 |
jhb |
Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. This avoids a race where the calling thread could read a snapshot of its current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway.
The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.
|
157628 |
10-Apr-2006 |
pjd |
On shutdown try to turn off all swap devices. This way GEOM providers are properly closed on shutdown.
Requested by: ru Reviewed by: alc MFC after: 2 weeks
|
157443 |
03-Apr-2006 |
peter |
Remove the unused sva and eva arguments from pmap_remove_pages().
|
157144 |
26-Mar-2006 |
jkoshy |
MFP4: Support for profiling dynamically loaded objects.
Kernel changes:
Inform hwpmc of executable objects brought into the system by kldload() and mmap(), and of their removal by kldunload() and munmap(). A helper function linker_hwpmc_list_objects() has been added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve the list of currently loaded kernel modules.
The unused `MAPPINGCHANGE' event has been deprecated in favour of separate `MAP_IN' and `MAP_OUT' events; this change reduces space wastage in the log.
Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the map change callbacks.
Change the default per-cpu sample buffer size to hold 32 samples (up from 16).
Increment __FreeBSD_version.
libpmc(3) changes:
Update libpmc(3) to deal with the new events in the log file; bring the pmclog(3) manual page in sync with the code.
pmcstat(8) changes:
Introduce new options to pmcstat(8): "-r" (root fs path), "-M" (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes a kernel directory as its argument but will also work with the older invocation syntax.
Rework string handling in pmcstat(8) to use an opaque type for interned strings. Clean up ELF parsing code and add support for tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).
Report statistics at the end of a log conversion run depending on the requested verbosity level.
Reviewed by: jhb, dds (kernel parts of an earlier patch) Tested by: gallatin (earlier patch)
|
156420 |
08-Mar-2006 |
imp |
Remove leading __ from __(inline|const|signed|volatile). They are obsolete. This should reduce diffs to NetBSD as well.
|
156415 |
08-Mar-2006 |
tegge |
Ignore dirty pages owned by "dead" objects.
|
156225 |
02-Mar-2006 |
tegge |
Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.
|
156224 |
02-Mar-2006 |
tegge |
Hold extra reference to vm object while cleaning pages.
|
155884 |
21-Feb-2006 |
jhb |
Lock the vm_object while checking its type to see if it is a vnode-backed object that requires Giant in vm_object_deallocate(). This is somewhat hairy in that if we can't obtain Giant directly, we have to drop the object lock, then lock Giant, then relock the object lock and verify that we still need Giant. If we don't (because the object changed to OBJT_DEAD for example), then we drop Giant before continuing.
Reviewed by: alc Tested by: kris
|
155790 |
17-Feb-2006 |
tegge |
Expand scope of marker to reduce the number of page queue scan restarts.
|
155784 |
17-Feb-2006 |
tegge |
Check return value from nonblocking call to vn_start_write().
|
155737 |
15-Feb-2006 |
ups |
When the VM needs to allocate physical memory pages (for non-interrupt use) and does not have plenty of free pages, it tries to free pages in the cache queue. Unfortunately, freeing a cached page requires locking the object that owns the page. However, in the context of allocating pages we may not be able to lock the object and thus can only TRY to lock it. If the locking try fails, the cached page cannot be freed and is activated to move it out of the way so that we may try to free other cache pages.
If all pages in the cache belong to objects that are currently locked the cache queue can be emptied without freeing a single page. This scenario caused two problems:
1) vm_page_alloc always failed allocation when it tried freeing pages from the cache queue and failed to do so. However if there are more than cnt.v_interrupt_free_min pages on the free list it should return pages when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause resource exhaustion deadlocks.
2) Threads that need to allocate pages spend a lot of time cleaning up the page queue without really getting anything done while the pagedaemon needs to work overtime to refill the cache.
This change fixes the first problem. (1)
Reviewed by: tegge@
|
155551 |
11-Feb-2006 |
rwatson |
Skip per-cpu caches associated with absent CPUs when generating a memory statistics record stream via sysctl.
MFC after: 3 days
|
155384 |
06-Feb-2006 |
jeff |
- Fix silly VI locking that is used to check a single flag. The vnode lock also protects this flag so it is not necessary. - Don't rely on v_mount to detect whether or not we've been recycled, use the more appropriate VI_DOOMED instead.
Sponsored by: Isilon Systems, Inc. MFC After: 1 week
|
155320 |
04-Feb-2006 |
alc |
Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid.
|
155230 |
02-Feb-2006 |
tegge |
Adjust old comment (present in rev 1.1) to match changes in rev 1.82.
PR: kern/92509 Submitted by: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
|
155177 |
01-Feb-2006 |
yar |
Use off_t for file size passed to vnode_create_vobject(). The former type, size_t, was causing truncation to 32 bits on i386, which immediately led to undersizing of VM objects backed by files >4GB. In particular, sendfile(2) was broken for such files.
PR: kern/92243 MFC after: 5 days
|
155169 |
01-Feb-2006 |
jeff |
- Install a temporary bandaid in vm_object_reference() that will stop mtx_assert()s from triggering until I find a real long-term solution.
|
155128 |
31-Jan-2006 |
alc |
Change #if defined(DIAGNOSTIC) to KASSERT.
|
155086 |
31-Jan-2006 |
pjd |
Add buffer corruption protection (RedZone) for kernel's malloc(9). It detects both buffer underflow and buffer overflow bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed.
Tested by: kris
|
154989 |
29-Jan-2006 |
scottl |
The change a few years ago of having contigmalloc start its scan at the top of physical RAM instead of the bottom was a sound idea, but the implementation left a lot to be desired. Scans would spend considerable time looking at pages that are above the address range given by the caller, and multiple calls (like what happens in busdma) would spend more time on top of that rescanning the same pages over and over.
Solve this, at least for now, with two simple optimizations. The first is to not bother scanning high ordered pages that are outside of the provided address range. Second is to cache the page index from the last successful operation so that subsequent scans don't have to restart from the top. This is conditional on the numpages argument being the same or greater between calls.
MFC After: 2 weeks
|
154934 |
27-Jan-2006 |
jhb |
Add a new macro wrapper WITNESS_CHECK() around the witness_warn() function. The difference between WITNESS_CHECK() and WITNESS_WARN() is that WITNESS_CHECK() should be used in the places that the return value of witness_warn() is checked, whereas WITNESS_WARN() should be used in places where the return value is ignored. Specifically, in a kernel without WITNESS enabled, WITNESS_WARN() evaluates to an empty string whereas WITNESS_CHECK() evaluates to 0. I also updated the one place that was checking the return value of WITNESS_WARN() to use WITNESS_CHECK().
|
154929 |
27-Jan-2006 |
cognet |
Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it asserts they are. They should be NULL at this point, except if we're coming from swapdev_strategy(). It should only affect the case where we're swapping directly on a file over NFS.
|
154927 |
27-Jan-2006 |
alc |
Style: Add blank line after local variable declarations.
|
154896 |
27-Jan-2006 |
alc |
Use the new macros abstracting the page coloring/queues implementation. (There are no functional changes.)
|
154889 |
27-Jan-2006 |
alc |
Use the new macros abstracting the page coloring/queues implementation. (There are no functional changes.)
|
154849 |
26-Jan-2006 |
alc |
Plug a leak in the newer contigmalloc() implementation. Specifically, if a multipage allocation was aborted midway, the pages that were already allocated were not always returned to the free list.
Submitted by: tegge
|
154805 |
25-Jan-2006 |
jeff |
- Avoid calling vm_object_backing_scan() when collapsing an object when the resident page count matches the object size. We know it fully backs its parent in this case.
Reviewed by: acl, tegge Sponsored by: Isilon Systems, Inc.
|
154799 |
25-Jan-2006 |
alc |
The previous revision incorrectly changed a switch statement into an if statement. Specifically, a break statement that previously broke out of the enclosing switch was not changed. Consequently, the enclosing loop terminated prematurely.
This could result in "vm_page_insert: page already inserted" panics.
Submitted by: tegge
|
154788 |
24-Jan-2006 |
alc |
With the recent changes to the implementation of page coloring, the option PQ_NOOPT is used exclusively by vm_pageq.c. Thus, the include of opt_vmpage.h can be removed from vm_page.h.
|
154764 |
24-Jan-2006 |
alc |
In vm_page_set_invalid() invalidate all of the page's mappings as soon as any part of the page's contents is invalidated.
Submitted by: tegge
|
154694 |
22-Jan-2006 |
alc |
Make vm_object_vndeallocate() static. The external calls to it were eliminated in ufs/ffs/ffs_vnops.c's revision 1.125.
|
154076 |
06-Jan-2006 |
jhb |
Reduce the scope of one #ifdef to avoid duplicating a SYSCTL_INT() macro and trim another unneeded #ifdef (it was just around a macro that is already conditionally defined).
|
154035 |
04-Jan-2006 |
netchild |
Convert the PAGE_SIZE check into a CTASSERT.
Suggested by: jhb
|
154031 |
04-Jan-2006 |
netchild |
Prevent divide by zero, use default values in case one of the divisor's is zero.
Tested by: Randy Bush <randy@psg.com>
|
153940 |
31-Dec-2005 |
netchild |
MI changes: - provide an interface (macros) to the page coloring part of the VM system, which allows trying different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible)
MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
|
153880 |
30-Dec-2005 |
pjd |
Improve memguard a bit: - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow memguard to be used for kernel modules by providing sysctl vm.memguard.desc, which can be changed to the short description of a memory type before the module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctls there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory type comparison and make the safer/slower one the default.
|
153555 |
20-Dec-2005 |
tegge |
Don't access fs->first_object after dropping reference to it. The result could be a missed or extra giant unlock.
Reviewed by: alc
|
153485 |
16-Dec-2005 |
alc |
Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors.
Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks
|
153385 |
13-Dec-2005 |
alc |
Assert that the page that is given to vm_page_free_toq() does not have any managed mappings.
|
153311 |
11-Dec-2005 |
alc |
Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
|
153095 |
04-Dec-2005 |
alc |
Simplify vmspace_dofree().
|
153068 |
03-Dec-2005 |
alc |
Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
|
153060 |
03-Dec-2005 |
alc |
Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
|
152630 |
20-Nov-2005 |
alc |
Eliminate pmap_init2(). It's no longer used.
|
152224 |
09-Nov-2005 |
alc |
Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations.
Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away.
Reviewed by: tegge
|
152178 |
08-Nov-2005 |
alc |
If a physical page is mapped by two or more virtual addresses, transmitted by the zero-copy sockets method, and written to before the transmission completes, we need to destroy all of the existing mappings to the page, not just the one that we fault on. Otherwise, the mappings will no longer be to the same page and changes made through one of the mappings will not be visible through the others.
Observed by: tegge
|
151951 |
01-Nov-2005 |
ps |
Rate limit vnode_pager_putpages printfs to once a second.
|
151918 |
01-Nov-2005 |
alc |
Consider the zero-copy transmission of a page that was wired by mlock(2). If a copy-on-write fault occurs on the page, the new copy should inherit a part of the original page's wire count.
Submitted by: tegge MFC after: 1 week
|
151897 |
31-Oct-2005 |
rwatson |
Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
|
151558 |
22-Oct-2005 |
alc |
Use of the ZERO_COPY_SOCKETS options can result in an unusual state that vm_object_backing_scan() was not written to handle. Specifically, a wired page within a backing object that is shadowed by a page within the shadow object. Handle this state by removing the wired page from the backing object. The wired page will be freed by socow_iodone().
Stop masking errors: If a page is being freed by vm_object_backing_scan(), assert that it is no longer mapped rather than quietly destroying any mappings.
Tested by: Harald Schmalzbauer
|
151526 |
20-Oct-2005 |
rwatson |
Change the format string for u_int64_t to %ju from %llu, in order to use the correct format string on 64-bit systems.
Pointed out by: pjd
|
151516 |
20-Oct-2005 |
rwatson |
Add a "show uma" command to DDB, which prints out the current stats for available UMA zones. Quite useful for post-mortem debugging of memory leaks without a dump device configured on a panicked box.
MFC after: 2 weeks
|
151252 |
12-Oct-2005 |
dds |
Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap().
Reviewed by: bde MFC after: 2 weeks
|
151104 |
08-Oct-2005 |
des |
As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup() still uses the constant UMA_BOOT_PAGES. Change it to accept boot_pages as an additional argument.
MFC after: 2 weeks
|
150926 |
04-Oct-2005 |
dds |
Update the vnode's access time after an mmap operation on it. Before this change a copy operation with cp(1) would not update the file access times.
According to the POSIX mmap(2) documentation: the st_atime field of the mapped file may be marked for update at any time between the mmap() call and the corresponding munmap() call. The initial read or write reference to a mapped region shall cause the file's st_atime field to be marked for update if it has not already been marked for update.
|
150727 |
29-Sep-2005 |
jhb |
Trim a couple of unneeded includes.
|
150418 |
21-Sep-2005 |
cognet |
Make sure we have a bufobj before calling bstrategy(). I'm not sure this is the right thing to do, but at least I don't panic anymore when swapping on a NFS file without using md(4).
X-MFC after: proper review
|
150397 |
20-Sep-2005 |
peter |
Remove unused (but initialized) variable 'objsize' from vm_mmap()
|
149900 |
09-Sep-2005 |
alc |
Introduce a new lock for the purpose of synchronizing access to the UMA boot pages.
Disable recursion on the general UMA lock now that startup_alloc() no longer uses it.
Eliminate the variable uma_boot_free. It serves no purpose.
Note: This change eliminates a lock-order reversal between a system map mutex and the UMA lock. See http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
MFC after: 3 days
|
149839 |
07-Sep-2005 |
alc |
Eliminate an incorrect cast.
|
149768 |
03-Sep-2005 |
alc |
Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine whether the mapping should permit execute access.
|
149035 |
13-Aug-2005 |
kan |
Do not use vm_pager_init() to initialize the vnode_pbuf_freecnt variable. vm_pager_init() is run before the required nswbuf variable has been set to its correct value. This caused the system to run with a single pbuf available for the vnode_pager. Handle both the cluster_pbuf_freecnt and vnode_pbuf_freecnt variables in the same way.
Reported by: ade Obtained from: alc MFC after: 2 days
|
148997 |
12-Aug-2005 |
tegge |
Check for marker pages when scanning active and inactive page queues.
Reviewed by: alc
|
148985 |
12-Aug-2005 |
des |
Introduce the vm.boot_pages tunable and sysctl, which controls the number of pages reserved to bootstrap the kernel memory allocator.
MFC after: 2 weeks
|
148909 |
10-Aug-2005 |
tegge |
Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE due to the vm object being locked.
When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects.
Temporarily unlock the page queues lock while locking vm objects to avoid lock order violation. Detect and handle relevant page queue changes.
This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable.
Reviewed by: alc
|
148875 |
08-Aug-2005 |
ssouhlal |
Use atomic operations on runningbufspace.
PR: kern/84318 Submitted by: ade MFC after: 3 days
|
148691 |
04-Aug-2005 |
rwatson |
Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined, as opt_vmpage.h will not be available to user space library builds. A similar existing check is present for KLD_MODULE for similar reasons.
MFC after: 3 days
|
148690 |
04-Aug-2005 |
rwatson |
Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be used from memstat_uma.c for the purposes of kvm access without lots of additional unsafe includes.
MFC after: 3 days
|
148371 |
25-Jul-2005 |
rwatson |
Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the monitoring API, which might or might not be the same as the internal maximum (currently none).
Export flag information on UMA zones -- in particular, whether or not this is a secondary zone, and so the keg free count should be considered in that light.
MFC after: 1 day
|
148200 |
20-Jul-2005 |
alc |
Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio().
Note: I don't believe that the other, more widely-used b_iodone callbacks are affected.
Discussed with: jeff Reviewed by: phk MFC after: 2 weeks
|
148194 |
20-Jul-2005 |
rwatson |
Further UMA statistics related changes:
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to to update the zone's uz_frees statistic. Previously, the statistic was updated unconditionally.
- Use the flag in situations where a "real" free occurs: i.e., one where the caller is freeing an allocated item, to be differentiated from situations where uma_zfree_internal() is used to tear down the item during slab teardown in order to invoke its fini() method. Also use the flag when UMA is freeing its internal objects.
- When exchanging a bucket with the zone from the per-CPU cache when freeing an item, flush cache statistics back to the zone (since the zone lock and critical section are both held) to match the allocation case.
MFC after: 3 days
|
148193 |
20-Jul-2005 |
alc |
Eliminate an incorrect (and unnecessary) cast.
|
148079 |
16-Jul-2005 |
rwatson |
Use mp_maxid in preference to MAXCPU when creating exports of UMA per-CPU cache statistics. UMA sizes the cache array based on the number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU could read off the end of the array (into the next zone).
Reported by: yongari MFC after: 1 week
|
148078 |
16-Jul-2005 |
rwatson |
Improve canonicalization of copyrights. Order copyrights by order of assertion (jeff, bmilekic, rwatson).
Suggested ages ago by: bde MFC after: 1 week
|
148077 |
16-Jul-2005 |
rwatson |
Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that it covers the following of the uc_alloc/freebucket cache pointers. Originally, I felt that the race wasn't helped by holding the mutex, hence a comment in the code and not holding it across the cache access. However, it does improve consistency, as while it doesn't prevent bucket exchange, it does prevent bucket pointer invalidation. So a race in gathering cache free space statistics still can occur, but not one that follows an invalid bucket pointer, if the mutex is held.
Submitted by: yongari MFC after: 1 week
|
148072 |
16-Jul-2005 |
silby |
Increase the flags field for kegs from a 16 to a 32 bit value; we have exhausted all 16 flags.
|
148070 |
15-Jul-2005 |
rwatson |
Track UMA(9) allocation failures by zone, and export via sysctl.
Requested by: victor cruceru <victor dot cruceru at gmail dot com> MFC after: 1 week
|
148014 |
14-Jul-2005 |
jhb |
Convert a remaining !fs.map->system_map to fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier revision. This fixes mutex assertion failures in the debug.mpsafevm=0 case.
Reported by: ps MFC after: 3 days
|
147996 |
14-Jul-2005 |
rwatson |
Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator statistics via a binary structure stream:
- Add structure 'uma_stream_header', which defines a stream version, definition of MAXCPUs used in the stream, and the number of zone records in the stream.
- Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics.
- Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache.
- When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPU uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel).
This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs.
While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user space buffers to receive the stream.
A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics.
MFC after: 1 week
|
147995 |
14-Jul-2005 |
rwatson |
In addition to tracking allocs in the zone, also track frees. Add a zone free counter, as well as a cache free counter.
MFC after: 1 week
|
147994 |
14-Jul-2005 |
rwatson |
In an earlier world order, UMA would flush per-CPU statistics to the zone whenever it was moving buckets between the zone and the cache, or when coalescing statistics across the CPU. Remove flushing of statistics to the zone when coalescing statistics as part of sysctl, as we won't be running on the right CPU to write to the cache statistics.
Add a missed gathering of statistics: when uma_zalloc_internal() does a special case allocation of a single item, make sure to update the zone statistics to represent this. Previously this case wasn't accounted for in user-visible statistics.
MFC after: 1 week
|
147615 |
26-Jun-2005 |
silby |
Change the panic in trash_ctor into just a printf for now. Once the reports of panics in trash_ctor relating to mbufs have been examined and a fix found, this will be turned back into a panic.
Approved by: re (rwatson)
|
147422 |
16-Jun-2005 |
alc |
Increase UMA_BOOT_PAGES to prevent a crash during initialization. See http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed description of the crash.
Reported by: Eric Anderson Approved by: re (scottl) MFC after: 3 days
|
147283 |
11-Jun-2005 |
green |
The new contigmalloc(9) has a bad degenerate case where there were many regions checked again and again despite knowing the pages contained were not usable and only satisfied the alignment constraints. This case was compounded, especially for large allocations, by the practice of looping from the top of memory so as to keep out of the important low-memory regions. While the old contigmalloc(9) has the same problem, it is not as noticeable due to looping from low memory to high.
This degenerate case is fixed, as well as reversing the sense of the rest of the loops within it, to provide a tremendous speed increase. This makes the best case O(n * VM overhead) much more likely than the worst case O(4 * VM overhead). For comparison, the worst case for old contigmalloc would be O(5 * VM overhead) in addition to its strategy of turning used memory into free being highly pessimal.
Also, fix a bug in the new contigmalloc(9) that in practice most likely couldn't have been triggered: it walked backwards from the end of memory without accounting for how many pages it needed. Potentially, nonexistent pages could have been mapped. This hasn't occurred because the kernel generally requests as its first contigmalloc(9) a single page.
Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes MFC After: 1 month More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
|
147262 |
10-Jun-2005 |
alc |
Add a comment to the effect that fictitious pages do not require the initialization of their machine-dependent fields.
|
147217 |
10-Jun-2005 |
alc |
Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time.
Remove code from pmap_init() for initializing the vm_page's machine-dependent fields.
Remove stale comments from pmap_init().
Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries.
Tested by: cognet, kensmith, marcel
|
146727 |
28-May-2005 |
alc |
Update some comments to reflect the change from spl-based to lock-based synchronization.
|
146554 |
23-May-2005 |
ups |
Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.
|
146501 |
22-May-2005 |
alc |
Swap in can occur safely without Giant. Release Giant on entry to scheduler().
|
146484 |
22-May-2005 |
alc |
Remove GIANT_REQUIRED from swapout_procs().
|
146459 |
20-May-2005 |
alc |
Reduce the number of times that we acquire and release locks in swap_pager_getpages().
MFC after: 1 week
|
146367 |
19-May-2005 |
alc |
Remove calls to spl*().
|
146363 |
19-May-2005 |
alc |
Remove a stale comment concerning spl* usage.
|
146355 |
18-May-2005 |
alc |
Update some comments to reflect the change from spl-based to lock-based synchronization.
|
146351 |
18-May-2005 |
alc |
Remove calls to spl*().
|
146350 |
18-May-2005 |
alc |
Revert revision 1.270: swp_pager_async_iodone() need not perform VM_LOCK_GIANT().
Discussed with: jeff
|
146340 |
18-May-2005 |
bz |
Correct 32 vs 64 bit signedness issues.
Approved by: pjd (mentor) MFC after: 2 weeks
|
146126 |
12-May-2005 |
grehan |
The final test in unlock_and_deallocate() to determine if GIANT needs to be unlocked wasn't updated to check for OBJ_NEEDGIANT. This caused a WITNESS panic when debug_mpsafevm was set to 0.
Approved by: jeffr
|
146017 |
08-May-2005 |
marcel |
Enable debug_mpsafevm on ia64 due to the severe functional regression caused by recent locking changes when it's off. Revert the logic to trim down the conditional.
Clued-in by: alc@
|
145888 |
04-May-2005 |
jeff |
- We need to inherit the OBJ_NEEDGIANT flag from the original object in vm_object_split().
Spotted by: alc
|
145826 |
03-May-2005 |
jeff |
- Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the underlying vnode requires Giant. - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT set. - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
|
145788 |
02-May-2005 |
alc |
Remove GIANT_REQUIRED from vmspace_exec().
Prodded by: jeff
|
145699 |
30-Apr-2005 |
jeff |
- VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it without Giant.
Sponsored by: Isilon Systems, Inc.
|
145686 |
29-Apr-2005 |
rwatson |
Modify UMA to use critical sections to protect per-CPU caches, rather than mutexes, which offers lower overhead on both UP and SMP. When allocating from or freeing to the per-cpu cache, without INVARIANTS enabled, we now no longer perform any mutex operations, which offers a 1%-3% performance improvement in a variety of micro-benchmarks. We rely on critical sections to prevent (a) preemption resulting in reentrant access to UMA on a single CPU, and (b) migration of the thread during access. In the event we need to go back to the zone for a new bucket, we release the critical section to acquire the global zone mutex, and must re-acquire the critical section and re-evaluate which cache we are accessing in case migration has occurred, or circumstances have changed in the current cache.
Per-CPU cache statistics are now gathered lock-free by the sysctl, which can result in small races in statistics reporting for caches.
Reviewed by: bmilekic, jeff (somewhat) Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others
|
145584 |
27-Apr-2005 |
jeff |
- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.
|
145530 |
25-Apr-2005 |
kris |
Add the vm.exec_map_entries tunable and read-only sysctl, which controls the number of entries in exec_map (maximum number of simultaneous execs that can be handled by the kernel). The default value of 16 is insufficient on heavily loaded machines (particularly SMP machines), and if it is exceeded then executing further processes will generate a SIGABRT.
This is a workaround until a better solution can be implemented.
Reviewed by: alc MFC after: 3 days
|
145144 |
16-Apr-2005 |
des |
Unbreak the build on 64-bit architectures.
|
145127 |
15-Apr-2005 |
jhb |
Add a vm.blacklist tunable which can hold a space or comma separated list of physical addresses. The pages containing these physical addresses will not be added to the free list and thus will effectively be ignored by the VM system. This is mostly useful for the case when one knows of specific physical addresses that have bit errors (such as from a memtest run) so that one can blacklist the bad pages while waiting for the new sticks of RAM to arrive. The physical addresses of any ignored pages are listed in the message buffer as well.
|
145076 |
14-Apr-2005 |
csjp |
Move MAC check_vnode_mmap entry point out from being exclusive to MAP_SHARED so that the entry point gets executed un-conditionally. This may be useful for security policies which want to perform access control checks around run-time linking.
- Add the mmap(2) flags argument to the check_vnode_mmap entry point so that we can make access control decisions based on the type of mapped object. - Update any dependent API around this parameter addition, such as function prototype modifications, entry point parameter additions, and the inclusion of the sys/mman.h header file. - Change the MLS, BIBA and LOMAC security policies so that subject domination routines are not executed unless the type of mapping is shared. This is done to maintain compatibility between the old vm_mmap_vnode(9) and these policies.
Reviewed by: rwatson MFC after: 1 month
|
144970 |
12-Apr-2005 |
jhb |
Tidy vcnt() by moving a duplicated line above #ifdef and removing a useless variable.
|
144635 |
04-Apr-2005 |
jhb |
Flip the switch and turn mpsafevm on by default for sparc64.
Approved by: alc
|
144610 |
03-Apr-2005 |
jeff |
- Don't NULL the vnode's v_object pointer until after the object is torn down. If we have dirty pages, the putpages routine will need to know what the vnode's object is so that it may write out dirty pages.
Pointy hat: phk Found by: obrien
|
144501 |
01-Apr-2005 |
jhb |
- Change the vm_mmap() function to accept an objtype_t parameter specifying the type of object represented by the handle argument. - Allow vm_mmap() to map device memory via cdev objects in addition to vnodes and anonymous memory. Note that mmaping a cdev directly does not currently perform any MAC checks like mapping a vnode does. - Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the cdev the ioctl is acting on rather than trying to find a suitable vnode to map from.
Reviewed by: alc, arch@
|
144367 |
31-Mar-2005 |
jeff |
- LK_NOPAUSE is a nop now.
Sponsored by: Isilon Systems, Inc.
|
144322 |
30-Mar-2005 |
alc |
Eliminate (now) unnecessary acquisition and release of the global page queues lock in vm_object_backing_scan(). Updates to the page's PG_BUSY flag and busy field are synchronized by the containing object's lock.
Testing the page's hold_count and wire_count in vm_object_backing_scan()'s OBSC_COLLAPSE_NOWAIT case is unnecessary. There is no reason why the held or wired pages cannot be migrated to the shadow object.
Reviewed by: tegge
|
143821 |
18-Mar-2005 |
das |
Move the swap_zone == NULL check earlier (i.e. before we dereference the pointer.)
Found by: Coverity Prevent analysis tool
|
143745 |
17-Mar-2005 |
jeff |
- Don't lock the vnode interlock in vm_object_set_writeable_dirty() if we've already set the object flags.
Reviewed by: alc
|
143646 |
15-Mar-2005 |
jeff |
- In vm_page_insert() hold the backing vnode when the first page is inserted. - In vm_page_remove() drop the backing vnode when the last page is removed. - Don't check the vnode to see if it must be reclaimed on every call to vm_page_free_toq() as we only check it now when it is actually required. This saves us two lock operations per call.
Sponsored by: Isilon Systems, Inc.
|
143559 |
14-Mar-2005 |
jeff |
- Don't directly adjust v_usecount, use vref() instead.
Sponsored by: Isilon Systems, Inc.
|
143554 |
14-Mar-2005 |
jeff |
- Retire OLOCK and OWANT. All callers hold the vnode lock when creating a vnode object. There has been an assert to prove this for some time.
Sponsored by: Isilon Systems, Inc.
|
143505 |
13-Mar-2005 |
jeff |
- Don't acquire the vnode lock in destroy_vobject, assert that it has already been acquired by the caller.
Sponsored by: Isilon Systems, Inc.
|
142367 |
24-Feb-2005 |
alc |
Revert the first part of revision 1.114 and modify the second part. On architectures implementing uma_small_alloc() pages do not necessarily belong to the kmem object.
|
142079 |
19-Feb-2005 |
phk |
Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@).
Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.
|
141991 |
16-Feb-2005 |
bmilekic |
Well, it seems that I prematurely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects:
sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h
|
141983 |
16-Feb-2005 |
bmilekic |
Make UMA set the overloaded page->object back to kmem_object for UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly came from kmem_map for those two. Previously it would set it back to NULL for UMA_ZONE_REFCNT zones and although this was probably not fatal, it added MORE code for no reason.
|
141955 |
15-Feb-2005 |
bmilekic |
Rather than overloading the page->object field like UMA does, use instead an unused pageq queue reference in the page structure to stash a pointer to the MemGuard FIFO. Using the page->object field caused problems because when vm_map_protect() was called the second time to set VM_PROT_DEFAULT back onto a set of pages in memguard_map, the protection in the VM would be changed but the PMAP code would lazily not restore the PG_RW bit on the underlying pages right away (see pmap_protect()). So when a page fault finally occurred and the VM noticed the faulting address corresponds to a page that _does_ have write access now, it would then call into PMAP to set back PG_RW (i386 case being discussed here). However, before it got to do that, an assertion on the object lock not being owned would get triggered, as the object of the faulting page would need to be locked but was overloaded by MemGuard. This is precisely why MemGuard cannot overload page->object.
Submitted by: Alan Cox (alc@)
|
141696 |
11-Feb-2005 |
phk |
sysctl node vm.stats can not be static (for ia64 reasons).
|
141670 |
10-Feb-2005 |
bmilekic |
Implement support for buffers larger than PAGE_SIZE in MemGuard. This adds a little bit of complexity, but since the performance requirements are minimal (this is a debugging allocator, after all), it's really not too bad (still only 317 lines).
Also add an additional check to help catch really weird 3-threads-involved races: make memguard_free() write to the first page handed back, always, before it does anything else.
Note that there is still a problem in VM+PMAP (specifically with vm_map_protect) w.r.t. how MemGuard uses it, but this will be fixed shortly and this change stands on its own.
|
141630 |
10-Feb-2005 |
phk |
Make three SYSCTL_NODEs static
|
141629 |
10-Feb-2005 |
phk |
Make npages static and const.
|
141247 |
04-Feb-2005 |
ssouhlal |
Set the scheduling class of the zeroidle thread to PRI_IDLE.
Reviewed by: jhb Approved by: grehan (mentor) MFC after: 1 week
|
141068 |
30-Jan-2005 |
alc |
Update the text of an assertion to reflect changes made in revision 1.148. Submitted by: tegge
Eliminate an unnecessary, temporary increment of the backing object's reference count in vm_object_qcollapse(). Reviewed by: tegge
|
140929 |
28-Jan-2005 |
phk |
Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject().
Make the new function zero the vp->v_object pointer so we can tell if a call is missing.
|
140782 |
25-Jan-2005 |
phk |
Don't use VOP_GETVOBJECT, use vp->v_object directly.
|
140767 |
24-Jan-2005 |
phk |
Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject().
Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks.
Call that new function from vop_stdcreatevobject().
Make vnode_pager_alloc() private now that its only user came home.
|
140734 |
24-Jan-2005 |
phk |
Kill the VV_OBJBUF and test the v_object for NULL instead.
|
140723 |
24-Jan-2005 |
jeff |
- Remove GIANT_REQUIRED where giant is no longer required. - Use VFS_LOCK_GIANT() rather than directly acquiring giant in places where giant is only held because vfs requires it.
Sponsored By: Isilon Systems, Inc.
|
140622 |
22-Jan-2005 |
alc |
Guard against address wrap in kernacc(). Otherwise, a program accessing a bad address range through /dev/kmem can panic the machine.
Submitted by: Mark W. Krentel Reported by: Kris Kennaway MFC after: 1 week
|
140605 |
22-Jan-2005 |
bmilekic |
s/round_page/trunc_page/g
I meant trunc_page. It's only a coincidence this hasn't caused problems yet.
Pointed out by: Antoine Brodin <antoine.brodin@laposte.net>
|
140587 |
21-Jan-2005 |
bmilekic |
Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent.
Currently MemGuard can only take over malloc()/realloc()/free() for a particular malloc type (or types), and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must:
1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune the vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allot for MemGuard's use. The default is 10, so kmem_size/10.
ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary.
This work is partly based on some old patches from Ian Dowse.
|
140439 |
18-Jan-2005 |
alc |
Add checks to vm_map_findspace() to test for address wrap. The conditions where this could occur are very rare, but possible.
Submitted by: Mark W. Krentel MFC after: 2 weeks
|
140319 |
15-Jan-2005 |
alc |
Consider three objects, O, BO, and BBO, where BO is O's backing object and BBO is BO's backing object. Now, suppose that O and BO are being collapsed. Furthermore, suppose that BO has been marked dead (OBJ_DEAD) by vm_object_backing_scan() and that either vm_object_backing_scan() has been forced to sleep due to encountering a busy page or vm_object_collapse() has been forced to sleep due to memory allocation in the swap pager. If vm_object_deallocate() is then called on BBO and BO is BBO's only shadow object, vm_object_deallocate() will collapse BO and BBO. In doing so, it adds a necessary temporary reference to BO. If this collapse also sleeps and the prior collapse resumes first, the temporary reference will cause vm_object_collapse to panic with the message "backing_object %p was somehow re-referenced during collapse!"
Resolve this race by changing vm_object_deallocate() such that it doesn't collapse BO and BBO if BO is marked dead. Once O and BO are collapsed, vm_object_collapse() will attempt to collapse O and BBO. So, vm_object_deallocate() on BBO need do nothing.
Reported by: Peter Holm on 20050107 URL: http://www.holm.cc/stress/log/cons102.html
In collaboration with: tegge@ Candidate for RELENG_4 and RELENG_5 MFC after: 2 weeks
|
140220 |
14-Jan-2005 |
phk |
Eliminate unused and unnecessary "cred" argument from vinvalbuf()
|
140048 |
11-Jan-2005 |
phk |
Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well.
If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to use the cached mount credential, or a credential cached along with any delayed write data.
Discussed with: rwatson
|
140031 |
11-Jan-2005 |
bmilekic |
While we want the recursion protection for the bucket zones so that recursion from the VM is handled (and the calling code that allocates buckets knows how to deal with it), we do not want to prevent allocation from the slab header zones (slabzone and slabrefzone) if uk_recurse is not zero for them. The reason is that it could lead to NULL being returned for the slab header allocations even in the M_WAITOK case, and the caller can't handle that (this is also explained in a comment with this commit).
The problem analysis is documented in our mailing lists: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current
(see entire thread for proper context).
Crash dump data provided by: Peter Holm <peter@holm.cc>
|
139996 |
10-Jan-2005 |
stefanf |
ISO C requires at least one element in an initialiser list.
|
139921 |
08-Jan-2005 |
alc |
Move the acquisition and release of the page queues lock outside of a loop in vm_object_split() to avoid repeated acquisition and release.
|
139835 |
07-Jan-2005 |
alc |
Transfer responsibility for freeing the page taken from the cache queue and (possibly) unlocking the containing object from vm_page_alloc() to vm_page_select_cache(). Recent optimizations to vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and pmap_enter_quick() have resulted in panic()s because vm_page_alloc() mistakenly unlocked objects that had not been locked by vm_page_select_cache().
Reported by: Peter Holm and Kris Kennaway
|
139825 |
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
139779 |
06-Jan-2005 |
alc |
Revise the part of vm_pageout_scan() that moves pages from the cache queue to the free queue. With this change, if a page from the cache queue belongs to a locked object, it is simply skipped over rather than moved to the inactive queue.
|
139629 |
03-Jan-2005 |
phk |
When allocating bio's in the swap_pager use M_WAITOK since the alternative is much worse.
|
139495 |
31-Dec-2004 |
alc |
Assert that page allocations during an interrupt specify VM_ALLOC_INTERRUPT.
Assert that pages removed from the cache queue are not busy.
|
139391 |
29-Dec-2004 |
alc |
Access to the page's busy field is (now) synchronized by the containing object's lock. Therefore, the assertion that the page queues lock is held can be removed from vm_page_io_start().
|
139338 |
27-Dec-2004 |
alc |
Note that access to the page's busy count is synchronized by the containing object's lock.
|
139332 |
26-Dec-2004 |
alc |
Assert that the vm object is locked on entry to vm_page_sleep_if_busy(); remove some unneeded code.
|
139318 |
26-Dec-2004 |
bmilekic |
Add my copyright and update Jeff's copyright on UMA source files, as per his request.
Discussed with: Jeffrey Roberson
|
139296 |
25-Dec-2004 |
phk |
fix comment
|
139265 |
24-Dec-2004 |
alc |
Continue the transition from synchronizing access to the page's PG_BUSY flag and busy field with the global page queues lock to synchronizing their access with the containing object's lock. Specifically, acquire the containing object's lock before reading the page's PG_BUSY flag and busy field in vm_fault().
Reviewed by: tegge@
|
139241 |
23-Dec-2004 |
alc |
Modify pmap_enter_quick() so that it expects the page queues to be locked on entry and it assumes the responsibility for releasing the page queues lock if it must sleep.
Remove a bogus comment from pmap_enter_quick().
Using the first change, modify vm_map_pmap_enter() so that the page queues lock is acquired and released once, rather than each time that a page is mapped.
|
138986 |
17-Dec-2004 |
alc |
Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333 for a detailed explanation.)
|
138981 |
17-Dec-2004 |
alc |
Enable debug.mpsafevm by default on alpha.
|
138897 |
15-Dec-2004 |
alc |
In the common case, pmap_enter_quick() completes without sleeping. In such cases, the busying of the page and the unlocking of the containing object by vm_map_pmap_enter() and vm_fault_prefault() is unnecessary overhead. To eliminate this overhead, this change modifies pmap_enter_quick() so that it expects the object to be locked on entry and it assumes the responsibility for busying the page and unlocking the object if it must sleep. Note: alpha, amd64, i386 and ia64 are the only implementations optimized by this change; arm, powerpc, and sparc64 still conservatively busy the page and unlock the object within every pmap_enter_quick() call.
Additionally, this change is the first case where we synchronize access to the page's PG_BUSY flag and busy field using the containing object's lock rather than the global page queues lock. (Modifications to the page's PG_BUSY flag and busy field have asserted both locks for several weeks, enabling an incremental transition.)
|
138538 |
08-Dec-2004 |
alc |
With the removal of kern/uipc_jumbo.c and sys/jumbo.h, vm_object_allocate_wait() is not used. Remove it.
|
138531 |
07-Dec-2004 |
alc |
Almost nine years ago, when support for 1TB files was introduced in revision 1.55, the address parameter to vnode_pager_addr() was changed from an unsigned 32-bit quantity to a signed 64-bit quantity. However, an out-of-range check on the address was not updated. Consequently, memory-mapped I/O on files greater than 2GB could cause a kernel panic. Since the address is now a signed 64-bit quantity, the problem resolution is simply to remove a cast.
Reviewed by: bde@ and tegge@ PR: 73010 MFC after: 1 week
|
138406 |
05-Dec-2004 |
alc |
Correct a sanity check in vnode_pager_generic_putpages(). The cast used to implement the sanity check should have been changed when we converted the implementation of vm_pindex_t from 32 to 64 bits. (Thus, RELENG_4 is not affected.) The consequence of this error would be a legitimate write to an extremely large file being treated as an errant attempt to write metadata.
Discussed with: tegge@
|
138129 |
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
138114 |
26-Nov-2004 |
cognet |
Remove useless casts.
|
138066 |
24-Nov-2004 |
delphij |
Try to close a potential, but serious race in our VM subsystem.
Historically, our contigmalloc1() and contigmalloc2() assumed that a page in PQ_CACHE can be unconditionally reused by busying and freeing it. Unfortunately, when the object happens to be non-NULL, the code will set m->object to NULL and disregard the fact that the page is actually in the VM page bucket, resulting in page bucket hash table corruption and finally, a filesystem corruption, or a 'page not in hash' panic.
This commit borrows the idea from DragonFlyBSD's fix to the VM by Matthew Dillon [1]. This version of the patch will do the following checks:
- When scanning pages in PQ_CACHE, check hold_count and skip over pages that are held temporarily. - For pages in PQ_CACHE selected as candidates for being freed, check whether they are busy at that time.
Note: It seems that this might be unrelated to kern/72539.
Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1] Reminded by: Matt Dillon Reworked by: alc MFC After: 1 week
|
137910 |
20-Nov-2004 |
das |
Disable U area swapping and remove the routines that create, destroy, copy, and swap U areas.
Reviewed by: arch@
|
137726 |
15-Nov-2004 |
phk |
Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it.
The vnode_pager does not and should not have any interest in what the filesystem uses for backend.
(vfs_cluster doesn't use the backing store argument.)
|
137725 |
15-Nov-2004 |
phk |
Add pbgetbo()/pbrelbo() lighter weight versions of pbgetvp()/pbrelvp().
|
137723 |
15-Nov-2004 |
phk |
More kasserts.
|
137722 |
15-Nov-2004 |
phk |
style polishing.
|
137721 |
15-Nov-2004 |
phk |
Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff.
|
137720 |
15-Nov-2004 |
phk |
expect the caller to have called pbrelvp() if necessary.
|
137719 |
15-Nov-2004 |
phk |
Explicitly call pbrelvp()
|
137457 |
09-Nov-2004 |
phk |
Improve readability with a bunch of typedefs for the pager ops.
These can also be used for prototypes in the pagers.
|
137393 |
08-Nov-2004 |
des |
#include <vm/vm_param.h> instead of <machine/vmparam.h> (the former includes the latter, but also declares variables which are defined in kern/subr_param.c).
Change some VM parameters from quad_t to unsigned long. They refer to quantities (size limits for text, heap and stack segments) which must necessarily be smaller than the size of the address space, so long is adequate on all platforms.
MFC after: 1 week
|
137324 |
06-Nov-2004 |
alc |
Eliminate an unnecessary atomic operation. Articulate the rationale in a comment.
|
137309 |
06-Nov-2004 |
rwatson |
Abstract the logic to look up the uma_bucket_zone given a desired number of entries into bucket_zone_lookup(), which helps make more clear the logic of consumers of bucket zones.
Annotate the behavior of bucket_init() with a comment indicating how the various data structures, including the bucket lookup tables, are initialized.
|
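The r137309 entry above describes centralizing the bucket-zone selection. A minimal user-space sketch of the same idea (the sizes, table, and function names here are illustrative, not the actual uma_core.c identifiers):

```c
#include <assert.h>

/* Illustrative bucket sizes, smallest to largest. */
static const int bucket_sizes[] = { 1, 4, 8, 16, 32, 64, 128 };
#define MAX_ENTRIES 128

/* Lookup table mapping "desired entries" -> index into bucket_sizes,
 * filled once, in the spirit of bucket_init(). */
static int size_index[MAX_ENTRIES + 1];

/* bucket_zone_lookup() analogue: return the capacity of the smallest
 * bucket that holds at least 'entries' items (clamped to the largest). */
static int bucket_select(int entries)
{
    static int initialized;
    int i, b;

    if (!initialized) {
        for (i = 0, b = 0; i <= MAX_ENTRIES; i++) {
            if (i > bucket_sizes[b])
                b++;                  /* advance to the next bucket size */
            size_index[i] = b;
        }
        initialized = 1;
    }
    if (entries < 0)
        entries = 0;
    if (entries > MAX_ENTRIES)
        entries = MAX_ENTRIES;
    return bucket_sizes[size_index[entries]];
}
```

Consumers then call one function instead of duplicating the table walk, which is the readability point the commit message makes.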
137306 |
06-Nov-2004 |
phk |
Remove dangling variable
|
137305 |
06-Nov-2004 |
rwatson |
Annotate what bucket_size[] array does; staticize since it's used only in uma_core.c.
|
137299 |
06-Nov-2004 |
das |
Fix the last known race in swapoff(), which could lead to a spurious panic:
swapoff: failed to locate %d swap blocks
The race occurred because putpages() can block between the time it allocates swap space and the time it updates the swap metadata to associate that space with a vm_object, so swapoff() would complain about the temporary inconsistency. I hoped to fix this by making swp_pager_getswapspace() and swp_pager_meta_build() a single atomic operation, but that proved to be inconvenient. With this change, swapoff() simply doesn't attempt to be so clever about detecting when all the pageout activity to the target device should have drained.
|
137297 |
06-Nov-2004 |
alc |
Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc() because this call is only needed to wake threads that slept when they discovered a dead object connected to a vnode. To eliminate unnecessary calls to wakeup() by vnode_pager_dealloc(), introduce a new flag, OBJ_DISCONNECTWNT.
Reviewed by: tegge@
|
137268 |
05-Nov-2004 |
jhb |
- Set the priority of the page zeroing thread using sched_prio() when the thread is created rather than adjusting the priority in the main function. (kthread_create() should probably take the initial priority as an argument.) - Only yield the CPU in the !PREEMPTION case if there are any other runnable threads. Yielding when there isn't anything else better to do just wastes time in pointless context switches (albeit while the system is idle.)
|
137243 |
05-Nov-2004 |
alc |
During traversal of the inactive queue, try locking the page's containing object before accessing the page's flags or the object's reference count.
|
137242 |
05-Nov-2004 |
alc |
Eliminate another unnecessary call to vm_page_busy() that immediately precedes a call to vm_page_rename(). (See the previous revision for a detailed explanation.)
|
137239 |
05-Nov-2004 |
das |
Close a race in swapoff(). Here are the gory details:
In order to avoid livelock, swapoff() skips over objects with a nonzero pip count and makes another pass if necessary. Since it is impossible to know which objects we care about, it would choose an arbitrary object with a nonzero pip count and wait for it before making another pass, the theory being that this object would finish paging about as quickly as the ones we care about. Unfortunately, we may have slept since we acquired a reference to this object. Hack around this problem by tsleep()ing on the pointer anyway, but timeout after a fixed interval. More elegant solutions are possible, but the ones I considered unnecessarily complicate this rare case.
Also, kill some nits that seem to have crept into the swapoff() code in the last 75 revisions or so:
- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since the latter can be derived from the former.
- Replace swp_pager_find_dev() with something simpler. There's no need to iterate over the entire list of swap devices just to determine if a given block is assigned to the one we're interested in.
- Expand the scope of the swhash_mtx in a couple of places so that it isn't released and reacquired once for every hash bucket.
- Don't drop the swhash_mtx while holding a reference to an object. We need to lock the object first. Unfortunately, doing so would violate the established lock order, so use VM_OBJECT_TRYLOCK() and try again on a subsequent pass if the object is already locked.
- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
|
137197 |
04-Nov-2004 |
phk |
Retire b_magic now, we have the bufobj containing the same hint.
|
137191 |
04-Nov-2004 |
phk |
De-couple our I/O bio request from the embedded bio in buf by explicitly copying the fields.
|
137186 |
04-Nov-2004 |
phk |
Remove buf->b_dev field.
|
137168 |
03-Nov-2004 |
alc |
The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol.
This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.
|
137104 |
31-Oct-2004 |
alc |
Introduce a Boolean variable wakeup_needed to avoid repeated, unnecessary calls to wakeup() by vm_page_zero_idle_wakeup().
|
137091 |
30-Oct-2004 |
alc |
During traversal of the active queue by vm_pageout_page_stats(), try locking the page's containing object before accessing the page's flags.
|
137079 |
30-Oct-2004 |
alc |
Eliminate an unused but initialized variable.
|
137060 |
30-Oct-2004 |
alc |
Add an assignment statement that I omitted from the previous revision.
|
137005 |
28-Oct-2004 |
alc |
Assert that the containing vm object is locked in vm_page_cache() and vm_page_try_to_cache().
|
137001 |
27-Oct-2004 |
bmilekic |
Fix an INVARIANTS-only bug introduced in Revision 1.104:
If INVARIANTS is defined, and in the rare case that we have allocated some objects from the slab and at least one initializer on at least one of those objects failed, and we need to fail the allocation and push the uninitialized items back into the slab caches -- in that scenario, we would fail to [re]set the bucket cache's ub_bucket item references to NULL, which would eventually trigger a KASSERT.
|
136996 |
27-Oct-2004 |
alc |
During traversal of the active queue, try locking the page's containing object before accessing the page's flags or the object's reference count. If the trylock fails, handle the page as though it is busy.
|
136977 |
26-Oct-2004 |
phk |
Also check that the sectormask is bigger than zero.
Wrap this overly long KASSERT and remove newline.
|
136966 |
26-Oct-2004 |
phk |
Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.
|
136961 |
26-Oct-2004 |
phk |
Don't clear flags we just checked were not set.
|
136952 |
25-Oct-2004 |
alc |
Assert that the containing vm object is locked in vm_page_flash().
|
136931 |
24-Oct-2004 |
alc |
Assert that the containing vm object is locked in vm_page_busy() and vm_page_wakeup().
|
136927 |
24-Oct-2004 |
phk |
Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.
Add bufstrategy() which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy to them.
Add inlines for bwrite() and bstrategy() which call through buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
|
136924 |
24-Oct-2004 |
alc |
Acquire the vm object lock before rather than after calling vm_page_sleep_if_busy(). (The motivation being to transition synchronization of the vm_page's PG_BUSY flag from the global page queues lock to the per-object lock.)
|
136923 |
24-Oct-2004 |
alc |
Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().
|
136850 |
24-Oct-2004 |
alc |
Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab() that indicates that the caller does not want a page with its busy flag set. In many places, the global page queues lock is acquired and released just to clear the busy flag on a just allocated page. Both the allocation of the page and the clearing of the busy flag occur while the containing vm object is locked. So, the busy flag might as well never be set.
|
136767 |
22-Oct-2004 |
phk |
Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.
|
136751 |
21-Oct-2004 |
phk |
Move the VI_BWAIT flag into the bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions in all relevant places except in ffs_softdep.c, where the use of interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
|
136655 |
18-Oct-2004 |
alc |
Correct two errors in PG_BUSY management by vm_page_cowfault(). Both errors are in rarely executed paths. 1. Each time the retry_alloc path is taken, the PG_BUSY flag must be set again. Otherwise vm_page_remove() panics. 2. There is no need to set PG_BUSY on the newly allocated page before freeing it. The page already has PG_BUSY set by vm_page_alloc(). Setting it again could cause an assertion failure.
MFC after: 2 weeks
|
136627 |
17-Oct-2004 |
alc |
Assert that the containing object is locked in vm_page_io_start() and vm_page_io_finish(). The motivation being to transition synchronization of the vm_page's busy field from the global page queues lock to the per-object lock.
|
136621 |
17-Oct-2004 |
alc |
Remove unnecessary check for curthread == NULL.
|
136404 |
11-Oct-2004 |
peter |
Put on my peril sensitive sunglasses and add a flags field to the internal sysctl routines and state. Add some code to use it for signalling the need to downconvert a data structure to 32 bits on a 64 bit OS when requested by a 32 bit app.
I tried to do this in a generic abi wrapper that intercepted the sysctl oid's, or looked up the format string etc, but it was a real can of worms that turned into a fragile mess before I even got it partially working.
With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have it not abort. Things like netstat, ps, etc have a long way to go.
This also fixes a bug in the kern.ps_strings and kern.usrstack hacks. These do matter very much because they are used by libc_r and other things.
|
136334 |
09-Oct-2004 |
green |
In the previous revision, I did not intend to change the default value of "nosleepwithlocks."
Submitted by: ru
|
136276 |
08-Oct-2004 |
green |
Fix critical stability problems that can cause UMA mbuf cluster state management corruption, mbuf leaks, general mbuf corruption, and at least on i386 a first level splash damage radius that encompasses up to about half a megabyte of the memory after an mbuf cluster's allocation slab. In short, this has caused instability nightmares anywhere the right kind of network traffic is present.
When the polymorphic refcount slabs were added to UMA, the new types were not used pervasively. In particular, the slab management structure was turned into one for refcounts, and one for non-refcounts (supposed to be mostly like the old slab management structure), but the latter was almost always used throughout. In general, every access to zones with UMA_ZONE_REFCNT turned on corrupted the "next free" slab offset and the refcount with each other and with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).
Fix things so that the right type is used to access refcounted zones where it was not before. There are additional errors in gross overestimation of padding, it seems, that would cause large kegs (nee zones) to be allocated when small ones would do. Unless I have analyzed this incorrectly, it is not directly harmful.
|
135746 |
24-Sep-2004 |
das |
Don't look for swap blocks in objects that aren't swap-backed. I expect that this will fix the following panic, reported by Jun: swap_pager_isswapped: failed to locate all swap meta blocks
MT5 candidate
|
135727 |
24-Sep-2004 |
phk |
XXX mark two places where we do not hold a threadcount on the dev when frobbing the cdevsw.
In both cases we examine only the cdevsw and it is a good question if we weren't better off copying those properties into the cdev in the first place. This question will be revisited.
|
135707 |
24-Sep-2004 |
phk |
Use dev_re[fl]thread() to maintain a ref on the device driver while we call the ->d_mmap function.
|
135470 |
19-Sep-2004 |
das |
The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.
|
135262 |
15-Sep-2004 |
phk |
Add new a function isa_dma_init() which returns an errno when it fails and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if isa_dma_init() fails.
Problem uncovered by: tegge
|
135088 |
11-Sep-2004 |
alc |
System maps are prohibited from mapping vnode-backed objects. Take advantage of this restriction to avoid acquiring and releasing Giant when wiring pages within a system map.
In collaboration with: tegge@
|
134892 |
07-Sep-2004 |
phk |
add KASSERTS
|
134747 |
04-Sep-2004 |
alc |
Enable debug.mpsafevm by default on amd64 and i386. This enables copy-on-write and zero-fill faults to run without holding Giant. It is still possible to disable Giant-free operation by setting debug.mpsafevm to 0 in loader.conf.
|
134675 |
03-Sep-2004 |
alc |
Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).
|
134649 |
02-Sep-2004 |
scottl |
Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.
|
134615 |
01-Sep-2004 |
alc |
Remove dead code.
|
134612 |
01-Sep-2004 |
alc |
In vm_fault_unwire() eliminate the acquisition and release of Giant in the case of non-kernel pmaps.
|
134586 |
01-Sep-2004 |
julian |
Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them.
MFC after: 2 days
|
134496 |
29-Aug-2004 |
alc |
Move the acquisition and release of the lock on the object at the head of the shadow chain outside of the loop in vm_object_madvise(), reducing the number of times that this lock is acquired and released.
|
134461 |
29-Aug-2004 |
iedowse |
Prevent vm_page_zero_idle_wakeup() from attempting to wake up the page zeroing thread before it has been created. It was possible for calls to free() very early in the boot process to panic here because the sleep queues were not yet initialised. Specifically, sysinit_add() running at SI_SUB_KLD would trigger this if the array of pointers became big enough to require uma_large_alloc() allocations.
Submitted by: peter
|
134184 |
22-Aug-2004 |
marcel |
Move the cow field between wire_count and hold_count. This is the position that is 64-bit aligned and makes sure that the valid and dirty fields are also 64-bit aligned. This means that if PAGE_SIZE is 32K, the size of the vm_page structure is only increased by 8 bytes instead of 16 bytes. More importantly, the vm_page structure is either 120 or 128 bytes on ia64. These are "interesting" sizes.
|
134139 |
22-Aug-2004 |
alc |
In the previous revision, I failed to condition an early release of Giant in vm_fault() on debug_mpsafevm. If debug_mpsafevm was not set, the result was an assertion failure early in the boot process.
Reported by: green@
|
134128 |
21-Aug-2004 |
alc |
Further reduce the use of Giant by vm_fault(): Giant is held only when manipulating a vnode, e.g., calling vput(). This reduces contention for Giant during many copy-on-write faults, resulting in some additional speedup on SMPs.
Note: debug_mpsafevm must be enabled for this optimization to take effect.
|
133996 |
19-Aug-2004 |
alc |
Acquire and release Giant around a call to VOP_BMAP(). (This is a prerequisite to any further reduction in Giant's use by vm_fault().)
|
133807 |
16-Aug-2004 |
alc |
- Introduce and use a new tunable "debug.mpsafevm". At present, setting "debug.mpsafevm" results in (almost) Giant-free execution of zero-fill page faults. (Giant is held only briefly, just long enough to determine if there is a vnode backing the faulting address.)
Also, condition the acquisition and release of Giant around calls to pmap_remove() on "debug.mpsafevm".
The effect on performance is significant. On my dual Opteron, I see a 3.6% reduction in "buildworld" time.
- Use atomic operations to update several counters in vm_fault().
|
133796 |
16-Aug-2004 |
green |
Rather than bringing back all of the changes to make VM map deletion wait for system wires to disappear, do so (much more trivially) by instead only checking for system wires of user maps and not kernel maps.
Alternative by: tor Reviewed by: alc
|
133726 |
14-Aug-2004 |
alc |
Remove spl calls.
|
133636 |
13-Aug-2004 |
alc |
Replace the linear search in vm_map_findspace() with an O(log n) algorithm built into the map entry splay tree. This replaces the first_free hint in struct vm_map with two fields in vm_map_entry: adj_free, the amount of free space following a map entry, and max_free, the maximum amount of free space in the entry's subtree. These fields make it possible to find a first-fit free region of a given size in one pass down the tree, so O(log n) amortized using splay trees.
This significantly reduces the overhead in vm_map_findspace() for applications that mmap() many hundreds or thousands of regions, and has a negligible slowdown (0.1%) on buildworld. See, for example, the discussion of a micro-benchmark titled "Some mmap observations compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.
OpenBSD adopted this approach in March 2002, and NetBSD added it in November 2003, both with Red-Black trees.
Submitted by: Mark W. Krentel
|
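The r133636 entry describes augmenting each map entry with adj_free and max_free so that a first-fit search completes in one pass down the tree. A simplified sketch of that bookkeeping (a plain binary tree here, without the splay rotations, and with invented helper names):

```c
#include <assert.h>
#include <stddef.h>

/* Each entry caches adj_free (the gap after it) and max_free
 * (the largest gap anywhere in its subtree). */
struct entry {
    size_t start, end;        /* mapped region [start, end) */
    size_t adj_free;          /* free space after this entry */
    size_t max_free;          /* max gap in this subtree */
    struct entry *left, *right;
};

static size_t subtree_max(struct entry *e) { return e ? e->max_free : 0; }

/* Recompute the cached max_free after children/adj_free change. */
static void update(struct entry *e)
{
    size_t m = e->adj_free;
    if (subtree_max(e->left) > m) m = subtree_max(e->left);
    if (subtree_max(e->right) > m) m = subtree_max(e->right);
    e->max_free = m;
}

/* One pass: lowest address of a gap >= length, or 0 if none fits. */
static size_t findspace(struct entry *e, size_t length)
{
    while (e != NULL) {
        if (subtree_max(e->left) >= length)
            e = e->left;            /* a big-enough gap lies lower */
        else if (e->adj_free >= length)
            return e->end;          /* gap right after this entry */
        else if (subtree_max(e->right) >= length)
            e = e->right;           /* only higher addresses qualify */
        else
            return 0;
    }
    return 0;
}

/* Tiny fixed tree: [0,10) gap 5, [15,20) gap 30, [50,60) gap 2. */
static struct entry *demo_tree(void)
{
    static struct entry a = { 0, 10, 5, 0, NULL, NULL };
    static struct entry b = { 15, 20, 30, 0, NULL, NULL };
    static struct entry c = { 50, 60, 2, 0, NULL, NULL };
    b.left = &a; b.right = &c;
    update(&a); update(&c); update(&b);
    return &b;
}
```

Because max_free is maintained bottom-up on the path touched by each insert or delete, the search never needs the linear walk that the old first_free hint required.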
133598 |
12-Aug-2004 |
tegge |
The vm map lock is needed in vm_fault() after the page has been found, to avoid later changes before pmap_enter() and vm_fault_prefault() have completed.
Simplify deadlock avoidance by not blocking on vm map relookup.
In collaboration with: alc
|
133587 |
12-Aug-2004 |
green |
Re-delete the comment from r1.352.
|
133435 |
10-Aug-2004 |
green |
Back out all behavioral changes.
|
133401 |
09-Aug-2004 |
green |
Revamp VM map wiring.
* Allow no-fault wiring/unwiring to succeed for consistency; however, the wired count remains at zero, so it's a special case.
* Fix issues inside vm_map_wire() and vm_map_unwire() where the exact state of user wiring (one or zero) and system wiring (zero or more) could be confused; for example, system unwiring could succeed in removing a user wire, instead of being an error.
* Require all mappings to be unwired before they are deleted. When VM space is still wired upon deletion, it will be waited upon for the following unwire. This makes vslock(9) work rather than allowing kernel-locked memory to be deleted out from underneath of its consumer as it would before.
|
133398 |
09-Aug-2004 |
alc |
Make two changes to vm_fault(). 1. Move a comment to its proper place, updating it. (Except for white- space, this comment had been unchanged since revision 1.1!) 2. Remove spl calls.
|
133395 |
09-Aug-2004 |
alc |
Remove a stale comment from vm_map_lookup() that pertains to share maps. (The last vestiges of the share map code were removed in revisions 1.153 and 1.159.)
|
133355 |
09-Aug-2004 |
alc |
Make two changes to vm_fault(). 1. Retain the map lock until after the calls to pmap_enter() and vm_fault_prefault(). 2. Remove a stale comment. Submitted by: tegge@
|
133318 |
08-Aug-2004 |
phk |
Tag all geom classes in the tree with a version number.
|
133253 |
07-Aug-2004 |
alc |
Remove dead code. A vm_map's first_free is never NULL (even if the map is full).
(This is preparation for an O(log n) implementation of vm_map_findspace().)
Submitted by: Mark W. Krentel
|
133230 |
06-Aug-2004 |
rwatson |
Generate KTR trace records for uma_zalloc_arg() and uma_zfree_arg(). This doesn't trace every event of interest in UMA, but provides enough basic information to explain lock traces and sleep patterns.
|
133185 |
05-Aug-2004 |
green |
Turn on the new contigmalloc(9) by default. There should not actually be a reason to use the old contigmalloc(9), but if desired, the vm.old_contigmalloc setting can be tuned/sysctl'd back to 0 for now.
|
133158 |
05-Aug-2004 |
phk |
Remove a product specific workaround for wrong modes when mmap(2)'ing devices. They have had plenty of time to adjust now.
|
133143 |
04-Aug-2004 |
alc |
- Push down the acquisition and release of Giant into pmap_enter_quick() on those architectures without pmap locking. - Eliminate the acquisition and release of Giant in vm_map_pmap_enter().
|
133113 |
04-Aug-2004 |
dfr |
In dev_pager_updatefake, m->valid is typically 0 on entry. It should be set to VM_PAGE_BITS_ALL before returning, to ensure that neither vm_pager_get_pages nor vm_fault calls vm_page_zero_invalid after dev_pager_getpages has returned.
Submitted by: tegge
|
132999 |
02-Aug-2004 |
alc |
Eliminate the acquisition and release of Giant around the call to pmap_mincore() in mincore(2). Either pmap locking exists (alpha, amd64, i386, ia64) or pmap_mincore() is unimplemented (arm, powerpc, sparc64).
|
132987 |
02-Aug-2004 |
green |
* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialization functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default.
Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.
|
132899 |
30-Jul-2004 |
alc |
- Push down the acquisition and release of Giant into pmap_protect() on those architectures without pmap locking. - Eliminate the acquisition and release of Giant from vm_map_protect().
(Translation: mprotect(2) runs to completion without touching Giant on alpha, amd64, i386 and ia64.)
|
132898 |
30-Jul-2004 |
alc |
Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate its acquisition and release around vm_waitproc() in kern_wait().
|
132884 |
30-Jul-2004 |
dfr |
Fix a memory leak in the device pager which is exposed by the NVIDIA OpenGL driver.
Submitted by: nvidia (possibly also tegge)
|
132883 |
30-Jul-2004 |
dfr |
Fix handling of msync(2) for character special files.
Submitted by: nvidia
|
132880 |
30-Jul-2004 |
mux |
Get rid of another lockmgr(9) consumer by using sx locks for the user maps. We always acquire the sx lock exclusively here, but we can't use a mutex because we want to be able to sleep while holding the lock. This is completely equivalent to what we were doing with the lockmgr(9) locks before.
Approved by: alc
|
132852 |
29-Jul-2004 |
alc |
Advance the state of pmap locking on alpha, amd64, and i386.
- Enable recursion on the page queues lock. This allows calls to vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page queues lock held. Such calls are made to allocate page table pages and pv entries. - The previous change enables a partial reversion of vm/vm_page.c revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault() now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT. - Add partial locking to pmap_copy(). (As a side-effect, pmap_copy() should now be faster on i386 SMP because it no longer generates IPIs for TLB shootdown on the other processors.) - Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now, all changes to a user-level pmap on alpha, amd64, and i386 are performed with appropriate locking.)
|
132842 |
29-Jul-2004 |
bmilekic |
Rework the way slab header storage space is calculated in UMA.
- zone_large_init() stays pretty much the same. - zone_small_init() will try to stash the slab header in the slab page being allocated if the amount of calculated wasted space is less than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular case). If the amount of wasted space is >= UMA_MAX_WASTE, then UMA_ZONE_OFFPAGE will be set and the slab header will be allocated separately for better use of space. - uma_startup() calculates the maximum ipers required in offpage slabs (so that the offpage slab header zone(s) can be sized accordingly). The algorithm used to calculate this replaces the old calculation (which only happened to work coincidentally). We now iterate over possible object sizes, starting from the smallest one, until we determine that wastedspace calculated in zone_small_init() might end up being greater than UMA_MAX_WASTE, at which point we use the found object size to compute the maximum possible ipers. The reason this works is because: - wastedspace versus objectsize is a see-saw function with local minima all equal to zero and local maxima growing directly proportional to objectsize. This implies that for objects up to or equal to a certain objectsize, the see-saw remains entirely below UMA_MAX_WASTE, so for those objectsizes it is impossible to ever go OFFPAGE for slab headers. - ipers (items-per-slab) versus objectsize is an inversely proportional function which falls off very quickly (very large for small objectsizes). - To determine the maximum ipers we'll ever need from OFFPAGE slab headers we first find the largest objectsize for which we are guaranteed not to go offpage, and use it to compute ipers (as though we were offpage). Since the only objectsizes allowed to go offpage are bigger than the found objectsize, and since ipers vs objectsize is inversely proportional (and monotonically decreasing), then we are guaranteed that the ipers computed is always >= what we will ever need in offpage slab headers.
- Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly padded) size of each freelist index so that offset calculations are fixed.
This might fix weird data corruption problems and certainly allows ARM to now boot to at least single-user (via simulator).
Tested on i386 UP by me. Tested on sparc64 SMP by fenner. Tested on ARM simulator to single-user by cognet.
|
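A toy model of the in-page-versus-offpage slab header decision described in the r132842 entry (the sizes and threshold here are made up for illustration; the real zone_small_init() operates on kegs with considerably more state):

```c
#include <assert.h>

#define SLAB_SIZE     4096      /* assumed page-sized slab */
#define HDR_SIZE        32      /* assumed in-page slab header size */
#define UMA_MAX_WASTE  256      /* illustrative waste threshold */

struct layout {
    int ipers;                  /* items per slab */
    int offpage;                /* 1 if the header is allocated separately */
};

/* If keeping the header in the page wastes too much space, go
 * "offpage": the header lives elsewhere and the whole page holds items. */
static struct layout zone_small_init(int objsize)
{
    struct layout l;

    l.ipers = (SLAB_SIZE - HDR_SIZE) / objsize;
    l.offpage = 0;
    if (SLAB_SIZE - HDR_SIZE - l.ipers * objsize >= UMA_MAX_WASTE) {
        l.ipers = SLAB_SIZE / objsize;
        l.offpage = 1;
    }
    return l;
}
```

With these numbers a 64-byte object stays in-page (little waste), while a 2048-byte object goes offpage, since the in-page header would otherwise cost a whole second item's worth of space; this is the see-saw behavior the commit message describes.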
132804 |
28-Jul-2004 |
alc |
Correct a very old error in both vm_object_madvise() (originating in vm/vm_object.c revision 1.88) and vm_object_sync() (originating in vm/vm_map.c revision 1.36): When descending a chain of backing objects, both use the wrong object's backing offset. Consequently, both may operate on the wrong pages.
Quoting Matt, "This could be responsible for all of the sporadic madvise oddness that has been reported over the years."
Reviewed by: Matt Dillon
|
132684 |
27-Jul-2004 |
alc |
- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.
|
132638 |
25-Jul-2004 |
alc |
For years, kmem_alloc_pageable() has been misused. Now that the last of these misuses has been corrected, remove it before new ones appear, such as arm/arm/pmap.c revision 1.8.
|
132636 |
25-Jul-2004 |
alc |
Remove spl calls.
|
132627 |
25-Jul-2004 |
alc |
Make the code and comments for vm_object_coalesce() consistent.
|
132593 |
24-Jul-2004 |
alc |
Simplify vmspace initialization. The bcopy() of fields from the old vmspace to the new vmspace in vmspace_exec() is mostly wasted effort. With one exception, vm_swrss, the copied fields are immediately overwritten. Instead, initialize these fields to zero in vmspace_alloc(), eliminating a bcopy() from vmspace_exec() and a bzero() from vmspace_fork().
|
132550 |
22-Jul-2004 |
alc |
- Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of kmem_alloc_pageable(). The difference between these is that an errant memory access to the zone will be detected sooner with kmem_alloc_nofault().
The following changes serve to eliminate the following lock-order reversal reported by witness:
1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931
There is no potential deadlock in this case. However, witness is unable to recognize this because vm objects used by UMA have the same type as ordinary vm objects. To remedy this, we make the following changes:
- Add a mutex type argument to VM_OBJECT_LOCK_INIT(). - Use the mutex type argument to assign distinct types to special vm objects such as the kernel object, kmem object, and UMA objects. - Define a static swap zone object for use by UMA. (Only static objects are assigned a special mutex type.)
|
132517 |
21-Jul-2004 |
green |
Fix a race in vm_page_sleep_if_busy(). Due to vm_object locking being incomplete, it currently has to know how to drop and pick back up the vm_object's mutex if it has to sleep and drop the page queue mutex. The problem with this is that if the page is busy, while we are sleeping, the page can be freed and the object can disappear. When trying to lock m->object, we'd get a stale or NULL pointer and crash.
The object is now cached, but this makes the assumption that the object is referenced in some manner and will not itself disappear while it is unlocked. Since this only happens if the object is locked, I had to remove an assumption earlier in contigmalloc() that reversed the order of locking the object and doing vm_page_sleep_if_busy(), not the normal order.
|
132483 |
21-Jul-2004 |
peter |
Semi-gratuitous change. Move two refcount operations to their own lines rather than be buried inside an if (expression). And now that the if expression is the same in both exit paths, use the same ordering.
|
132475 |
21-Jul-2004 |
peter |
Move the initialization and teardown of pmaps to the vmspace zone's init and fini handlers. Our vm system removes all userland mappings at exit prior to calling pmap_release. It just so happens that we might as well reuse the pmap for the next process since the userland slate has already been wiped clean.
However. There is a functional benefit to this as well. For platforms that share userland and kernel context in the same pmap, it means that the kernel portion of a pmap remains valid after the vmspace has been freed (process exit) and while it is in uma's cache. This is significant for i386 SMP systems with kernel context borrowing because it avoids a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case.
Tested on: amd64, i386, sparc64, alpha Glanced at by: alc
|
132420 |
19-Jul-2004 |
green |
Remove extraneous locks on the VM free page queue mutex; it is not meant to be recursed upon, and could cause a deadlock inside the new contigmalloc (vm.old_contigmalloc=0) code.
Submitted by: alc
|
132414 |
19-Jul-2004 |
alc |
- Eliminate the pte object from the pmap. Instead, page table pages are allocated as "no object" pages. Similar changes were made to the amd64 and i386 pmap last year. The primary reason being that maintaining a pte object leads to lock order violations. A secondary reason being that the pte object is redundant, i.e., the page table itself can be used to lookup page table pages. (Historical note: The pte object predates our ability to allocate "no object" pages. Thus, the pte object was a necessary evil.) - Unconditionally check the vm object lock's status in vm_page_remove(). Previously, this assertion could not be made on Alpha due to its use of a pte object.
|
132407 |
19-Jul-2004 |
green |
Since breakage of malloc(9)/uma_zalloc(9) is totally non-optional in GENERIC/for WITNESS users, make sure the sysctl to disable the behavior is read-only and always enabled.
|
132379 |
19-Jul-2004 |
green |
Reimplement contigmalloc(9) with an algorithm which stands a greatly improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time.
The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the system's free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated.
The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked.
There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes.
A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation.
When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.
|
132336 |
18-Jul-2004 |
alc |
Remove the GIANT_REQUIRED preceding pmap_remove() in vm_pageout_map_deactivate_pages().
|
132220 |
15-Jul-2004 |
alc |
Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.
|
132040 |
12-Jul-2004 |
alc |
Remove an unused and unimplemented sysctl. (For the record, it was marked as unimplemented in revision 1.129 nearly six years ago.)
|
131937 |
10-Jul-2004 |
alc |
Increase the scope of the page queues lock in vm_page_alloc() to cover a diagnostic check that accesses the cache queue count.
|
131719 |
06-Jul-2004 |
alc |
Micro-optimize vmspace for 64-bit architectures: Colocate vm_refcnt and vm_exitingcnt so that alignment does not result in wasted space.
|
131665 |
06-Jul-2004 |
bms |
Properly brucify a string by outdenting it.
|
131573 |
04-Jul-2004 |
bmilekic |
Introduce debug.nosleepwithlocks sysctl, 0 by default. If set to 1 and WITNESS is not built, then force all M_WAITOK allocations to M_NOWAIT behavior (transparently). This is to be used temporarily if weird deadlocks are reported because we still have code paths that perform M_WAITOK allocations with lock(s) held, which can lead to deadlock. If WITNESS is compiled, then the sysctl is ignored and we ask witness to tell us whether we have locks held, converting to M_NOWAIT behavior only if it tells us that we do.
Note this removes the previous mbuf.h inclusion as well (only needed by last revision), and cleans up unneeded [artificial] comparisons to just the mbuf zones. The problem described above has nothing to do with previous mbuf wait behavior; it is a general problem.
|
131572 |
04-Jul-2004 |
green |
Reextend the M_WAITOK-disabling-hack to all three of the mbuf-related zones, and do it by direct comparison of uma_zone_t instead of strcmp.
The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but this is mostly no longer the case. M_WAITOK has taken over the spot M_TRYWAIT used to have, and for mbuf things, still may return NULL if the code path is incorrectly holding a mutex going into mbuf allocation functions.
The M_WAITOK/M_NOWAIT semantics are absolute; though it may deadlock the system to try to malloc or uma_zalloc something with a mutex held and M_WAITOK specified, it is absolutely required to not return NULL and will result in instability and/or security breaches otherwise. There is still room to add the WITNESS_WARN() to all cases so that we are notified of the possibility of deadlocks, but it cannot change the value of the "badness" variable and allow allocation to actually fail except for the specialized cases which used to be M_TRYWAIT.
|
131528 |
03-Jul-2004 |
green |
Limit mbuma damage. Suddenly ALL allocations with M_WAITOK are subject to failing -- that is, allocations via malloc(M_WAITOK) that are required to never fail -- if WITNESS is not defined. While everyone should be running WITNESS, in any case, zone "Mbuf" allocations are really the only ones that should be screwed with by this hack.
This hack is crashing people, and would continue to do so with or without WITNESS. Things shouldn't be allocating with M_WAITOK with locks held, but it's not okay just to always remove M_WAITOK when !WITNESS.
Reported by: Bernd Walter <ticso@cicely5.cicely.de>
|
131481 |
02-Jul-2004 |
jhb |
Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
|
131473 |
02-Jul-2004 |
jhb |
- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
|
131434 |
02-Jul-2004 |
jhb |
- Don't use a variable to point to the user area that we only use once. Just use p2->p_uarea directly instead. - Remove an old and mostly bogus assertion regarding p2->p_sigacts. - Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.
|
131256 |
28-Jun-2004 |
tegge |
Initialize result->backing_object_offset before linking result onto the list of vm objects shadowing source in vm_object_shadow(). This closes a race where vm_object_collapse() could be called with a partially uninitialized object argument causing symptoms that looked like hardware problems, e.g. signal 6, 10, 11 or a /bin/sh busy-waiting for a nonexistent child process.
|
131252 |
28-Jun-2004 |
gallatin |
Use MIN() macro rather than ulmin() inline, and fix stray tab that snuck in with my last commit.
Submitted by: green
|
131251 |
28-Jun-2004 |
gallatin |
Fix alpha - the use of min() on longs was losing the high bits and returning wrong answers, leading to strange values in vm2->vm_{s,t,d}size.
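The truncation bug described above is easy to reproduce in userland. The sketch below is illustrative (the helper names are ours, not the kernel's): an int-sized min() returns the wrong answer for 64-bit operands, while a type-generic MIN() macro does not.

```c
#include <assert.h>
#include <stdint.h>

/* An int-sized minimum, standing in for the inline that mishandled
 * longs on alpha. Name is illustrative. */
static unsigned int umin_int(unsigned int a, unsigned int b)
{
	return (a < b ? a : b);
}

/* The type-generic macro that preserves the full width. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Funneling 64-bit values through the int-sized helper loses the
 * high bits: 0x100000001 truncates to 1 and 0x100000000 to 0, so
 * the "minimum" comes back as 0 instead of 0x100000000. */
static int64_t bad_min(int64_t a, int64_t b)
{
	return (int64_t)umin_int((unsigned int)a, (unsigned int)b);
}
```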
|
131163 |
27-Jun-2004 |
das |
Update a stale comment. The heuristic to swap processes out based on the number of pages already paged out was broken in rev 1.10 and removed in rev 1.11.
|
131152 |
26-Jun-2004 |
alc |
Remove an unused field from the vmspace structure.
|
131073 |
24-Jun-2004 |
green |
Correct the tracking of various bits of the process's vmspace and vm_map when not propagated on fork (due to minherit(2)). Consistency checks otherwise fail when the vm_map is freed and it appears to have not been emptied completely, causing an INVARIANTS panic in vm_map_zdtor().
PR: kern/68017 Submitted by: Mark W. Krentel <krentel@dreamscape.com> Reviewed by: alc
|
131027 |
24-Jun-2004 |
alc |
Call vm_pageout_page_stats() with the page queues lock held.
|
131023 |
24-Jun-2004 |
alc |
Remove spl calls.
|
130995 |
23-Jun-2004 |
bmilekic |
Make uma_mtx MTX_RECURSE. Here's why:
The general UMA lock is a recursion-allowed lock because there is a code path where, while we're still configured to use startup_alloc() for backend page allocations, we may end up in uma_reclaim() which calls zone_foreach(zone_drain), which grabs uma_mtx, only to later call into startup_alloc() because while freeing we needed to allocate a bucket. Since startup_alloc() also takes uma_mtx, we need to be able to recurse on it.
This exact explanation also added as comment above mtx_init().
Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>
|
130979 |
23-Jun-2004 |
bms |
In swap_pager_getpages(), bp->b_dev can be NULL, particularly for the case of NFS mounted swap, so do not try to dereference it.
While we're here, brucify the printf() call which happens when we time out on acquisition of vm_page_queue_mtx.
PR: kern/67898 Submitted by: bde (style)
|
130710 |
19-Jun-2004 |
alc |
Remove spl() calls. Update comments to reflect the removal of spl() calls. Remove '\n' from panic() format strings. Remove some blank lines.
|
130640 |
17-Jun-2004 |
phk |
Second half of the dev_t cleanup.
The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel space struct cdev etc.
|
130626 |
17-Jun-2004 |
alc |
Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not accessible through an object. Thus, PG_BUSY serves no purpose.
|
130585 |
16-Jun-2004 |
phk |
Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
|
130551 |
16-Jun-2004 |
julian |
Nice is a property of a process as a whole. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
|
130502 |
15-Jun-2004 |
green |
Make contigmalloc() more reliable:
1. Remove a race whereby contigmalloc() would deadlock against the running processes in the system if they kept reinstantiating the memory on the active and inactive page queues that it was trying to flush out. The process doing the contigmalloc() would sit in "swwrt" forever and the swap pager would be going at full force, but never get anywhere. Instead of doing it until the queues are empty, launder for as many iterations as there are pages in the queue. 2. Do all laundering to swap synchronously; previously, the vnode laundering was synchronous and the swap laundering not. 3. Increase the number of launder-or-allocate passes to three, from two, while failing without bothering to do all the laundering on the third pass if allocation was not possible. This effectively gives exactly two chances to launder enough contiguous memory, helpful with high memory churn where a lot of memory from one pass to the next (and during a single laundering loop) becomes dirtied again.
I can now reliably hot-plug hardware requiring a 256KB contigmalloc() without having the kldload/cbb ithread sit around failing to make progress, while running a busy X session. Previously, it took killing X to get contigmalloc() to get further (that is, quiescing the system), and even then contigmalloc() returned failure.
|
130344 |
11-Jun-2004 |
phk |
Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
|
130283 |
09-Jun-2004 |
bmilekic |
Backout previous change, I think Julian has a better solution which does not require type-stable refcnts here.
|
130278 |
09-Jun-2004 |
bmilekic |
Make the slabrefzone, the zone from which we allocate slabs with internal reference counters, UMA_ZONE_NOFREE. This way, those slabs (with their ref counts) will be effectively type-stable, and using a trick like this on the refcount is no longer dangerous:
	MEXT_REM_REF(m);
	if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) {
		if (m->m_ext.ext_type == EXT_PACKET) {
			uma_zfree(zone_pack, m);
			return;
		} else if (m->m_ext.ext_type == EXT_CLUSTER) {
			uma_zfree(zone_clust, m->m_ext.ext_buf);
			m->m_ext.ext_buf = NULL;
		} else {
			(*(m->m_ext.ext_free))(m->m_ext.ext_buf,
			    m->m_ext.ext_args);
			if (m->m_ext.ext_type != EXT_EXTREF)
				free(m->m_ext.ref_cnt, M_MBUF);
		}
	}
	uma_zfree(zone_mbuf, m);
Previously, a second thread hitting the above cmpset might actually read the refcnt AFTER it has already been freed. A very rare occurrence. Now we'll know that it won't be freed, though.
Spotted by: julian, pjd
|
130201 |
07-Jun-2004 |
netchild |
Remove references to L1 in the comments, according to Alan they are historical leftovers.
Approved by: alc
|
130137 |
05-Jun-2004 |
alc |
Update stale comments regarding page coloring.
|
130049 |
04-Jun-2004 |
alc |
Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to blist.h, enabling the removal of numerous #includes from subr_blist.c. (subr_blist.c and swap_pager.c are the only users of these definitions.)
|
129913 |
01-Jun-2004 |
bmilekic |
Fix a comment above uma_zsecond_create(), describing its arguments. It doesn't take 'align' and 'flags' but 'master' instead, which is a reference to the Master Zone, containing the backing Keg.
Pointed out by: Tim Robbins (tjr)
|
129906 |
31-May-2004 |
bmilekic |
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein.
Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework.
From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime.
Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits.
Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
|
129883 |
30-May-2004 |
alc |
Remove a stale comment: PG_DIRTY and PG_FILLED were removed in revisions 1.17 and 1.12 respectively.
|
129857 |
30-May-2004 |
hmp |
Correct typo: vm_page_list_find() has been called vm_pageq_find() for quite a long time, i.e., since the cleanup of the VM Page-queues code done two years ago.
Reviewed by: Alan Cox <alc at freebsd.org>, Matthew Dillon <dillon at backplane.com>
|
129729 |
25-May-2004 |
des |
MFS: vm_map.c rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but provide a sysctl knob for reverting to old ones.
|
129728 |
25-May-2004 |
des |
Back out previous commit; it went to the wrong file.
|
129725 |
25-May-2004 |
des |
MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but provide a sysctl knob for reverting to old ones.
|
129701 |
25-May-2004 |
alc |
Correct two error cases in vm_map_unwire():
1. Contrary to the Single Unix Specification our implementation of munlock(2) when performed on an unwired virtual address range has returned an error. Correct this. Note, however, that the behavior of "system" unwiring is unchanged, only "user" unwiring is changed. If "system" unwiring is performed on an unwired virtual address range, an error is still returned.
2. Performing an errant "system" unwiring on a virtual address range that was "user" (i.e., mlock(2)) but not "system" wired would incorrectly undo the "user" wiring instead of returning an error. Correct this.
Discussed with: green@ Reviewed by: tegge@
|
129571 |
22-May-2004 |
alc |
To date, unwiring a fictitious page has produced a panic. The reason being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious pages but unwiring uses PHYS_TO_VM_PAGE(). The resulting panic reported an unexpected wired count. Rather than attempting to fix PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of fictitious pages. Specifically, fictitious pages will never be completely unwired. Therefore, we can keep a fictitious page's wired count forever set to one and thereby avoid the use of PHYS_TO_VM_PAGE() when we know that we're working with a fictitious page, just not which one.
In collaboration with: green@, tegge@ PR: kern/29915
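The invariant the fix relies on can be sketched as a tiny helper (names and the flag value are ours, not the kernel's): a fictitious page's wire count is pinned at one, so unwiring it is a no-op and never needs PHYS_TO_VM_PAGE().

```c
#include <assert.h>

#define PG_FICTITIOUS_SKETCH 0x01	/* illustrative flag value */

/* Returns the wire count after one unwire operation. Fictitious
 * pages stay pinned at one forever; ordinary pages decrement. */
static int unwire_count(int flags, int wire_count)
{
	if (flags & PG_FICTITIOUS_SKETCH)
		return 1;
	return (wire_count > 0 ? wire_count - 1 : 0);
}
```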
|
129145 |
12-May-2004 |
alc |
Restructure vm_page_select_cache() so that adding assertions is easy.
Some of the conditions that caused vm_page_select_cache() to deactivate a page were wrong. For example, deactivating an unmanaged or wired page is a nop. Thus, if vm_page_select_cache() had ever encountered an unmanaged or wired page, it would have looped forever. Now, we assert that the page is neither unmanaged nor wired.
|
129143 |
12-May-2004 |
alc |
Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.
|
129110 |
11-May-2004 |
tjr |
To handle orphaned character device vnodes properly in mmap(), check that v_mount is non-null before dereferencing it. If it's null, behave as if MNT_NOEXEC was not set on the mount that originally contained it.
|
129057 |
09-May-2004 |
alc |
Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_page_alloc() is unnecessary.
|
129028 |
07-May-2004 |
green |
In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire" and a "system unwire." Make this a "system wire" and "system unwire."
Reviewed by: alc
|
129018 |
07-May-2004 |
green |
Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down. Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's vmspace::vm_map and subsequent processes would wire all of their memory. Coupled with a wired-page leak in vm_fault_unwire(), this would run the system out of free pages and cause programs to randomly SIGBUS when faulting in new pages.
(Note that this is not the fix for the latter part; pages are still leaked when a wired area is unmapped in some cases.)
Reviewed by: alc PR: kern/62930
|
128992 |
06-May-2004 |
alc |
Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page.
Reviewed by: tegge@
|
128633 |
25-Apr-2004 |
alc |
Zero the physical page only if it is invalid and not prezeroed.
|
128620 |
24-Apr-2004 |
alc |
Add a VM_OBJECT_LOCK_ASSERT() call. Remove splvm() and splx() calls. Move a comment.
|
128614 |
24-Apr-2004 |
alc |
Update the comment describing vm_page_grab() to reflect the previous revision and correct some of its style errors.
|
128613 |
24-Apr-2004 |
alc |
Push down the responsibility for zeroing a physical page from the caller to vm_page_grab(). Although this gives VM_ALLOC_ZERO a different meaning for vm_page_grab() than for vm_page_alloc(), I feel such change is necessary to accomplish other goals. Specifically, I want to make the PG_ZERO flag immutable between the time it is allocated by vm_page_alloc() and freed by vm_page_free() or vm_page_free_zero() to avoid locking overheads. Once we gave up on the ability to automatically recognize a zeroed page upon entry to vm_page_free(), the ability to mutate the PG_ZERO flag became useless. Instead, I would like to say that "Once a page becomes valid, its PG_ZERO flag must be ignored."
|
128596 |
24-Apr-2004 |
alc |
In cases where a file was resident in memory mmap(..., PROT_NONE, ...) would actually map the file with read access enabled. According to http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read access on a virtual address range that was PROT_NONE.
The solution implemented herein is (1) to pass a vm_prot_t to vm_map_pmap_enter() describing the allowed access and (2) to make vm_map_pmap_enter() responsible for understanding the limitations of pmap_enter_quick().
Submitted by: "Mark W. Krentel" <krentel@dreamscape.com> PR: kern/64573
|
128570 |
23-Apr-2004 |
alc |
Push down Giant into vm_pager_get_pages(). The only get pages methods that require Giant are in the device and vnode pagers.
|
128097 |
10-Apr-2004 |
alc |
- pmap_kenter_temporary() is unused by machine-independent code. Therefore, move its declaration to the machine-dependent header file on those machines that use it. In principle, only i386 should have it. Alpha and AMD64 should use their direct virtual-to-physical mapping. - Remove pmap_kenter_temporary() from ia64. It is unused. Approved by: marcel@
|
128038 |
08-Apr-2004 |
alc |
The demise of vm_pager_map_page() in revision 1.93 of vm/vm_pager.c permits the reduction of the pager map's size by 8M bytes. In other words, eight megabytes of largely wasted KVA are returned to the kernel map for use elsewhere.
|
127961 |
06-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
Approved by: core
|
127926 |
06-Apr-2004 |
alc |
Eliminate vm_pager_map_page() and vm_pager_unmap_page() and their uses. Use sf_buf_alloc() and sf_buf_free() instead.
|
127879 |
05-Apr-2004 |
kan |
Delay permission checks for VCHR vnodes until after vnode is locked in vm_mmap_vnode function, where we can safely check for a special /dev/zero case. Rev. 1.180 has reordered checks and introduced a regression.
Submitted by: alc Was broken by: kan
|
127869 |
05-Apr-2004 |
alc |
Remove unused arguments from pmap_init().
|
127868 |
04-Apr-2004 |
alc |
Eliminate unused arguments from vm_page_startup().
|
127327 |
23-Mar-2004 |
tjr |
Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying it led to impossibly high values in the new vmspace, causing it to never drop to 0 and be freed.
|
127187 |
18-Mar-2004 |
guido |
When mmap-ing a file from a noexec mount, be sure not to grant the right to mmap it PROT_EXEC. This also depends on the architecture, as some architectures (e.g. i386) do not distinguish between read and exec pages.
Inspired by: http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85 Reviewed by: alc
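A minimal sketch of the policy, assuming the fix strips execute permission from the mapping's maximum protection rather than failing the mmap(); flag values and the helper name are illustrative, not FreeBSD's.

```c
#include <assert.h>

#define PROT_READ_SK	0x01
#define PROT_WRITE_SK	0x02
#define PROT_EXEC_SK	0x04
#define MNT_NOEXEC_SK	0x10

/* Cap the protection a mapping may ever gain: a file on a noexec
 * mount never yields PROT_EXEC. */
static int cap_maxprot(int maxprot, int mnt_flags)
{
	if (mnt_flags & MNT_NOEXEC_SK)
		maxprot &= ~PROT_EXEC_SK;
	return maxprot;
}
```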
|
127013 |
15-Mar-2004 |
truckman |
Make overflow/wraparound checking more robust and unbreak len=0 in vslock(), mlock(), and munlock().
Reviewed by: bde
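A sketch of the kind of wraparound check being described, not the committed code: reject ranges whose end wraps past the top of the address space while still accepting len == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Validate [addr, addr + len): the sum wrapping around is the only
 * way end can come out below addr, so that comparison catches
 * overflow, and a zero-length range passes trivially. */
static int range_ok(uintptr_t addr, size_t len)
{
	uintptr_t end = addr + len;

	if (end < addr)
		return 0;	/* addr + len wrapped around */
	return 1;
}
```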
|
127008 |
15-Mar-2004 |
truckman |
Style(9) changes.
Pointed out by: bde
|
127007 |
15-Mar-2004 |
truckman |
Revert to the original vslock() and vsunlock() API with the following exceptions: Retain the recently added vslock() error return.
The type of the len argument should be size_t, not u_int.
Suggested by: bde
|
127006 |
15-Mar-2004 |
truckman |
Remove redundant suser() check.
|
126911 |
13-Mar-2004 |
alc |
Remove GIANT_REQUIRED from contigfree().
|
126865 |
12-Mar-2004 |
peter |
Part 2 of rev 1.68. Update comment to match reality now that vm_endcopy exists and we no longer copy to the end of the struct.
Forgotten by: alfred and green
|
126793 |
10-Mar-2004 |
alc |
- Make the acquisition of Giant in vm_fault_unwire() conditional on the pmap. For the kernel pmap, Giant is not required. In general, for other pmaps, Giant is required by i386's pmap_pte() implementation. Specifically, the use of PMAP2/PADDR2 is synchronized by Giant. Note: In principle, updates to the kernel pmap's wired count could be lost without Giant. However, in practice, we never use the kernel pmap's wired count. This will be resolved when pmap locking appears. - With the above change, cpu_thread_clean() and uma_large_free() need not acquire Giant. (The first case is simply the revival of i386/i386/vm_machdep.c's revision 1.226 by peter.)
|
126739 |
08-Mar-2004 |
alc |
Implement a work around for the deadlock avoidance case in vm_object_deallocate() so that it doesn't spin forever either.
Submitted by: bde
|
126728 |
07-Mar-2004 |
alc |
Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_pinit2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.
|
126714 |
07-Mar-2004 |
rwatson |
Mark uma_callout as CALLOUT_MPSAFE, as uma_timeout can run MPSAFE.
Reviewed by: jeff
|
126668 |
05-Mar-2004 |
truckman |
Undo the merger of mlock()/vslock() and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests.
Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions.
Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mlock() and vslock() implementations as similar as possible.
Retain the RLIMIT_MEMLOCK check in mlock(). Move the most stringent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily.
Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge.
Tested by: Cy Schubert <Cy.Schubert AT komquats.com>
|
126632 |
05-Mar-2004 |
alc |
In the last revision, I introduced a physical contiguity check that is both unnecessary and wrong. While it is necessary to verify that the page is still free after dropping and reacquiring the free page queue lock, the physical contiguity of the page can not change, making this check unnecessary. This check was wrong in that it could cause an out-of-bounds array access.
Tested by: rwatson
|
126588 |
04-Mar-2004 |
bde |
Record exactly where this file was copied from. It wasn't repo-copied so this is not very obvious.
Fixed some style bugs (mainly missing parentheses around return values).
|
126585 |
04-Mar-2004 |
bde |
Minor style fixes. In vm_daemon(), don't fetch the rss limit long before it is needed.
|
126571 |
04-Mar-2004 |
alc |
Remove some long unused definitions.
|
126479 |
02-Mar-2004 |
alc |
Modify contigmalloc1() so that the free page queues lock is not held when vm_page_free() is called. The problem with holding this lock is that it is a spin lock and vm_page_free() may attempt the acquisition of a different default-type lock.
|
126424 |
01-Mar-2004 |
kan |
Pick up a do {} while(0) cleanup by phk that was discarded accidentally in the previous revision.
Submitted by: alc
|
126332 |
27-Feb-2004 |
kan |
Move the code dealing with vnode out of several functions into a single helper function vm_mmap_vnode.
Discussed with: jeffr,alc (a while ago)
|
126253 |
26-Feb-2004 |
truckman |
Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way.
Enable the RLIMIT_MEMLOCK checking code in kern_mlock().
Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits.
Nuke the vslock() and vsunlock() implementations, which are no longer used.
Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request.
Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. Sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request.
Modify the callers of sysctl_wire_old_buffer() to look for the error return.
Modify sysctl_old_user to obey the wired buffer length and clean up its implementation.
Reviewed by: bms
|
126135 |
23-Feb-2004 |
alc |
- Substitute bdone() and bwait() from vfs_bio.c for swap_pager_putpages()'s buffer completion code. Note: the only difference between swp_pager_sync_iodone() and bdone(), aside from the locking in the latter, was the unnecessary clearing of B_ASYNC. - Remove an unnecessary pmap_page_protect() from swp_pager_async_iodone().
Reviewed by: tegge
|
126108 |
22-Feb-2004 |
alc |
Correct a long-standing race condition in vm_object_page_remove() that could result in a dirty page being unintentionally freed.
Reviewed by: tegge MFC after: 7 days
|
126088 |
21-Feb-2004 |
alc |
Eliminate the second, unnecessary call to pmap_page_protect() near the end of vm_pageout_flush(). Instead, assert that the page is still write protected.
Discussed with: tegge
|
125990 |
19-Feb-2004 |
alc |
- Correct a long-standing race condition in vm_page_try_to_free() that could result in a dirty page being unintentionally freed. - Simplify the dirty page check in vm_page_dontneed().
Reviewed by: tegge MFC after: 7 days
|
125889 |
16-Feb-2004 |
des |
Back out previous commit due to objections.
|
125882 |
16-Feb-2004 |
des |
Don't panic if we fail to satisfy an M_WAITOK request; return 0 instead. The calling code will either handle that gracefully or cause a page fault.
|
125861 |
16-Feb-2004 |
alc |
Correct a long-standing race condition in vm_contig_launder() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.
MFC after: 7 days
|
125838 |
15-Feb-2004 |
alc |
Correct a long-standing race condition in vm_fault() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.
Reviewed by: tegge MFC after: 7 days
|
125798 |
14-Feb-2004 |
alc |
- Correct a long-standing race condition in vm_page_try_to_cache() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251. - Simplify the code surrounding the fix to this same race condition in vm_pageout.c's revision 1.251. There should be no behavioral change. Reviewed by: tegge
MFC after: 7 days
|
125755 |
12-Feb-2004 |
phk |
Remove the absolute count g_access_abs() function since experience has shown that it is not useful.
Rename the relative count g_access_rel() function to g_access(), only the name has changed.
Change all g_access_rel() calls in our CVS tree to call g_access() instead.
Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source code compatibility.
|
125748 |
12-Feb-2004 |
alc |
Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove() on system maps, besides the kmem_map, without Giant.
In collaboration with: tegge
|
125662 |
10-Feb-2004 |
alc |
Correct a long-standing race condition in the inactive queue scan. (See the added comment for low-level details.) The effect of this race condition is a panic "vm_page_cache: caching a dirty page, ..."
Reviewed by: tegge MFC after: 7 days
|
125558 |
07-Feb-2004 |
alc |
swp_pager_async_iodone() no longer requires Giant. Modify bufdone() and swapgeom_done() to perform swp_pager_async_iodone() without Giant.
Reviewed by: tegge
|
125470 |
05-Feb-2004 |
alc |
- Locking for the per-process resource limits structure has eliminated the need for Giant in vm_map_growstack(). - Use the proc * that is passed to vm_map_growstack() rather than curthread->td_proc.
|
125454 |
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy-on-write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match the existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctls for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
125362 |
02-Feb-2004 |
jhb |
Drop the reference count on the old vmspace after fully switching the current thread to the new vmspace.
Suggested by: dillon
|
125322 |
02-Feb-2004 |
phk |
Check error return from g_clone_bio(). (netchild@)
Add XXX comment about why this is still not optimal. (phk@)
Submitted by: netchild@
|
125314 |
02-Feb-2004 |
jeff |
- Use a separate startup function for the zeroidle kthread. Use this to set P_NOLOAD prior to running the thread.
|
125294 |
01-Feb-2004 |
jeff |
- Fix a problem where we did not drain the cache of buckets in the zone when uma_reclaim() was called. This was introduced when the zone working-set algorithm was removed in favor of using the per cpu caches as the working set.
|
125246 |
30-Jan-2004 |
des |
Mechanical whitespace cleanup.
|
125193 |
29-Jan-2004 |
bde |
Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The "scheduler" here has very little to do with scheduling. It is actually the swapper, and it really must be the last SYSINIT'ed item like its comment says, since proc0 metamorphoses into swapper by calling scheduler() last in mi_startup(), and scheduler() never returns. Rev.1.29 of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item (kproc_start() for schedcpu_thread()) onto the SI_SUB_RUN_SCHEDULER list. The sorting of SYSINITs with identical orders (at all levels) is apparently nondeterministic, so this resulted in scheduler() sometimes being called second-last and schedcpu_thread() not being called at all.
This quick fix just changes the code to almost match the comment (SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and there is no way to ensure that there is only one very last SYSINIT. A more complete fix would remove the SYSINIT obfuscation.
|
124944 |
25-Jan-2004 |
jeff |
- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.
|
124933 |
24-Jan-2004 |
alc |
1. Statically initialize swap_pager_full and swap_pager_almost_full to the full state. (When swap is added their state will change appropriately.) 2. Set swap_pager_full and swap_pager_almost_full to the full state when the last swap device is removed. Combined, these changes eliminate nonsense messages from the kernel on swapless machines.
Item 2 submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz> Prodding by: phk
|
124649 |
18-Jan-2004 |
alc |
Increase UMA_BOOT_PAGES because of changes to pv entry initialization in revision 1.457 of i386/i386/pmap.c.
|
124646 |
18-Jan-2004 |
alc |
Don't acquire Giant in vm_object_deallocate() unless the object is vnode- backed.
|
124513 |
14-Jan-2004 |
alc |
Remove vm_page_alloc_contig(). It's now unused.
|
124366 |
11-Jan-2004 |
alc |
Remove long dead code, specifically, code related to munmapfd(). (See also vm/vm_mmap.c revision 1.173.)
|
124353 |
10-Jan-2004 |
alc |
- Unmanage pages allocated by contigmalloc1(). (There is no point in having PV entries for these pages.) - Remove splvm() and splx() calls.
|
124321 |
10-Jan-2004 |
alc |
Unmanage pages allocated by kmem_alloc(). (There is no point in having PV entries for these pages.)
|
124261 |
08-Jan-2004 |
alc |
- Enable recursive acquisition of the mutex synchronizing access to the free pages queue. This is presently needed by contigmalloc1(). - Move a sanity check against attempted double allocation of two pages to the same vm object offset from vm_page_alloc() to vm_page_insert(). This provides better protection because double allocation could occur through a direct call to vm_page_insert(), such as that by vm_page_rename(). - Modify contigmalloc1() to hold the mutex synchronizing access to the free pages queue while it scans vm_page_array in search of free pages. - Correct a potential leak of pages by contigmalloc1() that I introduced in revision 1.20: We must convert all cache queue pages to free pages before we begin removing free pages from the free queue. Otherwise, if we have to restart the scan because we are unable to acquire the vm object lock that is necessary to convert a cache queue page to a free page, we leak those free pages already removed from the free queue.
|
124195 |
06-Jan-2004 |
alc |
Don't bother clearing PG_ZERO in contigmalloc1(), kmem_alloc(), or kmem_malloc(). It serves no purpose.
|
124133 |
04-Jan-2004 |
alc |
Simplify the various pager allocation routines by computing the desired object size once and assigning that value to a local variable.
|
124117 |
04-Jan-2004 |
alc |
Eliminate the acquisition and release of Giant from vnode_pager_alloc(). The vm object and vnode locking should suffice.
Discussed with: jeff
|
124110 |
03-Jan-2004 |
alc |
Reduce the scope of Giant in swap_pager_alloc().
|
124084 |
02-Jan-2004 |
alc |
Revision 1.74 of vm_meter.c ("Avoid lock-order reversal") makes the release and subsequent reacquisition of the same vm object lock in vm_object_collapse() unnecessary.
|
124083 |
02-Jan-2004 |
alc |
Avoid lock-order reversal between the vm object list mutex and the vm object mutex.
|
124048 |
01-Jan-2004 |
alc |
- Increase the scope of the kmem_object's lock in kmem_malloc(). Add a comment explaining why a further increase is not possible.
|
124028 |
31-Dec-2003 |
alc |
In vm_page_lookup() check the root of the vm object's splay tree for the desired page before calling vm_page_splay().
|
124012 |
31-Dec-2003 |
alc |
Simplify vm_page_grab(): Don't bother with the generation check. If the vm object hasn't changed, the desired page will be at or near the root of the vm object's splay tree, making vm_page_lookup() cheap. (The only lock required for vm_page_lookup() is already held.) If, however, the vm object has changed and retry was requested, eliminating the generation check also eliminates a pointless acquisition and release of the page queues lock.
|
124008 |
30-Dec-2003 |
alc |
- Modify vm_object_split() to expect a locked vm object on entry and return with a locked vm object on exit. Remove GIANT_REQUIRED. - Eliminate some unnecessary local variables from vm_object_split().
|
123948 |
29-Dec-2003 |
alc |
Remove swap_pager_un_object_list; it is unused.
|
123914 |
28-Dec-2003 |
alc |
Remove GIANT_REQUIRED from kmem_suballoc().
|
123879 |
26-Dec-2003 |
alc |
- Reduce Giant's scope in vm_fault(). - Use vm_object_reference_locked() instead of vm_object_reference() in vm_fault().
|
123878 |
26-Dec-2003 |
alc |
Minor correction to revision 1.258: Use the proc pointer that is passed to vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.
|
123711 |
22-Dec-2003 |
alc |
- Create an unmapped guard page to trap access to vm_page_array[-1]. This guard page would have trapped the problems with the MFC of the PAE support to RELENG_4 at an earlier point in the sequence of events.
Submitted by: tegge
|
123710 |
22-Dec-2003 |
alc |
- Significantly reduce the number of preallocated pv entries in pmap_init(). Such a large preallocation is unnecessary and wastes nearly eight megabytes of kernel virtual address space per gigabyte of managed physical memory. - Increase UMA_BOOT_PAGES by two. This enables the removal of pmap_pv_allocf(). (Note: this function was only used during initialization, specifically, after pmap_init() but before pmap_init2(). During pmap_init2(), a new allocator is installed.)
|
123697 |
21-Dec-2003 |
alc |
- Correct an error in mincore(2) that has existed since its introduction: mincore(2) should check that the page is valid, not just allocated. Otherwise, it can return a false positive for a page that is not yet resident because it is being read from disk.
|
123280 |
08-Dec-2003 |
kan |
Remove trailing whitespace.
|
123276 |
08-Dec-2003 |
alc |
Addendum to revision 1.174: In the case where vm_pager_allocate() is called to create a vnode-backed object, the vnode lock must be held by the caller.
Reported by: truckman Discussed with: kan
|
123168 |
06-Dec-2003 |
alc |
Fix a deadlock between vm_fault() and vm_mmap(): The expected lock ordering between vm_map and vnode locks is that vm_map locks are acquired first. In revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap(). This creates a lock-order reversal when vm_mmap() calls one of the vm_map routines that acquires a vm_map lock. The solution implemented herein is to release the vnode lock in mmap() before calling vm_mmap() and reacquire this lock if necessary in vm_mmap().
Approved by: re (scottl) Reviewed by: jeff, kan, rwatson
|
123126 |
03-Dec-2003 |
jhb |
Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
Approved by: re (scottl) Tested on: i386, amd64, alpha
|
123073 |
30-Nov-2003 |
jeff |
- Unbreak UP. mp_maxid is not defined on uni-processor machines, although I believe it and the other MP variables should be. For now, just define it here and wait for jhb to clean it up later.
Approved by: re (rwatson)
|
123057 |
30-Nov-2003 |
jeff |
- Replace the local maxcpu with mp_maxid. Previously, if mp_maxid was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu was calculated incorrectly. - Add some more debugging code so that memory leaks at the time of uma_zdestroy() are more easily diagnosed.
Approved by: re (rwatson)
|
122902 |
19-Nov-2003 |
alc |
- Avoid a lock-order reversal between Giant and a system map mutex that occurs when kmem_malloc() fails to allocate a sufficient number of vm pages. Specifically, we avoid the lock-order reversal by not grabbing Giant around pmap_remove() if the map is the kmem_map.
Approved by: re (jhb) Reported by: Eugene <eugene3@web.de>
|
122748 |
15-Nov-2003 |
tjr |
In vnode_pager_input_smlfs(), call VOP_STRATEGY instead of VOP_SPECSTRATEGY on non-VCHR vnodes. This fixes a panic when reading data from files on a filesystem with a small (less than a page) block size.
PR: 59271 Reviewed by: alc
|
122680 |
14-Nov-2003 |
alc |
- Remove use of Giant from uma_zone_set_obj().
|
122651 |
14-Nov-2003 |
alc |
- Remove long dead code.
|
122646 |
14-Nov-2003 |
alc |
Changes to msync(2) - Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE is specified to msync(2). This is required by the Open Group Base Specifications Issue 6. - vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't possibly return EIO. - The second major loop in vm_map_sync() handles sub maps. Thus, failing on sub maps in the first major loop isn't necessary.
|
122384 |
10-Nov-2003 |
alc |
- The Open Group Base Specifications Issue 6 specifies that munmap(2) must return EINVAL if size is zero. Submitted by: tegge - In order to avoid a race condition in multithreaded applications, the check and removal operations by munmap(2) must be in the same critical section. To accommodate this, vm_map_check_protection() is modified to require its caller to obtain at least a read lock on the map.
|
122383 |
10-Nov-2003 |
mini |
NFC: Update stale comments.
Reviewed by: alc
|
122367 |
09-Nov-2003 |
alc |
- Remove Giant from msync(2). Giant is still acquired by the lower layers if we drop into the pmap or vnode layers. - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that multithread applications can't change the map between implementing the zero-length hack in msync(2) and reacquiring the map lock in vm_map_sync().
Reviewed by: tegge
|
122349 |
09-Nov-2003 |
alc |
- Rename vm_map_clean() to vm_map_sync(). This better reflects the fact that msync(2) is its only caller. - Migrate the parts of the old vm_map_clean() that examined the internals of a vm object to a new function vm_object_sync() that is implemented in vm_object.c. At the same, introduce the necessary vm object locking so that vm_map_sync() and vm_object_sync() can be called without Giant.
Reviewed by: tegge
|
122095 |
05-Nov-2003 |
alc |
- Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to vm_map_entry_delete() so that all of the vm object manipulation is performed in one place.
|
122034 |
04-Nov-2003 |
marcel |
Update avail_ssize for rstacks after growing them.
|
121962 |
03-Nov-2003 |
des |
Whitespace cleanup.
|
121919 |
03-Nov-2003 |
alc |
- Increase the scope of the source object lock in vm_map_copy_entry().
|
121913 |
02-Nov-2003 |
alc |
- Increase the scope of two vm object locks in vm_object_split().
|
121907 |
02-Nov-2003 |
alc |
- Introduce and use vm_object_reference_locked(). Unlike vm_object_reference(), this function must not be used to reanimate dead vm objects. This restriction simplifies locking.
Reviewed by: tegge
|
121866 |
01-Nov-2003 |
alc |
- Increase the scope of two vm object locks in vm_object_collapse(). - Remove the acquisition and release of Giant from vm_object_coalesce().
|
121854 |
01-Nov-2003 |
alc |
- Modify swap_pager_copy() and its callers such that the source and destination objects are locked on entry and exit. Add comments to the callers noting that the locks can be released by swap_pager_copy(). - Remove several instances of GIANT_REQUIRED.
|
121844 |
01-Nov-2003 |
alc |
- Additional vm object locking in vm_object_split() - New vm object locking assertions in vm_page_insert() and vm_object_set_writeable_dirty()
|
121821 |
31-Oct-2003 |
alc |
- Revert a part of revision 1.73: Make vm_object_set_flag() an inline function. This function is so trivial that inlining reduces the size of the kernel.
|
121815 |
31-Oct-2003 |
alc |
- Take advantage of the swap pager locking: Eliminate the use of Giant from vm_object_madvise(). - Remove excessive blank lines from vm_object_madvise().
|
121786 |
31-Oct-2003 |
marcel |
Fix two bugs introduced with the rstack functionality and specific to the rstack functionality: 1. Fix a KASSERT that tests for the address to be above the upward growable stack. Typically for rstack, the faulting address can be identical to the record end of the upward growable entry, and very likely is on ia64. The KASSERT tested for greater than, not greater than or equal, so whenever the register stack had to be grown the assertion fired. 2. When we grow the upward growable stack entry and adjust the underlying object, don't forget to adjust the size of the VM map. Not doing so would trigger an assert in vm_map_zdtor().
Pointy hat: marcel (for not testing with INVARIANTS).
|
121782 |
31-Oct-2003 |
alc |
- Synchronize access to the swdevt's sw_flags with sw_dev_mtx. - Remove several instances of GIANT_REQUIRED.
|
121727 |
30-Oct-2003 |
alc |
- Synchronize access to the swdevt's sw_blist with sw_dev_mtx. - Remove several instances of GIANT_REQUIRED.
|
121725 |
30-Oct-2003 |
alc |
- Synchronize access to swdevhd using sw_dev_mtx. - Use swp_sizecheck() rather than assignment to swap_pager_full in swaponsomething().
|
121649 |
29-Oct-2003 |
alc |
- Synchronize updates to nswapdev using sw_dev_mtx.
|
121646 |
29-Oct-2003 |
alc |
- Avoid a race in swaponsomething(): Calculate the new swdevt's first and end swblk and insert this new swdevt into the list of swap devices in the same critical section.
|
121601 |
27-Oct-2003 |
alc |
- Complete the synchronization of accesses to the swblock hash table.
|
121583 |
26-Oct-2003 |
alc |
- Introduce and use a mutex synchronizing access to the swblock hash table.
|
121562 |
26-Oct-2003 |
alc |
- Simplify vm_object_collapse()'s collapse case, reducing the number of lock acquires and releases performed. - Move an assertion from vm_object_collapse() to vm_object_zdtor() because it applies to all cases of object destruction.
|
121517 |
25-Oct-2003 |
alc |
- Add some of the required vm object locking, including assertions where the vm object lock is required and already held.
|
121511 |
25-Oct-2003 |
alc |
- Align a comment within struct vm_page. - Annotate the vm_page's valid field as synchronized by the containing vm object's lock.
|
121495 |
25-Oct-2003 |
alc |
- Call vnode_pager_input_old() with the vm object locked.
|
121455 |
24-Oct-2003 |
alc |
- Push down Giant from vm_pageout() to vm_pageout_scan(), freeing vm_pageout_page_stats() from Giant. - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the vm object to be locked on entry. (All of the pager routines now expect this.)
|
121351 |
22-Oct-2003 |
alc |
- Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from vm_pageout_scan(). Rationale: I don't like leaving a busy page in the cache queue with neither the vm object nor the vm page queues lock held. - Assert that the page is active in vm_pageout_page_stats().
|
121321 |
22-Oct-2003 |
alc |
- Assert that every page found in the active queue is an active page.
|
121313 |
21-Oct-2003 |
alc |
- Assert that the containing vm object is locked in vm_page_set_validclean(). (This function reads and modifies the vm page's valid field, which is synchronized by the lock on the containing vm object.)
|
121288 |
20-Oct-2003 |
alc |
- Remove some long unused code.
|
121267 |
20-Oct-2003 |
alc |
- Remove comments referring to functions that no longer exist.
|
121264 |
20-Oct-2003 |
alc |
- Hold the vm object's lock around calls to vm_page_set_validclean().
|
121230 |
19-Oct-2003 |
alc |
- Synchronize access to a vm page's valid field using the containing vm object's lock. - Reduce the scope of the vm page queues lock in two places.
|
121227 |
18-Oct-2003 |
alc |
- Synchronize access to the page's valid field in vnode_pager_generic_getpages() using the containing object's lock.
|
121226 |
18-Oct-2003 |
alc |
- Increase the object lock's scope in vm_contig_launder() so that access to the object's type field and the call to vm_pageout_flush() are synchronized. - The above change allows for the eliminaton of the last parameter to vm_pageout_flush(). - Synchronize access to the page's valid field in vm_pageout_flush() using the containing object's lock.
|
121221 |
18-Oct-2003 |
alc |
Corrections to revision 1.305 - Specifying VM_MAP_WIRE_HOLESOK should not assume that the start address is the beginning of the map. Instead, move to the first entry after the start address. - The implementation of VM_MAP_WIRE_HOLESOK was incomplete. This caused the failure of mlockall(2) in some circumstances.
|
121205 |
18-Oct-2003 |
phk |
DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)
|
121199 |
18-Oct-2003 |
phk |
Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY(). Remove stale comment about B_PHYS.
|
121150 |
17-Oct-2003 |
alc |
- Synchronize access to a vm page's valid field using the containing vm object's lock. - Release the vm object and vm page queues locks around vput().
|
121108 |
15-Oct-2003 |
alc |
- vm_fault_copy_entry() should not assume that the source object contains every page. If the source entry was read-only, one or more wired pages could be in backing objects. - vm_fault_copy_entry() should not set the PG_WRITEABLE flag on the page unless the destination entry is, in fact, writeable.
|
120905 |
08-Oct-2003 |
alc |
Lock the destination object in vm_fault_copy_entry().
|
120903 |
08-Oct-2003 |
alc |
Retire vm_page_copy(). Its reason for being ended when peter@ modified pmap_copy_page() et al. to accept a vm_page_t rather than a physical address. Also, this change will facilitate locking access to the vm page's valid field.
|
120837 |
06-Oct-2003 |
bms |
Only the super-user should be able to wire pages via the mlock() family of system calls at this time. Remove various #ifdef's to enforce this.
|
120831 |
06-Oct-2003 |
bms |
Move pmap_resident_count() from the MD pmap.h to the MI pmap.h. Add a definition of pmap_wired_count(). Add a definition of vmspace_wired_count().
Reviewed by: truckman Discussed with: peter
|
120824 |
05-Oct-2003 |
alc |
The addition of a locking assertion to vm_page_zero_invalid() has revealed a long-time bug: vm_pager_get_pages() assumes that m[reqpage] contains a valid page upon return from pgo_getpages(). In the case of the device pager this page has been freed and replaced by a fake page. The fake page is properly inserted into the vm object but m[reqpage] is left pointing to a freed page. For now, update m[reqpage] to point to the fake page.
Submitted by: tegge
|
120811 |
05-Oct-2003 |
bms |
Revert previous commit. Come back vslock(), all is forgiven.
Pointy hat to: bms
|
120806 |
05-Oct-2003 |
bms |
Retire vslock() and vsunlock() with extreme prejudice.
Discussed with: pete
|
120790 |
05-Oct-2003 |
alc |
Assert that the containing vm object's lock is held in vm_page_set_invalid().
|
120766 |
04-Oct-2003 |
alc |
Assert that the containing vm object's lock is held in vm_page_zero_invalid().
|
120764 |
04-Oct-2003 |
alc |
Synchronize access to a vm page's valid field using the containing vm object's lock.
|
120762 |
04-Oct-2003 |
alc |
- Extend the scope the vm object lock to cover calls to vm_page_is_valid(). - Assert that the lock on the containing vm object is held in vm_page_is_valid().
|
120761 |
04-Oct-2003 |
alc |
Synchronize access to a vm page's valid field using the containing vm object's lock.
|
120739 |
04-Oct-2003 |
jeff |
- Use the UMA_ZONE_VM flag on the fakepg and object zones to prevent vm recursion and LORs. This may be necessary for other zones created in the vm but this needs to be verified.
|
120722 |
03-Oct-2003 |
alc |
Migrate pmap_prefault() into the machine-independent virtual memory layer.
A small helper function pmap_is_prefaultable() is added. This function encapsulates the few lines of pmap_prefault() that actually vary from machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have much in common. Going forward, it's worth considering their merger.
|
120538 |
28-Sep-2003 |
alc |
In vm_page_remove(), assert that the vm object is locked, unless an Alpha. (The Alpha still requires updates to its pmap.)
|
120531 |
27-Sep-2003 |
marcel |
Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect.
Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. We need to include or cover the faulting address.
Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done.
Tested on: alpha, i386, ia64, sparc64
|
120526 |
27-Sep-2003 |
phk |
Provide a bit more help with "memory overwritten after free" style bugs.
|
120422 |
25-Sep-2003 |
peter |
Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
|
120389 |
23-Sep-2003 |
silby |
Adjust the kmapentzone limit so that it takes into account the size of maxproc and maxfiles, as procs, pipes, and other structures cause allocations from kmapentzone.
Submitted by: tegge
|
120371 |
23-Sep-2003 |
alc |
Change the handling of the kernel and kmem objects in vm_map_delete(): In order to use "unmanaged" pages in the kmem object, vm_map_delete() must unconditionally perform pmap_remove(). Otherwise, sparc64 has problems.
Tested by: jake
|
120326 |
22-Sep-2003 |
alc |
Initialize the page's pindex field even for VM_ALLOC_NOOBJ allocations. (This field is useful for implementing sanity checks even if the page does not belong to an object.)
|
120311 |
21-Sep-2003 |
jeff |
- Fix MD_SMALL_ALLOC on architectures that support it. Define a new alloc function, startup_alloc(), that is used for single page allocations prior to the VM starting up. If it is used after the VM starts up, it replaces the zone's allocf pointer with either page_alloc() or uma_small_alloc() where appropriate.
Pointy hat to: me Tested by: phk/amd64, me/x86
|
120305 |
20-Sep-2003 |
peter |
Bad Jeffr! No cookie!
Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits break sparc64, amd64, ia64 and alpha. It appears only i386 and maybe powerpc were not broken.
|
120262 |
19-Sep-2003 |
jeff |
- Remove the working-set algorithm. Instead, use the per cpu buckets as the working set cache. This has several advantages. Firstly, we never touch the per cpu queues now in the timeout handler. This removes one more reason for having per cpu locks. Secondly, it reduces the size of the zone by 8 bytes, bringing it under 200 bytes for a single proc x86 box. This tidies up other logic as well. - The 'destroy' flag no longer needs to be passed to zone_drain() since it always frees everything in the zone's slabs. - cache_drain() is now only called from zone_dtor() and so it destroys by default. It also does not need the destroy parameter now.
|
120255 |
19-Sep-2003 |
jeff |
- Remove the cache colorization code. We can't use it due to all of the broken consumers of the malloc interface who assume that the allocated address will be an even multiple of the size. - Remove disabled time delay code on uma_reclaim(). The comment there said it all. It was not an effective strategy and it should not be left in #if 0'd for all eternity.
|
120249 |
19-Sep-2003 |
jeff |
- There are an endless stream of style(9) errors in this file. Fix a few. Also catch some spelling errors.
|
120229 |
19-Sep-2003 |
jeff |
- Don't inspect the zone in page_alloc(). It may be NULL. - Don't cache more items than the zone would like in uma_zalloc_bucket().
|
120224 |
19-Sep-2003 |
jeff |
- Move the logic for dealing with the uma_boot_pages cache into the page_alloc() function from the slab_zalloc() function. This allows us to unconditionally call uz_allocf(). - In page_alloc() clean up the boot_pages logic some. Previously memory from this cache that was not used by the time the system started was left in the cache and never used. Typically this wasn't more than a few pages, but now we will use this cache so long as memory is available.
|
120223 |
19-Sep-2003 |
jeff |
- Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags by accepting the user supplied flags directly. Previously this was not done so that flags for the same field would not be defined in two different files. Add comments in each header instructing future developers on how not to shoot their feet. - Fix a test for !OFFPAGE which should have been a test for HASH. This would have caused a panic if we had ever destructed a malloc zone. This also opens up the possibility that other zones could use the vsetobj() method rather than a hash.
|
120221 |
19-Sep-2003 |
jeff |
- Don't abuse M_DEVBUF, define a tag for UMA hashes.
|
120219 |
19-Sep-2003 |
jeff |
- Eliminate a pair of unnecessary variables.
|
120218 |
19-Sep-2003 |
jeff |
- Initialize a pool of bucket zones so that we waste less space on zones that don't cache as many items. - Introduce the bucket_alloc(), bucket_free() functions to wrap bucket allocation. These functions select the appropriate bucket zone to allocate from or free to. - Rename ub_ptr to ub_cnt to reflect a change in its use. ub_cnt now reflects the count of free items in the bucket. This gets rid of many unnatural subtractions by 1 throughout the code. - Add ub_entries which reflects the number of entries possibly held in a bucket.
|
120217 |
19-Sep-2003 |
alc |
Merge vm_pageout_free_page_calc() into vm_pageout(), eliminating some unneeded code.
|
120183 |
18-Sep-2003 |
alc |
Add vm object locking to vnode_pager_lock(). (This triggers the movement of a VM_OBJECT_LOCK() in vm_fault().)
|
120152 |
17-Sep-2003 |
alc |
Remove GIANT_REQUIRED from vm_object_shadow().
|
120150 |
17-Sep-2003 |
alc |
When calling vget() on a vnode-backed vm object, acquire the vnode interlock before releasing the vm object's lock.
|
120086 |
15-Sep-2003 |
alc |
Eliminate the use of Giant from vm_object_reference().
|
120050 |
14-Sep-2003 |
alc |
Call vm_page_unmanage() on pages belonging to the kmem_object. This eliminates the unnecessary overhead of managing "PV" entries for these pages.
|
120035 |
13-Sep-2003 |
alc |
There is no need for an atomic increment on the vm object's generation count in _vm_object_allocate(). (Access to the generation count is governed by the vm object's lock.) Note: the introduction of the atomic increment in revision 1.238 appears to be an accident. The purpose of that commit was to fix an Alpha-specific bug in UMA's debugging code.
|
119999 |
12-Sep-2003 |
alc |
Add a new parameter to pmap_extract_and_hold() that is needed to eliminate Giant from vmapbuf().
Idea from: tegge
|
119869 |
08-Sep-2003 |
alc |
Introduce a new pmap function, pmap_extract_and_hold(). This function atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe.
Reviewed by: tegge
|
119858 |
07-Sep-2003 |
alc |
Revise the locking in mincore(2).
|
119663 |
02-Sep-2003 |
phk |
Don't open with exclusive bit, swapon(8) wants to trash our swapdev.
Add XXX comment with a rating of this concept.
|
119658 |
01-Sep-2003 |
eivind |
Change clean_map from a global to an auto variable
|
119596 |
31-Aug-2003 |
alc |
- Add vm object locking to the part of vm_pageout_scan() that launders dirty pages. - Remove some unused variables.
|
119595 |
30-Aug-2003 |
marcel |
Introduce MAP_ENTRY_GROWS_DOWN and MAP_ENTRY_GROWS_UP to allow for growable (stack) entries that not only grow down, but also grow up. Have vm_map_growstack() take these flags into account when growing an entry.
This is the first step in adding support for upward growable stacks. It is a required feature on ia64 to support the register stack (or rstack as I like to call it -- it also means reverse stack). We do not currently create rstacks, so the upward growing is not exercised and the change should be a functional no-op.
Reviewed by: alc
|
119591 |
30-Aug-2003 |
phk |
Add a close() method to a swapdev.
Add a GEOM based backend.
Remove the device/VOP_SPECSTRATEGY() based backend.
|
119590 |
30-Aug-2003 |
phk |
Protect the swapdevice tailq with a mutex.
Store the udev_t we will report to userland in the swdevt.
|
119575 |
30-Aug-2003 |
phk |
Continue the objectification of the swapdev backends:
Remove the vnode and dev_t fields and replace them with a void *.
Introduce separate strategy functions for devices and regular (NFS) vnodes.
For devices we don't need the vnode v_numoutput stuff.
Add a generic swaponsomething() function to add a swapdevice and split the remainder of swaponvp() into swaponvp() and swapondev() which calls this backend.
|
119574 |
30-Aug-2003 |
phk |
Make the strategy function a method of the individual swapdev.
|
119573 |
30-Aug-2003 |
phk |
Consistently use modern function definitions
|
119544 |
29-Aug-2003 |
marcel |
In vnode_pager_generic_putpages(), change the printf format specifier to long and explicitly cast field dirty of struct vm_page to unsigned long. When PAGE_SIZE is 32K, this field is actually unsigned long.
|
119543 |
28-Aug-2003 |
alc |
Recent pmap changes permit the use of a more precise locking assertion in vm_page_lookup().
|
119468 |
25-Aug-2003 |
marcel |
Assert that u_long is at least 64 bits if PAGE_SIZE is 32K.
Suggested by: phk
|
119373 |
23-Aug-2003 |
alc |
Held pages, just like wired pages, should not be added to the cache queues.
Submitted by: tegge
|
119370 |
23-Aug-2003 |
alc |
Hold the page queues lock when performing vm_page_clear_dirty() and vm_page_set_invalid().
|
119357 |
23-Aug-2003 |
alc |
To implement the sequential access optimization, vm_fault() may need to reacquire the "first" object's lock while a backing object's lock is held. Since this is a lock-order reversal, vm_fault() uses trylock to acquire the first object's lock, skipping the sequential access optimization in the unlikely event that the trylock fails.
|
119356 |
23-Aug-2003 |
marcel |
Also define VM_PAGE_BITS_ALL for 16K and 32K pages. Make the constant unsigned for all page sizes and unsigned long for 32K pages.
|
119354 |
23-Aug-2003 |
marcel |
Add support for 16K and 32K page sizes. The valid and dirty maps in struct vm_page are defined as u_int for 16K pages and u_long for 32K pages, with the implied assumption that long will at least be 64 bits wide on platforms where we support 32K pages.
|
119247 |
21-Aug-2003 |
alc |
Assert that the vm object's lock is held on entry to vm_page_grab(); remove code from this function that was needed when vm object locking was incomplete.
|
119186 |
20-Aug-2003 |
alc |
Assert that the vm object lock is held in vm_page_alloc().
|
119182 |
20-Aug-2003 |
bmilekic |
In sysctl_vm_zone, do not calculate per-cpu cache stats on UMA_ZFLAG_INTERNAL zones at all. Apparently, Wilko's alpha was crashing while entering multi-user because, I think, we were calculating the garbage cachefree for pcpu caches that essentially don't exist for at least the 'zones' zone and it so happened that we were reading from an unmapped location.
Confirmed to fix crash: wilko Helped debug: wilko, gallatin
|
119092 |
18-Aug-2003 |
phk |
Replace a homegrown bdone()/bwait() implementation by the real thing
|
119059 |
18-Aug-2003 |
alc |
Three unrelated changes to vm_proc_new(): (1) add vm object locking on the U pages object; (2) reorganize such that the U pages object is created and filled in one block; and (3) remove an unnecessary clearing of PG_ZERO.
|
119045 |
17-Aug-2003 |
phk |
Use NULL for 3rd argument of VOP_BMAP() rather than custom cast. Eliminate unused variable.
|
119004 |
16-Aug-2003 |
marcel |
In vm_thread_swap{in|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.
|
118946 |
15-Aug-2003 |
phk |
Eliminate unnecessary udev_t variable: we can derive it from the dev_t when we need it.
|
118945 |
15-Aug-2003 |
phk |
Make swaponvp() static to the swap_pager.
|
118931 |
15-Aug-2003 |
alc |
Extend the scope of the page queues lock in vm_pageout_scan() to cover the traversal of the PQ_INACTIVE queue.
|
118878 |
13-Aug-2003 |
alc |
Remove GIANT_REQUIRED from vmspace_alloc().
|
118852 |
13-Aug-2003 |
alc |
Reduce the size of the vm map (and by inclusion the vm space) on 64-bit architectures by moving a field within the structure.
|
118848 |
12-Aug-2003 |
imp |
Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files.
Approved by: Matt Dillon
|
118838 |
12-Aug-2003 |
alc |
Reduce the size of the vm object on 64-bit architectures by moving a field within the structure.
|
118795 |
11-Aug-2003 |
bmilekic |
- When deciding whether to init the zone with small_init or large_init, compare the zone element size (+1 for the byte of linkage) against UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE. Add a KASSERT in zone_small_init to make sure that the computed ipers (items per slab) for the zone is not zero, despite the addition of the check, just to be sure (this part submitted by: silby)
- UMA_ZONE_VM used to imply BUCKETCACHE. Now it implies CACHEONLY instead. CACHEONLY is like BUCKETCACHE in the case of bucket allocations, but in addition to that also ensures that we don't setup the zone with OFFPAGE slab headers allocated from the slabzone. This means that we're not allowed to have a UMA_ZONE_VM zone initialized for large items (zone_large_init) because it would require the slab headers to be allocated from slabzone, and hence kmem_map. Some of the zones init'd with UMA_ZONE_VM are so init'd before kmem_map is suballoc'd from kernel_map, which is why this change is necessary.
|
118771 |
11-Aug-2003 |
bms |
Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture *are* necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quashed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks
|
118764 |
11-Aug-2003 |
silby |
More pipe changes:
From alc: Move pageable pipe memory to a separate kernel submap to avoid awkward vm map interlocking issues. (Bad explanation provided by me.)
From me: Rework pipespace accounting code to handle this new layout, and adjust our default values to account for the fact that we now have a solid limit on allocations.
Also, remove the "maxpipes" limit, as it no longer has a purpose. (The limit on kva usage solves the problem of having too many pipes.)
|
118544 |
06-Aug-2003 |
phk |
Make the first two pages magic to protect the BSD labels rather than only one.
|
118537 |
06-Aug-2003 |
phk |
Remove an unused variable.
|
118536 |
06-Aug-2003 |
phk |
Staticize swap_pager_putpages()
Eliminate a lot of checks to make sure requests are not cross-device, which is unnecessary with the new layout. We know a sequential request cannot possibly be cross-device because there is a reserved page between the devices.
Remove a couple of comments which no longer are relevant.
|
118535 |
06-Aug-2003 |
phk |
Access the swap_pagers' ->putpages() through swappagerops instead of directly, this is a cleaner way to do it.
|
118528 |
06-Aug-2003 |
phk |
Add XXX: comment to vm_pager_unswapped().
|
118527 |
06-Aug-2003 |
phk |
Explicitly set B_PAGING
|
118521 |
06-Aug-2003 |
phk |
Rip out the totally bogus vnode swapdev_vp with extreme prejudice.
Don't mark buffers with B_KEEPGIANT, we don't drop giant in strategy at this point in time.
|
118468 |
05-Aug-2003 |
phk |
Use sparse struct initialization for struct pagerops.
Mark our buffers B_KEEPGIANT before sending them downstream.
Remove swap_pager_strategy implementation.
|
118466 |
05-Aug-2003 |
phk |
Use sparse struct initializations for struct pagerops.
This makes grepping for which pagers implement which methods easier.
|
118418 |
04-Aug-2003 |
phk |
Put an uncovered page between the swap devices; that way we can be sure not to get any cross-device I/O requests. (The unallocated first page protecting BSD labels already gave us this, but that hack may go away at some point in time).
Remove the check for cross-device I/O requests in swap_pager_strategy.
Move the repeated statistics updating into flushchainbuf().
|
118413 |
04-Aug-2003 |
alc |
Use kmem_alloc_nofault() instead of kmem_alloc_pageable() to allocate swapbkva. Swapbkva mappings are explicitly managed using pmap_qenter(), not on-demand by vm_fault(), making kmem_alloc_nofault() more appropriate.
Submitted by: tegge
|
118398 |
03-Aug-2003 |
phk |
Name swap_pager_find_dev() more correctly swp_pager_find_dev().
Use ->bio_children to count child buffers, rather than abuse the bio_caller1 pointer.
Expand the relevant bits of waitchainbuf() inline, this clarifies the code a little bit.
|
118392 |
03-Aug-2003 |
phk |
I accidentally hit undo before committing, fix the resulting off-by-one.
|
118390 |
03-Aug-2003 |
phk |
Change the layout policy of the swap_pager from a hardcoded width striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap allocation on page-out, this means that a newly added swap device almost instantly takes its 1/N share of the I/O load but it takes somewhat longer for it to assume its 1/N share of the pages if there is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead by using a per device blist rather than one global blist. This reduces the memory footprint by 75% (typically a couple hundred kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices, there is no limit now. Instead of a fixed size array, store the per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to give some amount of protection to any bsdlabel at the start of the device.
A new device is appended after the existing devices in the swap space, no attempt is made to fill in holes left behind by swapoff (this can trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped() to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space, but since the swapspace is no longer fragmented by the striping different resource managers might fare better.
|
118384 |
03-Aug-2003 |
phk |
Move extern declaration of the various pagerops from vm_pager.c to vm_pager.h where the various pagers will also see them.
|
118380 |
03-Aug-2003 |
alc |
Revise obj_alloc(). Most notably, use the object's lock to prevent two concurrent invocations from acquiring the same address(es). Also, in case of an incomplete allocation, free any allocated pages.
In collaboration with: tegge
|
118369 |
02-Aug-2003 |
bmilekic |
When INVARIANTS is on and we're in uma_zfree_arg(), we need to make sure that uma_dbg_free() is called if we're about to call uma_zfree_internal() but we're asking it to skip the dtor and uma_dbg_free() call itself. So, if we're about to call uma_zfree_internal() from uma_zfree_arg() and skip == 1, call uma_dbg_free() ourselves.
|
118317 |
01-Aug-2003 |
alc |
Update the comment at the head of kmem_alloc_nofault() to describe its purpose and use.
|
118315 |
01-Aug-2003 |
bmilekic |
Only free the pcpu cache buckets if they are non-NULL.
Crashed this person's machine: harti Pointy-hat to: me
|
118286 |
31-Jul-2003 |
phk |
Remove unused stuff.
Move used stuff to swap_pager.c where it belongs.
This file no longer exports anything to userland.
|
118234 |
31-Jul-2003 |
peter |
Add #include "opt_kstack_pages.h" and "opt_kstack_max_pages.h" to remain in sync with the backend machdep code. When cpu_thread_init() does not have the same idea of KSTACK_PAGES as the thing that created the kstack, all hell breaks loose.
Bad alc! no cookie! :-)
|
118221 |
30-Jul-2003 |
bmilekic |
Plug a race and a leak in UMA.
1) The race has to do with zone destruction. From the zone destructor we would lock the zone, set the working set size to 0, then unlock the zone, drain it, and then free the structure. Within the window following the working-set-size set to 0 and unlocking of the zone and the point where in zone_drain we re-acquire the zone lock, the uma timer routine could have fired off and changed the working set size to something non-zero, thereby potentially preventing us from completely freeing slabs before destroying the zone (and thus leaking them).
2) The leak has to do with zone destruction as well. When destroying a zone we would take care to free all the buckets cached in the zone, but although we would drain the pcpu cache buckets, we would not free them. This resulted in leaking a couple of bucket structures (512 bytes each) per cpu on SMP during zone destruction.
While I'm here, also silence GCC warnings by turning uma_slab_alloc() from inline to real function. It's too big to be an inline.
Reviewed by: JeffR
|
118212 |
30-Jul-2003 |
bmilekic |
When generating the zone stats make sure to handle the master zone ("UMA Zone") carefully, because it does not have pcpu caches allocated at all. In the UP case, we did not catch this because one pcpu cache is always allocated with the zone, but for the MP case, we were getting bogus stats for this zone.
Tested by: Lukas Ertl <le@univie.ac.at>
|
118201 |
30-Jul-2003 |
phk |
Remove the disabling of buckets workaround.
Thanks to: jeffr
|
118190 |
30-Jul-2003 |
jeff |
- Get rid of the ill-conceived uz_cachefree member of uma_zone. - In sysctl_vm_zone use the per cpu locks to read the current cache statistics; this makes them more accurate while under heavy load.
Submitted by: tegge
|
118189 |
30-Jul-2003 |
jeff |
- Check to see if we need a slab prior to allocating one. Failure to do so not only wastes memory but it can also cause a leak in zones that will be destroyed later. The problem is that the slab allocation code places newly created slabs on the partially allocated list because it assumes that the caller will actually allocate some memory from it. Failure to do so places an otherwise free slab on the partial slab list where we won't find it later in zone_drain().
Continuously prodded to fix by: phk (Thanks)
|
118187 |
29-Jul-2003 |
phk |
Temporary workaround: Always disable buckets, there is a bug there somewhere.
JeffR will look at this as soon as he has time.
OK'ed by: jeffr
|
118104 |
28-Jul-2003 |
alc |
None of the "alloc" functions used by UMA assume that Giant is held any longer. (If they still need it, e.g., contigmalloc(), they acquire it themselves.) Therefore, we need not acquire Giant in slab_zalloc().
|
118096 |
27-Jul-2003 |
alc |
Remove GIANT_REQUIRED from kmem_alloc().
|
118076 |
27-Jul-2003 |
mux |
Use pmap_zero_page() to zero pages instead of bzero() because they haven't been vm_map_wire()'d yet.
|
118074 |
27-Jul-2003 |
alc |
Allow vm_object_reference() on kernel_object without Giant.
|
118071 |
26-Jul-2003 |
alc |
Acquire Giant rather than asserting it is held in contigmalloc(). This is a prerequisite to removing further uses of Giant from UMA.
|
118047 |
26-Jul-2003 |
phk |
Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful, in particular, to fdescfs and /dev/fd/*
For now pass -1 all over the place.
|
118040 |
26-Jul-2003 |
alc |
Gulp ... call kmem_malloc() without Giant.
|
118029 |
25-Jul-2003 |
mux |
Add support for the M_ZERO flag to contigmalloc().
Reviewed by: jeff
|
117903 |
22-Jul-2003 |
phk |
Remove all but one of the inlines here, this reduces the code size by 2032 bytes and has no measurable impact on performance.
|
117876 |
22-Jul-2003 |
phk |
Don't inline very large functions.
Gcc has silently not been doing this for a long time.
|
117866 |
22-Jul-2003 |
peter |
swp_pager_hash() was called before it was instantiated inline. This made gcc (quite rightly) unhappy. Move it earlier.
|
117747 |
18-Jul-2003 |
phk |
Fix a printf format warning I introduced. Use the macro max number of swap devices rather than cache the constant in a variable. Avoid a (now) pointless variable.
|
117736 |
18-Jul-2003 |
harti |
When INVARIANTS is defined make sure that uma_zalloc_arg (and hence uma_zalloc) is called with exactly one of either M_WAITOK or M_NOWAIT and that it is called with neither M_TRYWAIT nor M_DONTWAIT. Print a warning if anything is wrong. Default to M_WAITOK if no flag is given. This is the same test as in malloc(9).
|
117725 |
18-Jul-2003 |
phk |
If a proposed swap device exceeds the 8G artificial limit which our radix-tree code imposes, truncate the device instead of rejecting it.
|
117724 |
18-Jul-2003 |
phk |
Move the implementation of the vmspace_swap_count() (used only in the "toss the largest process" emergency handling) from vm_map.c to swap_pager.c.
The quantity calculated depends strongly on the internals of the swap_pager and by moving it, we no longer need to expose the internal metrics of the swap_pager to the world.
|
117723 |
18-Jul-2003 |
phk |
Add a new function swap_pager_status() which reports the total size of the paging space and how much of it is in use (in pages).
Use this interface from the Linuxolator instead of groping around in the internals of the swap_pager.
|
117722 |
18-Jul-2003 |
phk |
Merge swap_pager.c and vm_swap.c into swap_pager.c, the separation is not natural and needlessly exposes a lot of dirty laundry.
Move private interfaces between the two from swap_pager.h to swap_pager.c and staticize as much as possible.
No functional change.
|
117702 |
17-Jul-2003 |
phk |
Make sure that SWP_NPAGES always has the same value in all source files, so that SWAP_META_PAGES does not vary either.
swap_pager.c ended up with a value of 16, everybody else 8. Go with the 16 for now.
This should only have any effect in the "kill processes because we are out of swap" scenario, where it will make some sort of estimate of something more precise.
|
117519 |
13-Jul-2003 |
robert |
Avoid an unnecessary calculation: there is no need to subtract `firstaddr' from `v' if we know that the former equals zero.
|
117303 |
07-Jul-2003 |
alc |
- Complete the vm object locking in vm_pageout_object_deactivate_pages(). - Change vm_pageout_object_deactivate_pages()'s first parameter from a vm_map_t to a pmap_t. - Change vm_pageout_object_deactivate_pages()'s and vm_pageout_map_deactivate_pages()'s last parameter from a vm_pindex_t to a long. Since the number of pages in an address space doesn't require 64 bits on an i386, vm_pindex_t is overkill.
|
117262 |
05-Jul-2003 |
alc |
Lock a vm object when freeing a page from it.
|
117224 |
04-Jul-2003 |
phk |
Remove unnecessary cast.
|
117206 |
03-Jul-2003 |
alc |
Background: pmap_object_init_pt() premaps the pages of a object in order to avoid the overhead of later page faults. In general, it implements two cases: one for vnode-backed objects and one for device-backed objects. Only the device-backed case is really machine-dependent, belonging in the pmap.
This commit moves the vnode-backed case into the (relatively) new function vm_map_pmap_enter(). On amd64 and i386, this commit only amounts to code rearrangement. On alpha and ia64, the new machine independent (MI) implementation of the vnode case is smaller and more efficient than their pmap-based implementations. (The MI implementation takes advantage of the fact that objects in -CURRENT are ordered collections of pages.) On sparc64, pmap_object_init_pt() hadn't (yet) been implemented.
|
117143 |
02-Jul-2003 |
mux |
Fix a few style(9) nits.
|
117094 |
01-Jul-2003 |
alc |
Modify vm_page_alloc() and vm_page_select_cache() to allow the page that is returned by vm_page_select_cache() to belong to the object that is already locked by the caller to vm_page_alloc().
|
117093 |
01-Jul-2003 |
alc |
Check the address provided to vm_map_stack() against the vm map's maximum, returning an error if the address is too high.
|
117047 |
29-Jun-2003 |
alc |
Introduce vm_map_pmap_enter(). Presently, this is a stub calling the MD pmap_object_init_pt().
|
117045 |
29-Jun-2003 |
alc |
- Export pmap_enter_quick() to the MI VM. This will permit the implementation of a largely MI pmap_object_init_pt() for vnode-backed objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64 and powerpc. - Correct a mismatch between pmap_object_init_pt()'s prototype and its various implementations. (I plan to keep pmap_object_init_pt() as the MD hook for device-backed objects on i386 and amd64.) - Correct an error in ia64's pmap_enter_quick() and adjust its interface to match the other versions. Discussed with: marcel
|
117038 |
29-Jun-2003 |
alc |
Add vm object locking to vm_pageout_map_deactivate_pages().
|
117004 |
28-Jun-2003 |
alc |
Remove GIANT_REQUIRED from kmem_malloc().
|
117001 |
28-Jun-2003 |
alc |
- Add vm object locking to vm_pageout_clean().
|
116959 |
28-Jun-2003 |
alc |
- Use an int rather than a vm_pindex_t to represent the desired page color in vm_page_alloc(). (This also has small performance benefits.) - Eliminate vm_page_select_free(); vm_page_alloc() might as well call vm_pageq_find() directly.
|
116923 |
27-Jun-2003 |
alc |
Simple read-modify-write operations on a vm object's flags, ref_count, and shadow_count can now rely on its mutex for synchronization. Remove one use of Giant from vm_map_insert().
|
116885 |
26-Jun-2003 |
alc |
vm_page_select_cache() enforces a number of conditions on the returned page. Add the ability to lock the containing object to those conditions.
|
116860 |
26-Jun-2003 |
alc |
Modify vm_pageq_requeue() to handle a PQ_NONE page without dereferencing a NULL pointer; remove some now unused code.
|
116837 |
25-Jun-2003 |
bmilekic |
Move the pcpu lock out of the uma_cache and instead have a single set of pcpu locks. This makes uma_zone somewhat smaller (by (LOCKNAME_LEN * sizeof(char) + sizeof(struct mtx) * maxcpu) bytes, to be exact).
No Objections from jeff.
|
116829 |
25-Jun-2003 |
bmilekic |
Make sure that the zone destructor doesn't get called twice in certain free paths.
|
116799 |
25-Jun-2003 |
alc |
Remove a GIANT_REQUIRED on the kernel object that we no longer need.
|
116798 |
25-Jun-2003 |
alc |
Maintain the lock on a vm object when calling vm_page_grab().
|
116793 |
24-Jun-2003 |
alc |
Assert that the vm object is locked on entry to dev_pager_getpages().
|
116710 |
23-Jun-2003 |
alc |
Assert that the vm object is locked on entry to vm_pager_get_pages().
|
116695 |
22-Jun-2003 |
alc |
Maintain a lock on the vm object of interest throughout vm_fault(), releasing the lock only if we are about to sleep (e.g., vm_pager_get_pages() or vm_pager_has_pages()). If we sleep, we have marked the vm object with the paging-in-progress flag.
|
116678 |
22-Jun-2003 |
phk |
Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for stuff like the f*() functions.
By giving the vnode a separate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.
|
116667 |
22-Jun-2003 |
alc |
As vm_fault() descends the chain of backing objects, set paging-in- progress on the next object before clearing it on the current object.
|
116662 |
22-Jun-2003 |
alc |
Complete the vm object locking in vm_object_backing_scan(); specifically, deal with the case where we need to sleep on a busy page with two vm object locks held.
|
116658 |
22-Jun-2003 |
alc |
Make some style and white-space changes to the copy-on-write path through vm_fault(); remove a pointless assignment statement from that path.
|
116653 |
21-Jun-2003 |
phk |
Use a do {...} while (0); and a couple of breaks to reduce the level of indentation a bit.
|
116650 |
21-Jun-2003 |
alc |
Lock one of the vm objects involved in an optimized copy-on-write fault.
|
116645 |
21-Jun-2003 |
alc |
- Increase the scope of the vm object lock in vm_object_collapse(). - Assert that the vm object and its backing vm object are both locked in vm_object_qcollapse().
|
116629 |
20-Jun-2003 |
alc |
Make swap_pager_haspages() static; remove unused function prototypes.
|
116605 |
20-Jun-2003 |
phk |
Initialize b_saveaddr when we hand out pbufs
|
116596 |
20-Jun-2003 |
alc |
The so-called "optimized copy-on-write fault" case should not require the vm map lock. What's really needed is vm object locking, which is (for the moment) provided by Giant.
Reviewed by: tegge
|
116554 |
19-Jun-2003 |
alc |
Assert that the vm object is locked in vm_page_try_to_free().
|
116552 |
19-Jun-2003 |
alc |
Fix a vm object reference leak in the page-based copy-on-write mechanism used by the zero-copy sockets implementation.
Reviewed by: gallatin
|
116512 |
18-Jun-2003 |
alc |
Lock the vm object when freeing a vm page.
|
116437 |
16-Jun-2003 |
phk |
This file was ignored by CVS in my last commit for some reason:
Remove pointless initialization of b_spc field, which now no longer exists.
|
116412 |
15-Jun-2003 |
phk |
Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.
|
116387 |
15-Jun-2003 |
alc |
Remove an unnecessary forward declaration.
|
116359 |
15-Jun-2003 |
alc |
Use #ifdef __alpha__, not __alpha.
|
116355 |
14-Jun-2003 |
alc |
Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard pages, set KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x (but not 4.x), PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().
|
116328 |
14-Jun-2003 |
alc |
Move the *_new_altkstack() and *_dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
|
116280 |
13-Jun-2003 |
alc |
Extend the scope of the vm object lock in swp_pager_async_iodone() to cover a vm_page_free().
|
116279 |
13-Jun-2003 |
alc |
Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.
|
116226 |
11-Jun-2003 |
obrien |
Use __FBSDID().
|
116188 |
11-Jun-2003 |
peter |
GC unused cpu_wait() function
|
116167 |
10-Jun-2003 |
alc |
- Finish vm object and page locking in vnode_pager_setsize(). - Make some small style changes to vnode_pager_setsize(); most notably, move two comments to a more logical place.
|
116131 |
09-Jun-2003 |
phk |
Revert last commit, I have no idea what happened.
|
116117 |
09-Jun-2003 |
phk |
A white-space nit I noticed.
|
116080 |
09-Jun-2003 |
alc |
Hold the vm object's lock when performing vm_page_lookup().
|
116079 |
09-Jun-2003 |
alc |
Don't use vm_object_set_flag() to initialize the vm object's flags.
|
116067 |
08-Jun-2003 |
alc |
- Properly handle the paging_in_progress case on two vm objects in vm_object_deallocate(). - Remove vm_object_pip_sleep().
|
115997 |
07-Jun-2003 |
alc |
Lock the kernel object in kmem_alloc().
|
115996 |
07-Jun-2003 |
alc |
Teach vm_page_grab() how to handle the vm object's lock.
|
115987 |
07-Jun-2003 |
alc |
Assert that the vm object is locked on entry to swap_pager_freespace().
|
115931 |
07-Jun-2003 |
alc |
Pass the vm object to vm_object_collapse() with its lock held.
|
115883 |
05-Jun-2003 |
phk |
Fix NFS file swapping, I broke it 3 months ago it seems.
|
115879 |
05-Jun-2003 |
alc |
- Extend the scope of the backing object's lock in vm_object_collapse().
|
115856 |
04-Jun-2003 |
alc |
- Add further vm object locking to vm_object_deallocate(), specifically, for accessing a vm object's shadows.
|
115853 |
04-Jun-2003 |
alc |
- Add VM_OBJECT_TRYLOCK().
|
115818 |
04-Jun-2003 |
alc |
- Add vm object locking to vm_object_deallocate(). (Still more changes are required.) - Remove special-case macros for kmem object locking. They are no longer used.
|
115782 |
03-Jun-2003 |
alc |
Add vm object locking to vm_object_coalesce().
|
115655 |
01-Jun-2003 |
alc |
Change kernel_object and kmem_object to (&kernel_object_store) and (&kmem_object_store), respectively. This allows the address of these objects to be resolved at link-time rather than run-time.
|
115523 |
31-May-2003 |
phk |
Prepend _ to internal union members to avoid ambiguity.
Found by: FlexeLint
|
115522 |
31-May-2003 |
phk |
Remove unused variables
Found by: FlexeLint
|
115516 |
31-May-2003 |
alc |
Add vm object locking to vm_object_madvise().
|
115146 |
19-May-2003 |
das |
If we seem to be out of VM, don't allow the pagedaemon to kill processes in the first pass. Among other things, this will give us a chance to launder vnode-backed pages before concluding that we need more swap. This is particularly useful for systems that have no swap.
While here, update a comment and remove some long-unused code.
Reported by: Lucky Green <shamrock@cypherpunks.to> Suggested by: dillon Approved by: re (rwatson)
|
115127 |
18-May-2003 |
alc |
Reduce the size of a vm object by converting its shadow list from a TAILQ to a LIST.
Approved by: re (rwatson)
|
114983 |
13-May-2003 |
jhb |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.
Reviewed by: arch@ Approved by: re (rwatson)
|
114850 |
09-May-2003 |
alc |
Give the kmem object's mutex a unique name, instead of "vm object", to avoid false reports of lock-order reversal with a system map mutex.
Approved by: re (jhb)
|
114774 |
06-May-2003 |
alc |
Lock the vm_object when performing vm_pager_deallocate().
|
114669 |
04-May-2003 |
alc |
Extend the scope of the vm_object lock in vm_object_terminate().
|
114649 |
04-May-2003 |
alc |
Avoid a lock-order reversal and implement vm_object locking in vm_pageout_page_free().
|
114599 |
03-May-2003 |
alc |
Lock the vm_object on entry to vm_object_vndeallocate().
|
114570 |
03-May-2003 |
alc |
- Revert kern/vfs_subr.c revision 1.444. The vm_object's size isn't trustworthy for vnode-backed objects. - Restore the old behavior of vm_object_page_remove() when the end of the given range is zero. Add a comment to vm_object_page_remove() regarding this behavior.
Reported by: iedowse
|
114564 |
03-May-2003 |
alc |
Move a declaration to its proper place.
|
114489 |
02-May-2003 |
alc |
Lock the vm_object when updating its shadow list.
|
114487 |
02-May-2003 |
alc |
Simplify the removal of a shadow object in vm_object_collapse().
|
114387 |
01-May-2003 |
alc |
Extend the scope of the vm_object locking in vm_object_split().
|
114372 |
01-May-2003 |
alc |
- Update the vm_object locking in vm_object_reference(). - Convert some dead code in vm_object_reference() into a comment.
|
114317 |
30-Apr-2003 |
alc |
Increase the scope of the vm_object lock in vm_map_delete().
|
114273 |
30-Apr-2003 |
alc |
Eliminate an unused parameter from vm_pageout_object_deactivate_pages().
|
114263 |
30-Apr-2003 |
alc |
Add vm_object locking to vmspace_swap_count().
|
114245 |
29-Apr-2003 |
alc |
Remove unused declarations and definitions.
|
114216 |
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
114166 |
28-Apr-2003 |
alc |
- Lock the vm_object when performing swap_pager_isswapped(). - Assert that the vm_object is locked in swap_pager_isswapped().
|
114149 |
28-Apr-2003 |
alc |
uma_zone_set_obj() must perform VM_OBJECT_LOCK_INIT() if the caller provides storage for the vm_object.
|
114145 |
28-Apr-2003 |
alc |
- Define VM_OBJECT_LOCK_INIT(). - Avoid repeatedly mtx_init()ing and mtx_destroy()ing the vm_object's lock using UMA's uminit callback, in this case, vm_object_zinit().
|
114128 |
27-Apr-2003 |
alc |
- Tell witness that holding two or more vm_object locks is okay. - In vm_object_deallocate(), lock the child when removing the parent from the child's shadow list.
|
114112 |
27-Apr-2003 |
alc |
Various changes to vm_object_shadow(): (1) update the vm_object locking, (2) remove a pointless assertion, and (3) make a trivial change to a comment.
|
114091 |
26-Apr-2003 |
alc |
Various changes to vm_object_page_remove(): - Eliminate an odd, special-case feature: if start == end == 0 then all pages are removed. Only one caller used this feature and that caller can trivially pass the object's size. - Assert that the vm_object is locked on entry; don't bother testing for a NULL vm_object. - Style: Fix lines that are longer than 80 characters.
|
114080 |
26-Apr-2003 |
alc |
- Lock the vm_object on entry to vm_object_terminate().
|
114074 |
26-Apr-2003 |
alc |
- Convert vm_object_pip_wait() from using tsleep() to msleep(). - Make vm_object_pip_sleep() static. - Lock the vm_object when performing vm_object_pip_wait().
|
114053 |
26-Apr-2003 |
alc |
- Extend the scope of two existing vm_object locks to cover swap_pager_freespace().
|
114052 |
26-Apr-2003 |
alc |
Remove an XXX comment. It is no longer a problem.
|
114030 |
25-Apr-2003 |
jhb |
- Don't bother using the proc lock to test just P_SYSTEM as that is set in fork1() and never changes. - The proc lock is enough to cover reading p_state, so push down sched_lock into the PRS_NORMAL case of the switch on p_state.
|
114019 |
25-Apr-2003 |
alc |
- Lock the vm_object when iterating over its list of resident pages.
|
114003 |
25-Apr-2003 |
alc |
- Relax the Giant required in vm_page_remove(). - Remove the Giant required from vm_page_free_toq(). (Any locking errors will be caught by vm_page_remove().)
This remedies a panic that occurred when kmem_malloc(NOWAIT) performed without Giant failed to allocate the necessary pages.
Reported by: phk
|
113956 |
24-Apr-2003 |
alc |
- Move swap_pager_isswapped()'s prototype to a more logical place.
|
113955 |
24-Apr-2003 |
alc |
- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)
|
113918 |
23-Apr-2003 |
jhb |
Fix compiling in the NO_SWAPPING case.
Submitted by: bde (partially)
|
113869 |
22-Apr-2003 |
jhb |
Lock the proc to check p_flag and several other related tests in vm_daemon(). We don't need to hold sched_lock as long now as a result.
|
113868 |
22-Apr-2003 |
jhb |
Prefer the proc lock to sched_lock when testing PS_INMEM now that it is safe to do so.
|
113867 |
22-Apr-2003 |
jhb |
- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a race where a thread could assume that a process was swapped in by PHOLD() when it actually wasn't fully swapped in yet. - In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing this check after bumping p_lock in the PS_INMEM == 0 case. Also, sched_lock is only needed for setting and clearing swapping PS_* flags and the swap thread inhibitor. - Don't set and clear the thread swap inhibitor in the same loops as the pmap_swapin/out_thread() since we have to do it under sched_lock. Instead, mimic the treatment of the PS_INMEM flag and use separate loops to set the inhibitors when clearing PS_INMEM and clear the inhibitors when setting PS_INMEM. - swapout() now returns with the proc lock held as it holds the lock while adjusting the swapping-related PS_* flags so that the proc lock can be used to test those flags. - Only use the proc lock to check the swapping-related PS_* flags in several places. - faultin() no longer requires sched_lock to be held by callers. - Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we have PS_SWAPPINGIN.
|
113856 |
22-Apr-2003 |
alc |
Revision 1.246 should have also included
- Weaken the assertion in vm_page_insert() to require Giant only if the vm_object isn't locked.
Reported by: "Ilmar S. Habibulin" <ilmar@watson.org>
|
113842 |
22-Apr-2003 |
alc |
Remove unused declarations.
|
113841 |
22-Apr-2003 |
alc |
Revision 1.52 of vm/uma_core.c has led to UMA's obj_alloc() being called without Giant; and obj_alloc() in turn calls vm_page_alloc() without Giant. This causes an assertion failure in vm_page_alloc(). Fortunately, obj_alloc() is now MPSAFE. So, we need only clean up some assertions.
- Weaken the assertion in vm_page_lookup() to require Giant only if the vm_object isn't locked. - Remove an assertion from vm_page_alloc() that duplicates a check performed in vm_page_lookup().
In collaboration with: gallatin, jake, jeff
|
113838 |
22-Apr-2003 |
alc |
Add VM_OBJECT_LOCKED().
|
113791 |
21-Apr-2003 |
alc |
- Assert that the vm_object is locked in vm_object_clear_flag(), vm_object_pip_add() and vm_object_pip_wakeup(). - Remove GIANT_REQUIRED from vm_object_pip_subtract(). - Lock the vm_object when performing vm_object_page_remove().
|
113775 |
20-Apr-2003 |
alc |
- Lock the vm_object when performing either vm_object_clear_flag() or vm_object_pip_wakeup().
|
113768 |
20-Apr-2003 |
alc |
- Update the vm_object locking in vm_map_insert().
|
113765 |
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_wakeup(). - Merge two identical cases in a switch statement.
|
113761 |
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_wakeup().
|
113744 |
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_add(). - Remove an unnecessary variable.
|
113740 |
20-Apr-2003 |
alc |
Update vm_object locking in vm_map_delete().
|
113739 |
20-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_add().
|
113722 |
19-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_subtract(). - Assert that the vm_object lock is held in vm_object_pip_subtract().
|
113721 |
19-Apr-2003 |
alc |
- Lock the vm_object when performing vm_object_pip_wakeupn(). - Assert that the vm_object lock is held in vm_object_pip_wakeupn(). - Add a new macro VM_OBJECT_LOCK_ASSERT().
|
113701 |
19-Apr-2003 |
alc |
o Update locking around vm_object_page_remove() in vm_map_clean() to use the new macros. o Remove unnecessary increment and decrement of the vm_object's reference count in vm_map_clean().
|
113699 |
19-Apr-2003 |
alc |
Lock the vm_object in obj_alloc().
|
113671 |
18-Apr-2003 |
alc |
Update locking around vm_object_page_remove() to use the new macros.
|
113665 |
18-Apr-2003 |
gallatin |
Don't grab Giant in slab_zalloc() if M_NOWAIT is specified. This should allow the use of INTR_MPSAFE network drivers.
Tested by: njl Glanced at by: jeff
|
113639 |
17-Apr-2003 |
jhb |
suser() does not need the proc lock, just the setting of P_PROTECTED in p_flag needs the lock.
|
113603 |
17-Apr-2003 |
trhodes |
Add some tunable descriptions.
Submitted by: hmp Discussed with: bde
|
113600 |
17-Apr-2003 |
trhodes |
Pre-content whitespace commit.
Discussed with: bde
|
113489 |
15-Apr-2003 |
alc |
Update locking on the kmem_object to use the new macros.
|
113458 |
14-Apr-2003 |
alc |
Update locking on the kernel_object to use the new macros.
|
113457 |
13-Apr-2003 |
alc |
Lock some manipulations of the vm object's flags.
|
113449 |
13-Apr-2003 |
alc |
Lock some manipulations of the vm object's flags.
|
113448 |
13-Apr-2003 |
alc |
Lock some manipulations of the vm object's flags.
|
113445 |
13-Apr-2003 |
alc |
Add new macros for locking and unlocking a vm object.
|
113419 |
13-Apr-2003 |
alc |
Permit vm_object_pip_add() and vm_object_pip_wakeup() on the kmem_object without Giant held.
|
113418 |
13-Apr-2003 |
alc |
Eliminate unnecessary gotos from kmem_malloc().
|
113343 |
10-Apr-2003 |
jhb |
- Kill the pv_flags member of the alpha mdpage since it stopped being used in rev 1.61 of pmap.c. - Now that pmap_page_is_free() is empty and since it is just a hack for the Alpha pmap, remove it.
|
113138 |
05-Apr-2003 |
alc |
Remove GIANT_REQUIRED from getpbuf(). Reviewed by: tegge
Reduce pbuf_mtx's scope in relpbuf(). Submitted by: tegge
|
113070 |
04-Apr-2003 |
des |
Rename a static variable to avoid future conflicts.
|
112881 |
31-Mar-2003 |
wes |
Add a facility allowing processes to inform the VM subsystem they are critical and should not be killed when pageout is looking for more memory pages in all the wrong places.
Reviewed by: arch@ Sponsored by: St. Bernard Software
|
112835 |
30-Mar-2003 |
mux |
The object type can't be OBJT_PHYS in vm_mmap().
Reviewed by: peter
|
112683 |
26-Mar-2003 |
tegge |
Obtain Giant before calling kmem_alloc without M_NOWAIT and before calling kmem_free if Giant isn't already held.
|
112569 |
25-Mar-2003 |
jake |
- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses are larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)
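A userland illustration of why the distinct type and the printf fixes go together; the typedefs and widths below model i386/PAE and are assumptions, not the actual kernel definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * On i386 with PAE, virtual addresses stay 32-bit but physical addresses
 * can exceed 4G, so the two need distinct types.
 */
typedef uint32_t vm_offset_t;	/* virtual address  */
typedef uint64_t vm_paddr_t;	/* physical address */

/*
 * Format a physical address through a type wide enough for >4G values;
 * printing it with a 32-bit format would silently truncate it.
 */
static int
format_paddr(vm_paddr_t pa, char *buf, size_t len)
{
	return (snprintf(buf, len, "0x%jx", (uintmax_t)pa));
}
```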
|
112390 |
19-Mar-2003 |
mux |
Remove an empty comment.
|
112367 |
18-Mar-2003 |
phk |
Including <sys/stdint.h> is (almost?) universally done only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.
|
112329 |
17-Mar-2003 |
jake |
Subtract the memory that backs the vm_page structures from phys_avail after mapping it. This makes it possible to determine if a physical page has a backing vm_page or not.
|
112312 |
16-Mar-2003 |
jake |
Made the prototypes for pmap_kenter and pmap_kremove MD. These functions are machine dependent because they are not required to update the tlb when mappings are added or removed, and doing so is machine dependent. In addition, an implementation may require that pages mapped with pmap_kenter have a backing vm_page_t, which is not necessarily true of all physical pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the physical address in order to make this requirement clear.
|
112167 |
12-Mar-2003 |
das |
- When the VM daemon is out of swap space and looking for a process to kill, don't block on a map lock while holding the process lock. Instead, skip processes whose map locks are held and find something else to kill. - Add vm_map_trylock_read() to support the above.
Reviewed by: alc, mike (mentor)
|
111977 |
08-Mar-2003 |
ken |
Zero copy send and receive fixes:
- On receive, vm_map_lookup() needs to trigger the creation of a shadow object. To make that happen, call vm_map_lookup() with PROT_WRITE instead of PROT_READ in vm_pgmoveco().
- On send, a shadow object will be created by the vm_map_lookup() in vm_fault(), but vm_page_cowfault() will delete the original page from the backing object rather than simply letting the legacy COW mechanism take over. In other words, the new page should be added to the shadow object rather than replacing the old page in the backing object. (i.e. vm_page_cowfault() should not be called in this case.) We accomplish this by making sure fs.object == fs.first_object before calling vm_page_cowfault() in vm_fault().
Submitted by: gallatin, alc Tested by: ken
|
111937 |
06-Mar-2003 |
alc |
Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on: arch@
|
111936 |
05-Mar-2003 |
rwatson |
Provide a mac_check_system_swapoff() entry point, which permits MAC modules to authorize disabling of swap against a particular vnode.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
111883 |
04-Mar-2003 |
jhb |
Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
|
111732 |
02-Mar-2003 |
phk |
NO_GEOM cleanup:
Use VOP_IOCTL(DIOCGMEDIASIZE) to check the size of a potential swap device instead of the cdevsw->d_psize() method.
|
111712 |
01-Mar-2003 |
alc |
Teach vm_page_sleep_if_busy() to release the vm_object lock before sleeping.
|
111467 |
25-Feb-2003 |
alc |
Fuse two #ifdefs with identical conditions.
|
111463 |
25-Feb-2003 |
jeff |
- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
|
111462 |
25-Feb-2003 |
mux |
Cleanup of the d_mmap_t interface.
- Get rid of the useless atop() / pmap_phys_address() detour. The device mmap handlers must now give back the physical address without atop()'ing it. - Don't borrow the physical address of the mapping in the returned int. Now we properly pass a vm_offset_t * and expect it to be filled by the mmap handler when the mapping was successful. The mmap handler must now return 0 when successful, any other value is considered as an error. Previously, returning -1 was the only way to fail. This change thus accidentally fixes some devices which were bogusly returning errno constants which would have been considered as addresses by the device pager. - Garbage collect the poorly named pmap_phys_address() now that it's no longer used. - Convert all the d_mmap_t consumers to the new API.
I'm still not sure whether we need a __FreeBSD_version bump for this, since we didn't guarantee API/ABI stability until 5.1-RELEASE.
Discussed with: alc, phk, jake Reviewed by: peter Compile-tested on: LINT (i386), GENERIC (alpha and sparc64) Runtime-tested on: i386
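A sketch of the new calling convention described above, in userland C. The device name, base address, and error value are hypothetical; the typedefs are simplified stand-ins:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t vm_offset_t;

#define DEV_SIZE 0x1000		/* hypothetical device aperture size */
#define DEV_BASE 0xf0000000UL	/* hypothetical physical base address */

/*
 * New-style handler: fill *paddr with the physical address (no atop())
 * and return 0 on success; any nonzero return is an error.  This replaces
 * the old style of returning the address itself, with -1 as the only
 * possible failure.
 */
static int
dummy_mmap(vm_offset_t offset, vm_offset_t *paddr, int nprot)
{
	(void)nprot;
	if (offset >= DEV_SIZE)
		return (22);		/* e.g. EINVAL: past the device */
	*paddr = DEV_BASE + offset;
	return (0);
}
```

Because status and address travel separately, an errno-style return can no longer be mistaken for a physical address by the device pager.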
|
111434 |
24-Feb-2003 |
alc |
In vm_page_dirty(), assert that the page is not in the free queue(s).
|
111119 |
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
110983 |
16-Feb-2003 |
alc |
Remove GIANT_REQUIRED from vm_pageq_remove().
|
110958 |
15-Feb-2003 |
alc |
Remove the acquisition and release of Giant around pmap_growkernel(). It's unnecessary for two reasons: (1) Giant is at present already held in such cases and (2) our various implementations of pmap_growkernel() look to be MP safe. (For example, for sparc64 the proof of (2) is trivial.)
|
110957 |
15-Feb-2003 |
alc |
Move kernel_vm_end's declaration to pmap.h; add a comment regarding the synchronization of access to kernel_vm_end.
|
110597 |
09-Feb-2003 |
alc |
Add a comment describing how pagedaemon_wakeup() should be used and synchronized.
Suggested by: tegge
|
110313 |
04-Feb-2003 |
phk |
Change a printf to also tell how many items were left in the zone.
|
110225 |
02-Feb-2003 |
alc |
- It's more accurate to say that vm_paging_needed() returns TRUE than a positive number. - In pagedaemon_wakeup(), set vm_pages_needed to 1 rather than incrementing it to accomplish the same.
|
110218 |
02-Feb-2003 |
alc |
- Convert vm_pageout()'s tsleep()s to msleep()s with the page queue lock.
|
110207 |
01-Feb-2003 |
alc |
- Remove (some) unnecessary explicit initializations to zero. - Style changes to vm_pageout(): declarations and white-space.
|
110204 |
01-Feb-2003 |
alc |
- Convert the tsleep()s in vm_wait() and vm_waitpfault() to msleep()s with the page queue lock. - Assert that the page queue lock is held in vm_page_free_wakeup().
|
109912 |
27-Jan-2003 |
alc |
Simplify vm_object_page_remove(): The object's memq is now ordered. The two cases that existed before for performance optimization purposes can be reduced to one.
|
109820 |
25-Jan-2003 |
alc |
Add MTX_DUPOK to the initialization of system map locks.
|
109630 |
21-Jan-2003 |
alfred |
use 'void *' instead of 'caddr_t' for useracc, kernacc, vslock and vsunlock.
|
109623 |
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
109587 |
20-Jan-2003 |
dillon |
Fix swapping to a file, it was broken when SPECSTRATEGY was introduced.
|
109572 |
20-Jan-2003 |
dillon |
Close the remaining user address mapping races for physical I/O, CAM, and AIO. Still TODO: streamline useracc() checks.
Reviewed by: alc, tegge MFC after: 7 days
|
109554 |
20-Jan-2003 |
alc |
- Hold the page queues lock around vm_page_hold(). - Assert that the page queues lock rather than Giant is held in vm_page_hold().
|
109548 |
20-Jan-2003 |
jeff |
- M_WAITOK is 0 and not a real flag. Test for this properly.
Submitted by: tmm Pointy hat to: jeff
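The pitfall being fixed can be shown with hypothetical flag values mirroring the situation (M_WAITOK defined as 0):

```c
#include <assert.h>

/* Hypothetical values: M_WAITOK is 0, so a bitwise AND can't detect it. */
#define M_WAITOK 0x0000
#define M_NOWAIT 0x0001

/* Broken test: (flags & M_WAITOK) is always 0, never true. */
static int
may_sleep_broken(int flags)
{
	return ((flags & M_WAITOK) != 0);
}

/* Correct test: a caller may sleep exactly when M_NOWAIT is absent. */
static int
may_sleep(int flags)
{
	return ((flags & M_NOWAIT) == 0);
}
```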
|
109496 |
18-Jan-2003 |
obrien |
Rev 1.16 renamed VM_METER to VM_TOTAL. This is breaking 3rd-party apps. So add a VM_METER compat define.
Submitted by: Andy Fawcett <andy@athame.co.uk>
|
109342 |
16-Jan-2003 |
dillon |
Merge all the various copies of vm_fault_quick() into a single portable copy.
|
109223 |
14-Jan-2003 |
alc |
- Update vm_pageout_deficit using atomic operations. It's a simple counter outside the scope of existing locks. - Eliminate a redundant clearing of vm_pageout_deficit.
|
109216 |
14-Jan-2003 |
alc |
Make vm_pageout_page_free() static.
|
109205 |
13-Jan-2003 |
dillon |
It is possible for an active aio to prevent shared memory from being dereferenced when a process exits due to the vmspace ref-count being bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead of a process and call it in vmspace_dofree(). This way if it is missed in exit1()'s early-resource-free it will still be caught when the zombie is reaped.
Also fix a potential race in shmexit_myhook() by NULLing out vmspace->vm_shm prior to calling shm_delete_mapping() and free().
MFC after: 7 days
|
109198 |
13-Jan-2003 |
phk |
We can get past here on a normal vnode as well, so use VOP_STRATEGY if so.
|
109153 |
13-Jan-2003 |
dillon |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
109151 |
12-Jan-2003 |
alc |
Make vm_page_alloc() return PG_ZERO only if VM_ALLOC_ZERO is specified. The objective being to eliminate some cases of page queues locking. (See, for example, vm/vm_fault.c revision 1.160.)
Reviewed by: tegge
(Also, pointed out by tegge that I changed vm_fault.c before changing vm_page.c. Oops.)
|
109131 |
12-Jan-2003 |
alc |
vm_fault_copy_entry() needn't clear PG_ZERO because it didn't pass VM_ALLOC_ZERO to vm_page_alloc().
|
109123 |
12-Jan-2003 |
dillon |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
109114 |
11-Jan-2003 |
alc |
In vm_page_alloc(), fuse two if statements that are conditioned on the same expression.
|
109097 |
11-Jan-2003 |
dillon |
Make 'sysctl vm.vmtotal' work properly using updated patch from Hiten. (the patch in the PR was stale).
PR: kern/5689 Submitted by: Hiten Pandya <hiten@unixdaemons.com>
|
108963 |
08-Jan-2003 |
alc |
In vm_page_alloc(), honor VM_ALLOC_ZERO for system and interrupt class requests when the number of free pages is below the reserved threshold. Previously, VM_ALLOC_ZERO was only honored when the number of free pages was above the reserved threshold. Honoring it in all cases generally makes sense, does no harm, and simplifies the code.
|
108723 |
05-Jan-2003 |
phk |
Convert VOP_STRATEGY to VOP_SPECSTRATEGY in the generic getpages and the pager input for small filesystems.
|
108693 |
05-Jan-2003 |
alc |
Use atomic add and subtract to update the global wired page count, cnt.v_wire_count.
|
108686 |
04-Jan-2003 |
phk |
Temporarily introduce a new VOP_SPECSTRATEGY operation while I try to sort out disk-io from file-io in the vm/buffer/filesystem space.
The intent is to sort VOP_STRATEGY calls into those which operate on "real" vnodes and those which operate on VCHR vnodes. For the latter kind, the call will be changed to VOP_SPECSTRATEGY, possibly conditionally for those places where dual-use happens.
Add a default VOP_SPECSTRATEGY method which will call the normal VOP_STRATEGY. First time it is called it will print debugging information. This will only happen if a normal vnode is passed to VOP_SPECSTRATEGY by mistake.
Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY does on a VCHR vnode today.
Add a new VOP_STRATEGY method in specfs to catch instances where the conversion to VOP_SPECSTRATEGY has not yet happened. Handle the request just like we always did, but first time called print debugging information.
Apart from up to two instances of console messages per boot, this amounts to a glorified no-op commit.
If you get any of the messages on your console I would very much like a copy of them mailed to phk@freebsd.org
|
108677 |
04-Jan-2003 |
alc |
Allow kmem_malloc() without Giant if M_NOWAIT is specified.
|
108676 |
04-Jan-2003 |
alc |
Use vm_object_lock() and vm_object_unlock() in vm_object_deallocate(). (This procedure needs further work, but this change is sufficient for locking the kmem_object.)
|
108675 |
04-Jan-2003 |
alc |
Refine the assertions in vm_page_alloc().
|
108610 |
03-Jan-2003 |
alc |
Refine the assertion in vm_object_clear_flag() to allow operation on the kmem_object without Giant. In that case, assert that the kmem_object's mutex is held.
|
108609 |
03-Jan-2003 |
phk |
Revert use of dmmax_mask, I had overlooked a '~'.
Spotted by: bde
|
108602 |
03-Jan-2003 |
phk |
Make struct swblock kernel only, to make vm/swap_pager.h userland includable. Move struct swdevt from sys/conf.h to the more appropriate vm/swap_pager.h. Adjust #include use in libkvm and pstat(8) to match.
|
108600 |
03-Jan-2003 |
phk |
Avoid extern decls in .c files by putting them in the vm/swap_pager.h include file where they belong. Share the dmmax_mask variable.
|
108599 |
03-Jan-2003 |
phk |
Use correct _VM_SWAP_PAGER_H_ to check for multiple inclusion.
|
108595 |
03-Jan-2003 |
phk |
Retire sys/dmap.h by including the two lines of it which matters directly in vm/vm_swap.c.
|
108594 |
03-Jan-2003 |
alc |
Lock the vm object when performing vm_object_clear_flag().
|
108589 |
03-Jan-2003 |
phk |
Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
|
108585 |
03-Jan-2003 |
alc |
Add vm map and vm object locking to vmtotal().
|
108551 |
02-Jan-2003 |
alc |
Lock the vm object when performing vm_object_clear_flag().
|
108534 |
01-Jan-2003 |
alc |
Update the assertions in vm_page_insert() and vm_page_lookup() to reflect locking of the kmem_object.
|
108533 |
01-Jan-2003 |
schweikh |
Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.
|
108518 |
01-Jan-2003 |
alc |
Add a needed #include.
Reported by: ia64 tinderbox
|
108515 |
31-Dec-2002 |
alc |
Implement a variant locking scheme for vm maps: Access to system maps is now synchronized by a mutex, whereas access to user maps is still synchronized by a lockmgr()-based lock. Why? No single type of lock, including sx locks, meets the requirements of both types of vm map. Sometimes we sleep while holding the lock on a user map. Thus, a mutex isn't appropriate. On the other hand, both lockmgr()-based and sx locks release Giant when a thread/process blocks during contention for a lock. This could lead to a race condition in a legacy driver (that relies on Giant for synchronization) if it attempts to kmem_malloc() and fails to immediately obtain the lock. Fortunately, we never sleep while holding a system map lock.
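A toy userland model of the dispatch this commit introduces; the struct members and lock-kind enum are stand-ins that merely record which primitive would have been used:

```c
#include <assert.h>

enum lock_kind { LK_NONE, LK_MUTEX, LK_SLEEPABLE };

/* Hypothetical miniature of a vm map with the variant-lock flag. */
struct vm_map {
	int		system_map;	/* set for kernel/kmem maps */
	enum lock_kind	locked_with;
};

/*
 * System maps take the (non-sleepable) mutex; user maps take the
 * sleepable lockmgr-style lock.  The assignments stand in for the
 * real mtx_lock()/lockmgr() calls.
 */
static void
vm_map_lock(struct vm_map *map)
{
	if (map->system_map)
		map->locked_with = LK_MUTEX;
	else
		map->locked_with = LK_SLEEPABLE;
}
```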
|
108426 |
30-Dec-2002 |
alc |
- Mark the kernel_map as a system map immediately after its creation. - Correct a cast.
|
108418 |
30-Dec-2002 |
alc |
- Increment the vm_map's timestamp if _vm_map_trylock() succeeds. - Introduce map_sleep_mtx and use it to replace Giant in vm_map_unlock_and_wait() and vm_map_wakeup(). (Original version by: tegge.)
|
108413 |
29-Dec-2002 |
alc |
- Remove vm_object_init2(). It is unused. - Add a mtx_destroy() to vm_object_collapse(). (This allows a bzero() to migrate from _vm_object_allocate() to vm_object_zinit(), where it will be performed less often.)
|
108384 |
29-Dec-2002 |
alc |
Reduce the number of times that we acquire and release the page queues lock by making vm_page_rename()'s caller, rather than vm_page_rename(), responsible for acquiring it.
|
108370 |
28-Dec-2002 |
alc |
Assert that the page queues lock rather than Giant is held in vm_page_flag_clear().
|
108361 |
28-Dec-2002 |
dillon |
vm_pager_put_pages() takes VM_PAGER_* flags, not OBJPC_* flags. It just so happens that OBJPC_SYNC has the same value as VM_PAGER_PUT_SYNC so no harm done. But fix it :-)
No operational changes.
MFC after: 1 day
|
108358 |
28-Dec-2002 |
dillon |
Allow the VM object flushing code to cluster. When the filesystem syncer comes along and flushes a file which has been mmap()'d SHARED/RW, with dirty pages, it was flushing the underlying VM object asynchronously, resulting in thousands of 8K writes. With this change the VM Object flushing code will cluster dirty pages in 64K blocks.
Note that until the low memory deadlock issue is reviewed, it is not safe to allow the pageout daemon to use this feature. Forced pageouts still use fs block size'd ops for the moment.
MFC after: 3 days
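The clustering this commit describes can be illustrated with a toy model: walk a dirty-page bitmap and emit runs of contiguous dirty pages, capping each run at 64K so one I/O covers up to eight 8K pages instead of one. The bitmap representation and the function name are illustrative, not the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE   8192
#define CLUSTER_MAX (65536 / PAGE_SIZE)   /* 64K cluster = 8 x 8K pages */

/* Scan a dirty-page bitmap and record runs of contiguous dirty pages,
 * each capped at CLUSTER_MAX.  Returns the number of runs found;
 * starts[]/lens[] receive each run's first page and length. */
static int
cluster_dirty(const bool *dirty, int npages, int starts[], int lens[])
{
    int nruns = 0;

    for (int i = 0; i < npages; ) {
        if (!dirty[i]) {
            i++;
            continue;
        }
        starts[nruns] = i;
        int len = 0;
        while (i < npages && dirty[i] && len < CLUSTER_MAX) {
            i++;
            len++;
        }
        lens[nruns++] = len;
    }
    return nruns;
}
```

With this scheme a 10-page dirty run becomes two writes (8 pages + 2 pages) rather than ten single-page writes.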
|
108351 |
28-Dec-2002 |
alc |
Two changes to kmem_malloc(): - Use VM_ALLOC_WIRED. - Perform vm_page_wakeup() after pmap_enter(), like we do everywhere else.
|
108334 |
27-Dec-2002 |
alc |
- Change vm_object_page_collect_flush() to assert rather than acquire the page queues lock. - Acquire the page queues lock in vm_object_page_clean().
|
108306 |
27-Dec-2002 |
alc |
Increase the scope of the page queues lock in phys_pager_getpages().
|
108262 |
24-Dec-2002 |
alc |
- Hold the page queues lock around calls to vm_page_flag_clear().
|
108251 |
24-Dec-2002 |
alc |
- Hold the page queues lock around vm_page_wakeup().
|
108233 |
23-Dec-2002 |
alc |
- Hold the kernel_object's lock around vm_page_insert(..., kernel_object, ...).
|
108197 |
23-Dec-2002 |
alc |
Eliminate some dead code. (Any possible use for this code died with vm/vm_page.c revision 1.220.)
Submitted by: bde
|
108171 |
22-Dec-2002 |
dillon |
The UP -current was not properly counting the per-cpu VM stats in the sysctl code. This makes 'systat -vm 1's syscall count work again.
Submitted by: Michal Mertl <mime@traveller.cz> Note: also slated for 5.0
|
108138 |
20-Dec-2002 |
alc |
Increase the scope of the kmem_object locking in kmem_malloc().
|
108117 |
20-Dec-2002 |
alc |
Add a mutex to struct vm_object. Initialize and destroy that mutex at appropriate times. For the moment, the mutex is only used on the kmem_object.
|
108101 |
19-Dec-2002 |
alc |
Remove the hash_rand field from struct vm_object. As of revision 1.215 of vm/vm_page.c, it is unused.
|
108081 |
19-Dec-2002 |
alc |
- Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(), which incorporates page queue and field locking, is complete. - Assert that the page queue lock rather than Giant is held in vm_page_flag_set().
|
108068 |
19-Dec-2002 |
alc |
- Hold the page queues lock when performing vm_page_busy() or vm_page_flag_set(). - Replace vm_page_sleep_busy() with proper page queues locking and vm_page_sleep_if_busy().
|
108012 |
18-Dec-2002 |
alc |
- Hold the page queues lock when performing vm_page_busy(). - Replace vm_page_sleep_busy() with proper page queues locking and vm_page_sleep_if_busy().
|
108011 |
18-Dec-2002 |
alc |
Hold the page queues lock when performing vm_page_flag_set().
|
107989 |
17-Dec-2002 |
alc |
Hold the page queues lock when performing vm_page_flag_set().
|
107948 |
16-Dec-2002 |
dillon |
Change the way ELF coredumps are handled. Instead of unconditionally skipping read-only pages, which can result in valuable non-text-related data not getting dumped, the ELF loader and the dynamic loader now mark read-only text pages NOCORE and the coredump code only checks (primarily) for complete inaccessibility of the page or NOCORE being set.
Certain applications which map large amounts of read-only data will produce much larger cores. A new sysctl has been added, debug.elf_legacy_coredump, which will revert to the old behavior.
This commit represents collaborative work by all parties involved. The PR contains a program demonstrating the problem.
PR: kern/45994 Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org> Reviewed by: jdp, dillon MFC after: 7 days
|
107918 |
15-Dec-2002 |
alc |
Perform vm_object_lock() and vm_object_unlock() on kmem_object around vm_page_lookup() and vm_page_free().
|
107913 |
15-Dec-2002 |
dillon |
This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment.
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks
|
107912 |
15-Dec-2002 |
dillon |
Fix a refcount race with the vmspace structure. In order to prevent resource starvation we clean-up as much of the vmspace structure as we can when the last process using it exits. The rest of the structure is cleaned up when it is reaped. But since exit1() decrements the ref count it is possible for a double-free to occur if someone else, such as the process swapout code, references and then dereferences the structure. Additionally, the final cleanup of the structure should not occur until the last process referencing it is reaped.
This commit solves the problem by introducing a secondary reference count, called 'vm_exitingcnt'. The normal reference count is decremented on exit and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the structure is freed for real.
MFC after: 3 weeks
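The two-counter lifecycle described in this entry can be sketched as follows. This is a toy model: the real struct vmspace carries the map, pmap, and statistics, everything runs under locking, and 'freed' merely stands in for the actual free.

```c
#include <assert.h>

struct vmspace {
    int vm_refcnt;      /* live references */
    int vm_exitingcnt;  /* exited-but-not-yet-reaped processes */
    int freed;          /* stand-in for the final free */
};

static void
vmspace_maybe_free(struct vmspace *vm)
{
    /* The structure is freed for real only when both counts are 0. */
    if (vm->vm_refcnt == 0 && vm->vm_exitingcnt == 0)
        vm->freed = 1;
}

/* exit1(): drop the normal reference, gain an exiting reference. */
static void
vmspace_exit(struct vmspace *vm)
{
    vm->vm_exitingcnt++;
    if (--vm->vm_refcnt == 0) {
        /* early cleanup of most of the vmspace would happen here */
    }
    vmspace_maybe_free(vm);
}

/* wait()/reaping: drop the exiting reference. */
static void
vmspace_reap(struct vmspace *vm)
{
    vm->vm_exitingcnt--;
    vmspace_maybe_free(vm);
}
```

Because the final free waits for the last reap, a racing reference taken by, say, the swapout code can no longer trigger a double-free.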
|
107893 |
15-Dec-2002 |
alc |
As per the comments, vm_object_page_remove() now expects its caller to lock the object (i.e., acquire Giant).
|
107892 |
15-Dec-2002 |
alc |
Perform vm_object_lock() and vm_object_unlock() around vm_object_page_remove().
|
107891 |
15-Dec-2002 |
alc |
Perform vm_object_lock() and vm_object_unlock() around vm_object_page_remove().
|
107887 |
15-Dec-2002 |
alc |
Assert that the page queues lock is held in vm_page_unhold(), vm_page_remove(), and vm_page_free_toq().
|
107464 |
01-Dec-2002 |
alc |
Hold the page queues lock when calling pmap_protect(); it updates fields of the vm_page structure. Make the style of the pmap_protect() calls consistent.
Approved by: re (blanket)
|
107436 |
01-Dec-2002 |
alc |
Hold the page queues lock when calling pmap_protect(); it updates fields of the vm_page structure. Nearby, remove an unnecessary semicolon and return statement.
Approved by: re (blanket)
|
107433 |
01-Dec-2002 |
alc |
Increase the scope of the page queue lock in vm_pageout_scan().
Approved by: re (blanket)
|
107370 |
28-Nov-2002 |
alc |
Lock page field accesses in mincore().
Approved by: re (blanket)
|
107347 |
27-Nov-2002 |
alc |
Hold the page queues lock when performing pmap_clear_modify().
Approved by: re (blanket)
|
107304 |
27-Nov-2002 |
alc |
Hold the page queues lock while performing pmap_page_protect().
Approved by: re (blanket)
|
107250 |
25-Nov-2002 |
alc |
Acquire and release the page queues lock around calls to pmap_protect() because it updates flags within the vm page.
Approved by: re (blanket)
|
107200 |
24-Nov-2002 |
alc |
Extend the scope of the page queues/fields locking in vm_freeze_copyopts() to cover pmap_remove_all().
Approved by: re
|
107189 |
23-Nov-2002 |
alc |
Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by: re
|
107185 |
23-Nov-2002 |
alc |
Assert that the page queues lock rather than Giant is held in vm_pageout_page_free().
Approved by: re
|
107182 |
23-Nov-2002 |
alc |
Add page queue and flag locking in vnode_pager_setsize().
Approved by: re
|
107136 |
21-Nov-2002 |
jeff |
- Add an event that is triggered when the system is low on memory. This is intended to be used by significant memory consumers so that they may drain some of their caches.
Inspired by: phk Approved by: re Tested on: x86, alpha
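The low-memory event above can be sketched as a simple callback registry. The names (vm_lowmem_register, vm_lowmem_fire) and the fixed array are hypothetical; the kernel uses its generic event-handler facility rather than anything this crude.

```c
#include <assert.h>

#define MAXHANDLERS 8

typedef void (*vm_lowmem_fn)(void *arg);

static struct {
    vm_lowmem_fn fn;
    void *arg;
} lowmem_handlers[MAXHANDLERS];
static int nhandlers;

/* A significant memory consumer registers a drain callback. */
static void
vm_lowmem_register(vm_lowmem_fn fn, void *arg)
{
    if (nhandlers < MAXHANDLERS) {
        lowmem_handlers[nhandlers].fn = fn;
        lowmem_handlers[nhandlers].arg = arg;
        nhandlers++;
    }
}

/* Invoked from the pageout path when free pages run low. */
static void
vm_lowmem_fire(void)
{
    for (int i = 0; i < nhandlers; i++)
        lowmem_handlers[i].fn(lowmem_handlers[i].arg);
}

/* Example consumer: a cache that empties itself under pressure. */
static void
drain_cache(void *arg)
{
    *(int *)arg = 0;
}
```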
|
107048 |
18-Nov-2002 |
jeff |
- Wakeup the correct address when a zone is no longer full.
Spotted by: jake
|
107039 |
18-Nov-2002 |
alc |
Remove vm_page_protect(). Instead, use pmap_page_protect() directly.
|
106992 |
16-Nov-2002 |
jeff |
- Don't forget the flags value when using boot pages.
Reported by: grehan
|
106981 |
16-Nov-2002 |
alc |
Now that pmap_remove_all() is exported by our pmap implementations use it directly.
|
106871 |
13-Nov-2002 |
alc |
Remove dead code that hasn't been needed since the demise of share maps in various revisions of vm/vm_map.c between 1.148 and 1.153.
|
106838 |
13-Nov-2002 |
alc |
Move pmap_collect() out of the machine-dependent code, rename it to reflect its new location, and add page queue and flag locking.
Notes: (1) alpha, i386, and ia64 had identical implementations of pmap_collect() in terms of machine-independent interfaces; (2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
|
106778 |
11-Nov-2002 |
cognet |
Remove extra #include<sys/vmmeter.h>.
|
106773 |
11-Nov-2002 |
mjacob |
atomic_set_8 isn't MI. Instead, follow Jake's suggestions about ZONE_LOCK.
|
106753 |
11-Nov-2002 |
alc |
- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit() if we're removing write access from the page's PTEs. - Export pmap_remove_all() on alpha, i386, and ia64. (It's already exported on sparc64.)
|
106733 |
10-Nov-2002 |
mjacob |
Use atomic_set_8 on the us_freelist maps as they are not otherwise protected. Furthermore, in some RISC architectures with no normal byte operations, the surrounding 3 bytes are also affected by the read-modify-write that has to occur.
|
106720 |
10-Nov-2002 |
alc |
When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations.
Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().
|
106708 |
09-Nov-2002 |
alc |
Fix an error case in vm_map_wire(): unwiring of an entry during cleanup after a user wire error fails when the entry is already system wired.
Reported by: tegge
|
106691 |
09-Nov-2002 |
alc |
In vm_page_remove(), avoid calling vm_page_splay() if the object's memq is empty.
|
106605 |
07-Nov-2002 |
tmm |
Move the definitions of the hw.physmem, hw.usermem and hw.availpages sysctls to MI code; this reduces code duplication and makes all of them available on sparc64, and the latter two on powerpc. The semantics of the i386 and pc98 hw.availpages are slightly changed: previously, holes between ranges of available pages would be included, while they are excluded now. The new behaviour should be more correct and brings i386 in line with the other architectures.
Move physmem to vm/vm_init.c, where this variable is used in MI code.
|
106603 |
07-Nov-2002 |
mux |
Better printf() formats.
|
106602 |
07-Nov-2002 |
mux |
Some more printf() format fixes.
|
106600 |
07-Nov-2002 |
mux |
Correctly print vm_offset_t types.
|
106422 |
04-Nov-2002 |
alc |
Export the function vm_page_splay().
|
106387 |
03-Nov-2002 |
alc |
- Remove the memory allocation for the object/offset hash table because it's no longer used. (See revision 1.215.) - Fix a harmless bug: the number of vm_page structures allocated wasn't properly adjusted when uma_bootstrap() was introduced. Consequently, we were allocating 30 unused vm_page structures. - Wrap a long line.
|
106359 |
02-Nov-2002 |
alc |
Remove the vm page buckets mutex. As of revision 1.215 of vm/vm_page.c, it is unused.
|
106277 |
01-Nov-2002 |
jeff |
- Add support for machine-dependent page allocation routines. MD code may define UMA_MD_SMALL_ALLOC to make use of this feature.
Reviewed by: peter, jake
|
106276 |
01-Nov-2002 |
jeff |
- Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells vm_page_alloc not to insert this page into an object. The pindex is still used for colorization. - Rework vm_page_select_* to accept a color instead of an object and pindex to work with VM_PAGE_NOOBJ. - Document other VM_ALLOC_ flags.
Reviewed by: peter, jake
|
106023 |
27-Oct-2002 |
rwatson |
Merge from MAC tree: rename mac_check_vnode_swapon() to mac_check_system_swapon(), to reflect the fact that the primary object of this change is the running kernel as a whole, rather than just the vnode. We'll drop additional checks of this class into the same check namespace, including reboot(), sysctl(), et al.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
105853 |
24-Oct-2002 |
jeff |
- Now that uma_zalloc_internal is not the fast path don't be so fussy about extra function calls. Refactor uma_zalloc_internal into separate functions for finding the most appropriate slab, filling buckets, allocating single items, and pulling items off of slabs. This makes the code significantly cleaner. - This also fixes the "Returning an empty bucket." panic that a few people have seen.
Tested On: alpha, x86
|
105848 |
24-Oct-2002 |
jeff |
- Move the destructor calls so that they are not called with the zone lock held. This avoids a lock order reversal when destroying zones. Unfortunately, this also means that the free checks are not done before the destructor is called.
Reported by: phk
|
105718 |
22-Oct-2002 |
rwatson |
Invoke mac_check_vnode_mmap() during mmap operations on vnodes, permitting policies to restrict access to memory mapping based on the credential requesting the mapping, the target vnode, the requested rights, or other policy considerations.
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
105717 |
22-Oct-2002 |
rwatson |
Introduce MAC_CHECK_VNODE_SWAPON, which permits MAC policies to perform authorization checks during swapon() events; policies might choose to enforce protections based on the credential requesting the swap configuration, the target of the swap operation, or other factors such as internal policy state.
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
105695 |
22-Oct-2002 |
jhb |
- Check that a process isn't a new process (p_state == PRS_NEW) before trying to acquire its proc lock since the proc lock may not have been constructed yet. - Split up the one big comment at the top of the loop and put the pieces in the right order above the various checks.
Reported by: kris (1)
|
105689 |
22-Oct-2002 |
sheldonh |
Fix typo in comments (misspelled "necessary").
|
105549 |
20-Oct-2002 |
alc |
o Reinline vm_page_undirty(), reducing the kernel size. (This reverts a part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)
|
105466 |
19-Oct-2002 |
alc |
Complete the page queues locking needed for the page-based copy- on-write (COW) mechanism. (This mechanism is used by the zero-copy TCP/IP implementation.) - Extend the scope of the page queues lock in vm_fault() to cover vm_page_cowfault(). - Modify vm_page_cowfault() to release the page queues lock if it sleeps.
|
105407 |
18-Oct-2002 |
dillon |
Replace the vm_page hash table with a per-vmobject splay tree. There should be no major change in performance from this change at this time but this will allow other work to progress: Giant lock removal around VM system in favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for NFS, partial filesystem syncs by the syncer, more optimal object flushing, etc. Note that the buffer cache is already using a similar splay tree mechanism.
Note that a good chunk of the old hash table code is still in the tree. Alan or I will remove it prior to the release if the new code does not introduce unsolvable bugs, else we can revert more easily.
Submitted by: alc (this is Alan's code) Approved by: re
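The per-object splay tree can be sketched with the classic Sleator/Tarjan top-down splay on integer keys. This is a toy stand-in: 'key' plays the role of a vm_page's pindex, whereas the kernel's version operates on struct vm_page nodes linked through the object.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int key;                    /* stands in for a page's pindex */
    struct node *left, *right;
} node;

/* Top-down splay: move the node with 'key' (or a neighbor, on a
 * miss) to the root and return the new root. */
static node *
splay(node *t, int key)
{
    node N, *l, *r, *y;

    if (t == NULL)
        return NULL;
    N.left = N.right = NULL;
    l = r = &N;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL)
                break;
            if (key < t->left->key) {       /* rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL)
                    break;
            }
            r->left = t;                    /* link right */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL)
                break;
            if (key > t->right->key) {      /* rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL)
                    break;
            }
            l->right = t;                   /* link left */
            l = t;
            t = t->right;
        } else
            break;
    }
    l->right = t->left;                     /* reassemble */
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;
}

/* Insert 'n' (key already set) and return the new root. */
static node *
splay_insert(node *root, node *n)
{
    if (root == NULL) {
        n->left = n->right = NULL;
        return n;
    }
    root = splay(root, n->key);
    if (n->key < root->key) {
        n->left = root->left;
        n->right = root;
        root->left = NULL;
    } else {
        n->right = root->right;
        n->left = root;
        root->right = NULL;
    }
    return n;
}
```

The self-adjusting property is what makes this attractive for page lookups: repeated access to nearby pindexes keeps them near the root, with no separate hash table to size or lock.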
|
105229 |
16-Oct-2002 |
phk |
Properly put macro args in ().
Spotted by: FlexeLint.
|
105126 |
14-Oct-2002 |
julian |
Remove old useless debugging code
|
104964 |
12-Oct-2002 |
jeff |
- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c
Reviewed by: -arch
|
104387 |
02-Oct-2002 |
jhb |
Rename the mutex thread and process states to use a more generic 'LOCK' name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will be used internally by the turnstile code and will not be specific to mutexes. Making the change now ensures that turnstiles can be dropped in at a later date without affecting the ABI of userland applications.
|
104354 |
02-Oct-2002 |
scottl |
Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack whose size can be specified when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates.
Reviewed by: jake, peter, jhb
|
104094 |
28-Sep-2002 |
phk |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too.
Inspired by: FlexeLint warning #512
|
103925 |
25-Sep-2002 |
jeff |
- Get rid of the unused LK_NOOBJ.
|
103924 |
25-Sep-2002 |
jeff |
- Lock access to numoutput on the swap devices.
|
103923 |
25-Sep-2002 |
jeff |
- Add a ASSERT_VOP_LOCKED in vnode_pager_alloc. - Lock access to v_iflags.
|
103794 |
22-Sep-2002 |
mdodd |
Modify vm_map_clean() (and thus the msync(2) system call) to support invalidation of cached pages for objects of type OBJT_DEVICE.
Submitted by: Christian Zander <zander@minion.de> Approved by: alc
|
103777 |
22-Sep-2002 |
alc |
o Update some comments.
|
103767 |
21-Sep-2002 |
jake |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
103732 |
21-Sep-2002 |
alc |
Reduce namespace pollution.
Submitted by: bde
|
103623 |
19-Sep-2002 |
jeff |
- Use my freebsd email alias in the copyright. - Remove redundant instances of my email alias in the file summary.
|
103531 |
18-Sep-2002 |
jeff |
- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.
|
103314 |
14-Sep-2002 |
njl |
Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging.
Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.
Suggested by: phk Reviewed by: bde, rwatson (earlier version)
|
103216 |
11-Sep-2002 |
julian |
Completely redo thread states.
Reviewed by: davidxu@freebsd.org
|
103123 |
09-Sep-2002 |
tanimura |
- Do not swap out a process if it is in creation. The process may have no address space yet.
- Check whether a process is a system process prior to dereferencing its p_vmspace. Aio assumes that only the curthread switches the address space of a system process.
|
103002 |
06-Sep-2002 |
julian |
Use UMA as a complex object allocator. The process allocator now caches and hands out complete process structures *including substructures*.
i.e., it gets the process structure with the first thread (and soon KSE) already allocated and attached, all in one hit.
For the average non threaded program (non KSE that is) the allocated thread and its stack remain attached to the process, even when the process is unused and in the process cache. This saves having to allocate and attach it later, effectively bringing us (hopefully) close to the efficiency of pre-KSE systems where these were a single structure.
Reviewed by: davidxu@freebsd.org, peter@freebsd.org
|
102966 |
05-Sep-2002 |
bde |
Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite.
|
102950 |
05-Sep-2002 |
davidxu |
s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/
Fix abbreviations for the P_STOPPED_* etc. flags; in the original code they were inconsistent and difficult to distinguish.
Approved by: julian (mentor)
|
102835 |
02-Sep-2002 |
alc |
o Synchronize updates to struct vm_page::cow with the page queues lock.
|
102738 |
31-Aug-2002 |
dillon |
Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB. Reduce the swap meta calculation by a factor of 2, it's still massive overkill.
X-MFC after: immediately
|
102600 |
30-Aug-2002 |
peter |
Change hw.physmem and hw.usermem to unsigned long like they used to be in the original hardwired sysctl implementation.
The buf size calculator still overflows an integer on machines with large KVA (eg: ia64) where the number of pages does not fit into an int. Use 'long' there.
Change Maxmem and physmem and related variables to 'long', mostly for completeness. Machines are not likely to overflow 'int' pages in the near term, but then again, 640K ought to be enough for anybody. This comes for free on 32 bit machines, so why not?
|
102399 |
25-Aug-2002 |
alc |
o Retire pmap_pageable(). It's an advisory routine that none of our platforms implements.
|
102382 |
25-Aug-2002 |
alc |
o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.
|
102372 |
24-Aug-2002 |
alc |
o Use vm_object_lock() in place of directly locking Giant.
Reviewed by: md5
|
102370 |
24-Aug-2002 |
alc |
o Use vm_object_lock() in place of Giant when manipulating a vm object in vm_map_insert().
|
102349 |
24-Aug-2002 |
alc |
o Resurrect vm_object_lock() and vm_object_unlock() from revision 1.19. (For now, they simply acquire and release Giant.)
|
102241 |
21-Aug-2002 |
archie |
Don't use "NULL" when "0" is really meant.
|
101657 |
11-Aug-2002 |
alc |
o Assert that the page queues lock is held in vm_page_activate().
|
101656 |
11-Aug-2002 |
alc |
o Lock page queue accesses by vm_page_activate().
|
101655 |
10-Aug-2002 |
alc |
o Lock page queue accesses by vm_page_activate().
|
101654 |
10-Aug-2002 |
alc |
o Move a call to vm_page_wakeup() inside the scope of the page queues lock.
|
101645 |
10-Aug-2002 |
alc |
o Remove the setting and clearing of the PG_MAPPED flag from the alpha and ia64 pmap. o Remove the PG_MAPPED flag's declaration.
|
101634 |
10-Aug-2002 |
alc |
o Remove the setting and clearing of the PG_MAPPED flag. (This flag is obsolete.)
|
101543 |
08-Aug-2002 |
alc |
o Use pmap_page_is_mapped() in vm_page_protect() rather than the PG_MAPPED flag. (This is the only place in the entire kernel where the PG_MAPPED flag is tested. It will be removed soon.)
|
101327 |
04-Aug-2002 |
alc |
o Acquire the page queues lock before checking the page's busy status in vm_page_grab(). Also, replace the nearby tsleep() with an msleep() on the page queues lock.
|
101308 |
04-Aug-2002 |
jeff |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking.
Idea stolen from: BSD/OS
|
101304 |
04-Aug-2002 |
alc |
o Extend the scope of the page queues lock in contigmalloc1(). o Replace vm_page_sleep_busy() with vm_page_sleep_if_busy() in vm_contig_launder().
|
101250 |
03-Aug-2002 |
alc |
o Remove the setting of PG_MAPPED from vm_page_wire() and vm_page_alloc(VM_ALLOC_WIRED).
|
101236 |
02-Aug-2002 |
alc |
o Convert two instances of vm_page_sleep_busy() into vm_page_sleep_if_busy() with appropriate page queue locking.
|
101200 |
02-Aug-2002 |
alc |
o Lock page queue accesses in nwfs and smbfs. o Assert that the page queues lock is held in vm_page_deactivate().
|
101196 |
02-Aug-2002 |
alc |
o Lock page queue accesses by vm_page_deactivate().
|
101174 |
01-Aug-2002 |
alc |
o Acquire the page queues lock before calling vm_page_io_finish(). o Assert that the page queues lock is held in vm_page_io_finish().
|
101105 |
31-Jul-2002 |
alc |
o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably leads to unnecessary pmap_page_protect() calls if one of these pages is paged out after unwiring.
Note: setting PG_MAPPED asserts that the page's pv list may be non-empty. Since checking the status of the page's pv list isn't any harder than checking this flag, the flag should probably be eliminated. Alternatively, PG_MAPPED could be set by pmap_enter() exclusively rather than various places throughout the kernel.
|
101019 |
31-Jul-2002 |
alc |
o Lock page accesses by vm_page_io_start() with the page queues lock. o Assert that the page queues lock is held in vm_page_io_start().
|
100915 |
30-Jul-2002 |
alc |
o In vm_object_madvise() and vm_object_page_remove() replace vm_page_sleep_busy() with vm_page_sleep_if_busy(). At the same time, increase the scope of the page queues lock. (This should significantly reduce the locking overhead in vm_object_page_remove().) o Apply some style fixes.
|
100913 |
30-Jul-2002 |
tanimura |
- Optimize wakeup() and its friends; if a thread woken up is being swapped in, we do not have to ask for the scheduler thread to do that.
- Assert that a process is not swapped out in runq functions and swapout().
- Introduce thread_safetoswapout() for readability.
- In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing better atomicity.
|
100889 |
29-Jul-2002 |
alc |
o Introduce vm_page_sleep_if_busy() as an eventual replacement for vm_page_sleep_busy(). vm_page_sleep_if_busy() uses the page queues lock.
|
100885 |
29-Jul-2002 |
julian |
Remove a XXXKSE comment. the code is no longer a problem..
|
100884 |
29-Jul-2002 |
julian |
Create a new thread state to describe threads that would be ready to run except for the fact that they are presently swapped out. Also add a process flag to indicate that the process has started the struggle to swap back in. This will be needed for the case where multiple threads start the swapin action, to stop a collision. Also add code to stop a process from being swapped out if one of the threads in this process is actually off running on another CPU.. that might hurt...
Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
|
100862 |
29-Jul-2002 |
alc |
o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire() in pmap_new_thread(), pmap_pinit(), and vm_proc_new(). o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().
|
100836 |
28-Jul-2002 |
alc |
o Modify vm_page_grab() to accept VM_ALLOC_WIRED.
|
100832 |
28-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free(). o Apply some style fixes.
|
100829 |
28-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free().
|
100797 |
28-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free(). o Increment cnt.v_dfree inside vm_pageout_page_free() rather than at each call.
|
100796 |
28-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free().
|
100779 |
27-Jul-2002 |
alc |
o Require that the page queues lock is held on entry to vm_pageout_clean() and vm_pageout_flush(). o Acquire the page queues lock before calling vm_pageout_clean() or vm_pageout_flush().
|
100742 |
27-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_activate().
|
100740 |
27-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_activate() and vm_page_deactivate() in vm_pageout_object_deactivate_pages(). o Apply some style fixes to vm_pageout_object_deactivate_pages().
|
100736 |
27-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_activate() and vm_page_deactivate().
|
100686 |
25-Jul-2002 |
alc |
o Remove a vm_page_deactivate() that is immediately followed by a vm_page_rename() from vm_object_backing_scan(). vm_page_rename() also performs vm_page_deactivate() on pages in the cache queues, making the removed vm_page_deactivate() redundant.
|
100630 |
24-Jul-2002 |
alc |
o Merge vm_fault_wire() and vm_fault_user_wire() by adding a new parameter, user_wire.
|
100545 |
23-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_dontneed(). o Assert that the page queue lock is held in vm_page_dontneed().
|
100542 |
23-Jul-2002 |
alc |
o Extend the scope of the page queues lock in vm_pageout_scan() to cover the traversal of the cache queue.
|
100512 |
22-Jul-2002 |
alfred |
Change struct vmspace->vm_shm from void * to struct shmmap_state *, this removes the need for casts in several cases.
|
100511 |
22-Jul-2002 |
alfred |
Remove caddr_t.
|
100456 |
21-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free() and vm_page_deactivate().
|
100452 |
21-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_free().
|
100438 |
21-Jul-2002 |
tanimura |
Do not pass a thread with the state TDS_RUNQ to setrunqueue(), otherwise the assertion in setrunqueue() fails.
|
100415 |
20-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_try_to_cache(). (The accesses in kern/vfs_bio.c are already locked.) o Assert that the page queues lock is held in vm_page_try_to_cache().
|
100414 |
20-Jul-2002 |
alc |
o Assert that the page queues lock is held in vm_page_try_to_free().
|
100413 |
20-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_cache() in vm_fault() and vm_pageout_scan(). (The others are already locked.) o Assert that the page queues lock is held in vm_page_cache().
|
100407 |
20-Jul-2002 |
alc |
o Lock accesses to the active page queue in vm_pageout_scan() and vm_pageout_page_stats().
|
100397 |
20-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_cache() in vm_contig_launder(). o Micro-optimize the control flow in vm_contig_launder().
|
100396 |
20-Jul-2002 |
alc |
o Remove dead and/or unused code.
|
100384 |
20-Jul-2002 |
peter |
Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
|
100379 |
19-Jul-2002 |
peter |
Set P_NOLOAD on the pagezero kthread so that it doesn't artificially skew the loadav. This is not real load. If you have a nice process running in the background, pagezero may sit in the run queue for ages, add one to the loadav, and thereby affect other scheduling decisions.
|
100342 |
19-Jul-2002 |
alc |
o Duplicate an odd side-effect of vm_page_wire() in vm_page_allocate() when VM_ALLOC_WIRED is specified: set the PG_MAPPED bit in flags. o In both vm_page_wire() and vm_page_allocate() add a comment saying that setting PG_MAPPED does not belong there.
|
100331 |
18-Jul-2002 |
alc |
o Remove the acquisition and release of Giant from the idle priority thread that pre-zeroes free pages. o Remove GIANT_REQUIRED from some low-level page queue functions. (Instead assertions on the page queue lock are being added to the higher-level functions, like vm_page_wire(), etc.)
In collaboration with: peter
|
100326 |
18-Jul-2002 |
markm |
Void functions cannot return values.
|
100309 |
18-Jul-2002 |
peter |
(VM_MAX_KERNEL_ADDRESS - KERNBASE) / PAGE_SIZE may not fit in an integer. Use lmin(long, long), not min(u_int, u_int). This is a problem here on ia64, which has *way* more than 2^32 pages of KVA: 281474976710655 pages, to be precise.
|
100276 |
18-Jul-2002 |
alc |
o Introduce an argument, VM_ALLOC_WIRED, that requests vm_page_alloc() to return a wired page. o Use VM_ALLOC_WIRED within Alpha's pmap_growkernel(). Also, because Alpha's pmap_growkernel() calls vm_page_alloc() from within a critical section, specify VM_ALLOC_INTERRUPT instead of VM_ALLOC_SYSTEM. (Only VM_ALLOC_INTERRUPT is implemented entirely with a spin mutex.) o Assert that the page queues mutex is held in vm_page_wire() on Alpha, just like the other platforms.
|
100193 |
16-Jul-2002 |
alc |
o Use vm_pageq_remove_nowakeup() and vm_pageq_enqueue() in vm_page_zero_idle() instead of partially duplicated implementations. In particular, this change guarantees that the number of free pages in the free queue(s) matches the global free page count when Giant is released.
Submitted by: peter (via his p4 "pmap" branch)
|
100031 |
15-Jul-2002 |
alc |
o Create vm_contig_launder() to replace code that appears twice in contigmalloc1().
|
100005 |
14-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_wire() that aren't within a critical section. o Assert that the page queues lock is held in vm_page_wire() unless on Alpha.
|
99985 |
14-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_wire().
|
99934 |
13-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_unmanage(). o Assert that the page queues lock is held in vm_page_unmanage().
|
99927 |
13-Jul-2002 |
alc |
o Complete the locking of page queue accesses by vm_page_unwire(). o Assert that the page queues lock is held in vm_page_unwire(). o Make vm_page_lock_queues() and vm_page_unlock_queues() visible to kernel loadable modules.
|
99920 |
13-Jul-2002 |
alc |
o Lock some page queue accesses, in particular, those by vm_page_unwire().
|
99893 |
12-Jul-2002 |
alc |
o Assert GIANT_REQUIRED on system maps in _vm_map_lock(), _vm_map_lock_read(), and _vm_map_trylock(). Submitted by: tegge o Remove GIANT_REQUIRED from kmem_alloc_wait() and kmem_free_wakeup(). (This clears the way for exec_map accesses to move outside of Giant. The exec_map is not a system map.) o Remove some premature MPSAFE comments.
Reviewed by: tegge
|
99890 |
12-Jul-2002 |
dillon |
Re-enable the idle page-zeroing code. Remove all IPIs from the idle page-zeroing code as well as from the general page-zeroing code and use a lazy tlb page invalidation scheme based on a callback made at the end of mi_switch.
A number of people came up with this idea at the same time so credit belongs to Peter, John, and Jake as well.
Two-way SMP buildworld -j 5 tests (second run, after stabilization):
    2282.76 real  2515.17 user  704.22 sys    before peter's IPI commit
    2266.69 real  2467.50 user  633.77 sys    after peter's commit
    2232.80 real  2468.99 user  615.89 sys    after this commit
Reviewed by: peter, jhb Approved by: peter
|
99851 |
12-Jul-2002 |
peter |
Avoid a vm_page_lookup() - that uses a spinlock protected hash. We can just use the object's memq for our nefarious purposes.
|
99850 |
12-Jul-2002 |
alc |
o Lock some (unfortunately, not yet all) accesses to the page queues.
|
99849 |
12-Jul-2002 |
alc |
o Lock accesses to the page queues.
|
99754 |
11-Jul-2002 |
alc |
o Add a "needs wakeup" flag to the vm_map for use by kmem_alloc_wait() and kmem_free_wakeup(). Previously, kmem_free_wakeup() always called wakeup(). In general, no one was sleeping. o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c for use in vm_kern.c.
|
99683 |
09-Jul-2002 |
alc |
o Lock accesses to the page queues in vm_object_terminate(). o Eliminate some unnecessary 64-bit arithmetic in vm_object_split().
|
99625 |
08-Jul-2002 |
peter |
vm_page_queue_free_mtx is a spin mutex, not a normal sleep mutex. I do not know why this didn't panic my box, but I have most certainly been using it:
    peter@overcee[3:14pm]~src/sys/i386/i386-110> sysctl -a | grep zero
    vm.stats.misc.zero_page_count: 2235
    vm.stats.misc.cnt_prezero: 638951
    vm.idlezero_enable: 1
    vm.idlezero_maxrun: 16
Submitted by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor's patches are never wrong. :-)
|
99624 |
08-Jul-2002 |
peter |
Turn the zeroidle process off for SMP systems; there is still a possible TLB problem when bouncing from one cpu to another (the original cpu will not have purged its TLB if it simply went idle).
Pointed out by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor is never wrong. :-)
|
99571 |
08-Jul-2002 |
peter |
Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed.
We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86.
Use Luigi's hack for preserving the (lack of) priority.
Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.
|
99563 |
08-Jul-2002 |
peter |
Avoid vm_page_lookup() [grabs a spinlock] and just process the upage object memq instead.
Suggested by: alc
|
99559 |
07-Jul-2002 |
peter |
Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still.
While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
|
99545 |
07-Jul-2002 |
alc |
o Lock accesses to the free queue(s) in vm_page_zero_idle().
|
99514 |
07-Jul-2002 |
alc |
o Traverse the object's memq rather than repeatedly calling vm_page_lookup() in vm_object_split().
|
99509 |
06-Jul-2002 |
jeff |
- Hold a lock on the vnode acquired from the file table across the call to vm_mmap() as well as the GETATTR etc. - If the handle is a vnode in vm_mmap() assert that it is locked. - Wiggle Giant around a little to account for the extra vnode operation.
|
99476 |
05-Jul-2002 |
gallatin |
Remove bogus vm_page_wakeup() in vm_page_cowfault() that will cause panics in the zero-copy send path if a process attempts to write to a page which is still in flight.
reviewed by: ken
|
99472 |
05-Jul-2002 |
jeff |
Fix a lock order reversal in uma_zdestroy. The uma_mtx needs to be held across calls to zone_drain().
Noticed by: scottl
|
99427 |
05-Jul-2002 |
alc |
o Lock accesses to the free page queues in contigmalloc1().
|
99424 |
05-Jul-2002 |
jeff |
Remove unnecessary includes.
|
99416 |
04-Jul-2002 |
alc |
o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free queue lock (revision 1.33 of vm/vm_page.c removed them). o Make the free queue lock a spin lock because it's sometimes acquired inside of a critical section.
|
99408 |
04-Jul-2002 |
julian |
A small cleanup.
|
99407 |
04-Jul-2002 |
julian |
Don't call the thread setup routines from here; they are already called when uma calls thread_init().
|
99374 |
03-Jul-2002 |
alc |
o Make the reservation of KVA space for kernel map entries a function of the KVA space's size in addition to the amount of physical memory and reduce it by a factor of two.
Under the old formula, our reservation amounted to one kernel map entry per virtual page in the KVA space on a 4GB i386.
|
99320 |
03-Jul-2002 |
jeff |
Actually use the fini callback.
Pointy hat to: me :-( Noticed By: Julian
|
99211 |
01-Jul-2002 |
robert |
- Use (OFF_TO_IDX(off) - pi) instead of (OFF_TO_IDX(off - IDX_TO_OFF(pi))). - Reformat a comment.
|
99196 |
01-Jul-2002 |
alc |
o Remove some long dead code: from revision 1.41 of vm/vm_pager.c 3+ years ago. o Remove some unused prototypes.
|
99093 |
29-Jun-2002 |
iedowse |
Change the type of `tscan' in vm_object_page_clean() to vm_pindex_t, as it stores an absolute page index that may not fit in a vm_offset_t.
|
99072 |
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. Expect slight instability in signals.
|
98892 |
26-Jun-2002 |
iedowse |
Avoid using the 64-bit vm_pindex_t in a few places where 64-bit types are not required, as the overhead is unnecessary:
o In the i386 pmap_protect(), `sindex' and `eindex' represent page indices within the 32-bit virtual address space. o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary variable to store the low few bits of a vm_pindex_t that gets used as an array index. o vm_uiomove() uses `osize' and `idx' for page offsets within a map entry. o In vm_object_split(), `idx' is a page offset within a map entry.
|
98891 |
26-Jun-2002 |
iedowse |
Use an explicit cast to avoid relying on sign extension to do the right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'.
Reviewed by: dillon
|
98849 |
26-Jun-2002 |
ken |
At long last, commit the zero copy sockets code.
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls.
man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links.
jumbo.9: New man page describing the jumbo buffer allocator interface and operation.
zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality.
NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
conf/files: Add uipc_jumbo.c and uipc_cow.c.
conf/options: Add the 5 options mentioned above.
kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1.
uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack.
uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive.
uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on.
Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions.
uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c)
if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails.
The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)).
ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives.
if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers.
Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface.
Add header splitting support to the ti(4) driver.
Tweak some of the default interrupt coalescing parameters to more useful defaults.
Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off.
if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13.
Add defines needed for debugging.
Remove the ti_stats structure, it is now defined in sys/tiio.h.
ti_fw.h: 12.4.11 firmware.
ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.)
sys/jumbo.h: Jumbo buffer allocator interface.
sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process.
socketvar.h: Add prototype for socow_setup.
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions.
uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable.
ufs_readwrite.c:Update for new prototype of uiomoveco().
vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault.
vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure.
This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.)
vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK.
vm_object.h: Add prototype for vm_object_allocate_wait().
vm_page.c: Add page-based copy on write setup, clear and fault routines.
vm_page.h: Add page based COW function prototypes and variable in the vm_page structure.
Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
|
98848 |
26-Jun-2002 |
dillon |
Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any MAP_STACK mapping).
Suggested by: alc
|
98833 |
26-Jun-2002 |
dillon |
Part I of RLIMIT_VMEM implementation. Implement core functionality for a new resource limit that covers a process's entire VM space, including mmap()'d space.
(Part II will be additional code to check RLIMIT_VMEM during exec() but it needs more fleshing out).
PR: kern/18209 Submitted by: Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net> MFC after: 7 days
|
98824 |
25-Jun-2002 |
iedowse |
Complete the initial set of VM changes required to support full 64-bit file sizes. This step simply addresses the remaining overflows, and does not attempt to optimise performance. The details are:
o Use a 64-bit type for the vm_object `size' and the size argument to vm_object_allocate(). o Use the correct type for index variables in dev_pager_getpages(), vm_object_page_clean() and vm_object_page_remove(). o Avoid an overflow in the i386 pmap_object_init_pt().
|
98823 |
25-Jun-2002 |
jeff |
Turn VM_ALLOC_ZERO into a flag.
Submitted by: tegge Reviewed by: dillon
|
98822 |
25-Jun-2002 |
jeff |
Reduce the amount of code that runs with the zone lock held in slab_zalloc(). This allows us to run the zone initialization functions without any locks held.
|
98818 |
25-Jun-2002 |
alc |
o Eliminate vmspace::vm_minsaddr. It's initialized but never used. o Replace stale comments in vmspace by "const until freed" annotations on some fields.
|
98686 |
23-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from kmem_alloc_pageable(), kmem_alloc_nofault(), and kmem_free(). (Annotate as MPSAFE.) o Remove incorrect casts from kmem_alloc_pageable() and kmem_alloc_nofault().
|
98656 |
23-Jun-2002 |
alc |
o Remove the unnecessary acquisition and release of Giant around fdrop() in mmap(2).
|
98632 |
22-Jun-2002 |
alc |
o Reduce the scope of Giant in vm_mmap() to just the code that manipulates a vnode. (Thus, MAP_ANON and MAP_STACK never acquire Giant.)
|
98630 |
22-Jun-2002 |
alc |
o Replace mtx_assert(&Giant, MA_OWNED) in dev_pager_alloc() with the acquisition and release of Giant. (Annotate as MPSAFE.) o Reorder the sanity checks in dev_pager_alloc() to reduce the time that Giant is held.
|
98624 |
22-Jun-2002 |
alc |
o In vm_map_insert(), replace GIANT_REQUIRED by the acquisition and release of Giant around the direct manipulation of the vm_object and the optional call to pmap_object_init_pt(). o In vm_map_findspace(), remove GIANT_REQUIRED. Instead, acquire and release Giant around the occasional call to pmap_growkernel(). o In vm_map_find(), remove GIANT_REQUIRED.
|
98607 |
22-Jun-2002 |
alc |
o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and release of Giant. (Annotate as MPSAFE.)
|
98605 |
22-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from phys_pager_alloc(). If handle isn't NULL, acquire and release Giant. If handle is NULL, Giant isn't needed. o Annotate phys_pager_alloc() and phys_pager_dealloc() as MPSAFE.
|
98604 |
22-Jun-2002 |
alc |
o Replace GIANT_REQUIRED in vnode_pager_alloc() by the acquisition and release of Giant. (Annotate as MPSAFE.) o Also, in vnode_pager_alloc(), remove an unnecessary re-initialization of struct vm_object::flags and move a statement that is duplicated in both branches of an if-else.
|
98600 |
22-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vslock(). o Annotate kernacc(), useracc(), and vslock() as MPSAFE.
Motivated by: alfred
|
98541 |
21-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vm_map_stack().
|
98538 |
21-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vm_pager_allocate() and vm_pager_deallocate().
|
98498 |
20-Jun-2002 |
alc |
o Remove an incorrect cast from obreak(). This cast would, for example, break an sbrk(>=4GB) on 64-bit architectures even if the resource limit allowed it. o Correct an off-by-one error. o Correct a spelling error in a comment. o Reorder an && expression so that the commonly FALSE expression comes first.
Submitted by: bde (bullets 1 and 2)
|
98460 |
20-Jun-2002 |
alc |
o Acquire and release the vm_map lock instead of Giant in obreak(). Consequently, use vm_map_insert() and vm_map_delete(), which expect the vm_map to be locked, instead of vm_map_find() and vm_map_remove(), which do not.
|
98455 |
19-Jun-2002 |
jeff |
- Move the computation of pflags out of the page allocation loop in kmem_malloc() - zero fill pages if PG_ZERO bit is not set after allocation in kmem_malloc()
Suggested by: alc, jake
|
98451 |
19-Jun-2002 |
jeff |
- Remove bogus use of kmem_alloc that was inherited from the old zone allocator. - Properly set M_ZERO when talking to the back end page allocators for non malloc zones. This forces us to zero fill pages when they are first brought into a cache. - Properly handle M_ZERO in uma_zalloc_internal. This fixes a problem where per cpu buckets weren't always getting zeroed.
|
98450 |
19-Jun-2002 |
jeff |
Teach kmem_malloc about M_ZERO.
|
98414 |
19-Jun-2002 |
alc |
o Replace GIANT_REQUIRED in vm_object_coalesce() by the acquisition and release of Giant. o Reduce the scope of GIANT_REQUIRED in vm_map_insert().
These changes will enable us to remove the acquisition and release of Giant from obreak().
|
98397 |
18-Jun-2002 |
alc |
o Remove LK_CANRECURSE from the vm_map lock.
|
98362 |
17-Jun-2002 |
jeff |
Honor the BUCKETCACHE flag on free as well.
|
98361 |
17-Jun-2002 |
jeff |
- Introduce the new M_NOVM option which tells uma to only check the currently allocated slabs and bucket caches for free items. It will not go ask the vm for pages. This differs from M_NOWAIT in that it not only doesn't block, it doesn't even ask.
- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This tells uma that it should only allocate buckets out of the bucket cache, and not from the VM. It does this by using the M_NOVM option to zalloc when getting a new bucket. This is so that the VM doesn't recursively enter itself while trying to allocate buckets for vm_map_entry zones. If there are already allocated buckets when we get here we'll still use them but otherwise we'll skip it.
- Use the ZONE_VM flag on vm map entries and pv entries on x86.
|
98343 |
17-Jun-2002 |
alc |
o Acquire and release Giant in vm_map_wakeup() to prevent a lost wakeup().
Reviewed by: tegge
|
98304 |
16-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vm_fault_user_wire(). o Move pmap_pageable() outside of Giant in vm_fault_unwire(). (pmap_pageable() is a no-op on all supported architectures.) o Remove the acquisition and release of Giant from mlock().
|
98263 |
15-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from useracc() and vsunlock(). Neither vm_map_check_protection() nor vm_map_unwire() expect Giant to be held.
|
98240 |
15-Jun-2002 |
alc |
o Remove the acquisition and release of Giant from munlock().
Reviewed by: tegge
|
98226 |
14-Jun-2002 |
alc |
o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were only used by vm_map_pageable() and vm_map_user_pageable().)
Reviewed by: tegge
|
98142 |
12-Jun-2002 |
alc |
o Acquire and release Giant in vm_map_unlock_and_wait().
Submitted by: tegge
|
98119 |
11-Jun-2002 |
alc |
o Properly handle a failure by vm_fault_wire() or vm_fault_user_wire() in vm_map_wire(). o Make two white-space changes in vm_map_wire().
Reviewed by: tegge
|
98109 |
11-Jun-2002 |
alc |
o Teach vm_map_delete() to respect the "in-transition" flag on a vm_map_entry by sleeping until the flag is cleared.
Submitted by: tegge
|
98083 |
10-Jun-2002 |
alc |
o In vm_map_entry_create(), call uma_zalloc() with M_NOWAIT on system maps. Submitted by: tegge o Eliminate the "!mapentzone" check from vm_map_entry_create() and vm_map_entry_dispose(). Reviewed by: tegge o Fix white-space usage in vm_map_entry_create().
|
98075 |
10-Jun-2002 |
iedowse |
Correct the logic for determining whether the per-CPU locks need to be destroyed. This fixes a problem where destroying a UMA zone would fail to destroy all zone mutexes.
Reviewed by: jeff
|
98071 |
09-Jun-2002 |
alc |
o Add vm_map_wire() for wiring contiguous regions of either kernel or user vm_maps. This implementation has two key benefits when compared to vm_map_{user_,}pageable(): (1) it avoids a race condition through the use of "in-transition" vm_map entries and (2) it eliminates lock recursion on the vm_map.
Note: there is still an error case that requires clean up.
Reviewed by: tegge
|
98052 |
08-Jun-2002 |
alc |
o Simplify vm_map_unwire() by merging the second and third passes over the caller-specified region.
|
98036 |
08-Jun-2002 |
alc |
o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire(). o Add a stub for vm_map_wire().
Note: the description of the previous commit had an error. The in- transition flag actually blocks the deallocation of a vm_map_entry by vm_map_delete() and vm_map_simplify_entry().
|
98022 |
07-Jun-2002 |
alc |
o Add vm_map_unwire() for unwiring contiguous regions of either kernel or user vm_maps. In accordance with the standards for munlock(2), and in contrast to vm_map_user_pageable(), this implementation does not allow holes in the specified region. This implementation uses the "in transition" flag described below. o Introduce a new flag, "in transition," to the vm_map_entry. Eventually, vm_map_delete() and vm_map_simplify_entry() will respect this flag by deallocating in-transition vm_map_entrys, allowing the vm_map lock to be safely released in vm_map_unwire() and (the forthcoming) vm_map_wire(). o Modify vm_map_simplify_entry() to respect the in-transition flag.
In collaboration with: tegge
|
97947 |
06-Jun-2002 |
alfred |
fix typo in _SYS_SYSPROTO_H_ case: s/mlockall_args/munlockall_args
Submitted by: Mark Santcroos <marks@ripe.net>
|
97787 |
03-Jun-2002 |
jeff |
Add a comment describing a resource leak that occurs during a failure case in obj_alloc.
|
97753 |
02-Jun-2002 |
alc |
o Migrate vm_map_split() from vm_map.c to vm_object.c, renaming it to vm_object_split(). Its interface should still be changed to resemble vm_object_shadow().
|
97747 |
02-Jun-2002 |
alc |
o Style fixes to vm_map_split(), including the elimination of one variable declaration that shadows another.
Note: This function should really be vm_object_split(), not vm_map_split().
Reviewed by: md5
|
97729 |
02-Jun-2002 |
alc |
o Condition vm_object_pmap_copy_1()'s compilation on the kernel option ENABLE_VFS_IOOPT. Unless this option is in effect, vm_object_pmap_copy_1() is not used.
|
97727 |
01-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(), vm_map_create(), and vm_map_submap(). o Make further use of a local variable in vm_map_entry_splay() that caches a reference to one of a vm_map_entry's children. (This reduces code size somewhat.) o Revert a part of revision 1.66, deinlining vmspace_pmap(). (This function is MPSAFE.)
|
97710 |
01-Jun-2002 |
alc |
o Revert a part of revision 1.66, contrary to what that commit message says, deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior() actually increases the kernel's size. o Make vm_map_entry_set_behavior() static and add a comment describing its purpose. o Remove an unnecessary initialization statement from vm_map_entry_splay().
|
97654 |
31-May-2002 |
des |
Export nswapdev through sysctl(8).
Sponsored by: DARPA, NAI Labs
|
97648 |
31-May-2002 |
alc |
Further work on pushing Giant out of the vm_map layer and down into the vm_object layer: o Acquire and release Giant in vm_object_shadow() and vm_object_page_remove(). o Remove the GIANT_REQUIRED assertion preceding vm_map_delete()'s call to vm_object_page_remove(). o Remove the acquisition and release of Giant around vm_map_lookup()'s call to vm_object_shadow().
|
97556 |
30-May-2002 |
alfred |
Check for defined(__i386__) instead of just defined(i386) since the compiler will be updated to only define __i386__ for ANSI cleanliness.
|
97453 |
29-May-2002 |
peter |
The kernel printf does not have %i
|
97359 |
27-May-2002 |
alc |
o Remove unused #defines.
|
97294 |
26-May-2002 |
alc |
o Acquire and release Giant around pmap operations in vm_fault_unwire() and vm_map_delete(). Assert GIANT_REQUIRED in vm_map_delete() only if operating on the kernel_object or the kmem_object. o Remove GIANT_REQUIRED from vm_map_remove(). o Remove the acquisition and release of Giant from munmap().
|
97198 |
24-May-2002 |
alc |
o Replace the vm_map's hint by the root of a splay tree. By design, the last accessed datum is moved to the root of the splay tree. Therefore, on lookups in which the hint resulted in O(1) access, the splay tree still achieves O(1) access. In contrast, on lookups in which the hint failed miserably, the splay tree achieves amortized logarithmic complexity, resulting in dramatic improvements on vm_maps with a large number of entries. For example, the execution time for replaying an access log from www.cs.rice.edu against the thttpd web server was reduced by 23.5% due to the large number of files simultaneously mmap()ed by this server. (The machine in question has enough memory to cache most of this workload.)
Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld" due to the overhead of maintaining the splay tree. I believe that some or all of this can be eliminated through optimizations to the code.
Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu> Reviewed by: jeff
|
97088 |
22-May-2002 |
alc |
o Make contigmalloc1() static.
|
97007 |
20-May-2002 |
jhb |
In uma_zalloc_arg(), if we are performing a M_WAITOK allocation, ensure that td_intr_nesting_level is 0 (like malloc() does). Since malloc() calls uma we can probably remove the check in malloc() for this now. Also, perform an extra witness check in that case to make sure we don't hold any locks when performing a M_WAITOK allocation.
|
96875 |
18-May-2002 |
alc |
o Eliminate the acquisition and release of Giant from minherit(2). (vm_map_inherit() no longer requires Giant to be held.)
|
96839 |
18-May-2002 |
alc |
o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and release Giant around vm_map_madvise()'s call to pmap_object_init_pt(). o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition and release of Giant. o Remove the acquisition and release of Giant from madvise().
|
96832 |
18-May-2002 |
alc |
o Remove the acquisition and release of Giant from mprotect().
|
96755 |
16-May-2002 |
trhodes |
More s/file system/filesystem/g
|
96572 |
14-May-2002 |
phk |
Make daddr_t and u_daddr_t 64bits wide. Retire daddr64_t and use daddr_t instead.
Sponsored by: DARPA & NAI Labs.
|
96496 |
13-May-2002 |
jeff |
Don't call the uz free function while the zone lock is held. This can lead to lock order reversals. uma_reclaim now builds a list of freeable slabs and then unlocks the zones to do all of the frees.
|
96493 |
13-May-2002 |
jeff |
Remove the hash_free() lock order reversal. This could have happened for several reasons before. Fixing it involved restructuring the generic hash code to require calling code to handle locking, unlocking, and freeing hashes on error conditions.
|
96469 |
12-May-2002 |
alc |
o Remove GIANT_REQUIRED and an excessive number of blank lines from vm_map_inherit(). (minherit() need not acquire Giant anymore.)
|
96441 |
12-May-2002 |
alc |
o Acquire and release Giant in vm_object_reference() and vm_object_deallocate(), replacing the assertion GIANT_REQUIRED. o Remove GIANT_REQUIRED from vm_map_protect() and vm_map_simplify_entry(). o Acquire and release Giant around vm_map_protect()'s call to pmap_protect().
Altogether, these changes eliminate the need for mprotect() to acquire and release Giant.
|
96096 |
06-May-2002 |
alc |
o Header files shouldn't depend on options: Provide prototypes for uiomoveco(), uioread(), and vm_uiomove() regardless of whether ENABLE_VFS_IOOPT is defined or not.
Submitted by: bde
|
96095 |
06-May-2002 |
alc |
o Condition the compilation and use of vm_freeze_copyopts() on ENABLE_VFS_IOOPT.
|
96091 |
06-May-2002 |
alc |
o Some improvements to the page coloring of vm objects, particularly, for shadow objects.
Submitted by: bde
|
96087 |
06-May-2002 |
alc |
o Move vm_freeze_copyopts() from vm_map.{c,h} to vm_object.{c,h}. It's plainly an operation on a vm_object and belongs in the latter place.
|
96080 |
05-May-2002 |
alc |
o Condition the compilation of uiomoveco() and vm_uiomove() on ENABLE_VFS_IOOPT. o Add a comment to the effect that this code is experimental support for zero-copy I/O.
|
96073 |
05-May-2002 |
phk |
Expand the one-line function pbreassignbuf() in the only place it is or could be used.
|
96056 |
05-May-2002 |
alc |
o Remove GIANT_REQUIRED from vm_map_lookup() and vm_map_lookup_done(). o Acquire and release Giant around vm_map_lookup()'s call to vm_object_shadow().
|
96044 |
04-May-2002 |
jeff |
Use pages instead of uz_maxpages, which has not been initialized yet, when creating the vm_object. This was broken after the code was rearranged to grab giant itself.
Spotted by: alc
|
96042 |
04-May-2002 |
alc |
o Make _vm_object_allocate() and vm_object_allocate() callable without holding Giant. o Begin documenting the trivial cases of the locking protocol on vm_object.
|
96007 |
04-May-2002 |
alc |
o Remove GIANT_REQUIRED from vm_map_lookup_entry() and vm_map_check_protection(). o Call vm_map_check_protection() without Giant held in munmap().
|
95942 |
02-May-2002 |
alc |
o Change the implementation of vm_map locking to use exclusive locks exclusively. The interface still, however, distinguishes between a shared lock and an exclusive lock.
|
95931 |
02-May-2002 |
jeff |
Hide a pointer to the malloc_type bucket at the end of the freed memory. If this memory is modified after it has been freed we can now report its previous owner.
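A user-space sketch of the idea (the names dbg_malloc/dbg_free/dbg_owner are illustrative, not the kernel's): the freed block keeps a hidden trailer naming its last owner, so a later scribble can still be attributed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative sketch only: when a block is "freed", hide a pointer to
 * its owning type descriptor just past the user-visible size.  If the
 * block is written to after free, the trailer still names the previous
 * owner.  None of these names are the kernel's actual identifiers.
 */
struct malloc_type_sketch {
    const char *name;
};

/* Allocate with room for a hidden trailer pointer. */
void *dbg_malloc(size_t size)
{
    return malloc(size + sizeof(struct malloc_type_sketch *));
}

/* On free, record the owner in the hidden trailer. */
void dbg_free(void *mem, size_t size, struct malloc_type_sketch *type)
{
    memcpy((char *)mem + size, &type, sizeof(type));
}

/* Recover the previous owner of a freed block. */
struct malloc_type_sketch *dbg_owner(void *mem, size_t size)
{
    struct malloc_type_sketch *type;
    memcpy(&type, (char *)mem + size, sizeof(type));
    return type;
}
```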
|
95930 |
02-May-2002 |
jeff |
Move around the dbg code a bit so it's always under a lock. This stops a weird potential race if we were preempted right as we were doing the dbg checks.
|
95925 |
02-May-2002 |
arr |
- Changed the size element of uma_zctor_args to be size_t instead of int. - Changed uma_zcreate to accept the size argument as a size_t instead of int.
Approved by: jeff
|
95923 |
02-May-2002 |
jeff |
malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the mallochash. Mallochash is going to go away as soon as I introduce the kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen until all users of the malloc api that expect memory to be aligned on the size of the allocation are fixed.
|
95901 |
02-May-2002 |
alc |
o Remove dead and lockmgr()-specific debugging code.
|
95899 |
02-May-2002 |
jeff |
Remove the temporary alignment check in free().
Implement the following checks on freed memory in the bucket path: - Slab membership - Alignment - Duplicate free
This previously was only done if we skipped the buckets. This code will slow down INVARIANTS a bit, but it is smp safe. The checks were moved out of the normal path and into hooks supplied in uma_dbg.
|
95823 |
30-Apr-2002 |
alc |
o Convert the vm_page buckets mutex to a spin lock. (This resolves an issue on the Alpha platform found by jeff@.) o Simplify vm_page_lookup().
Reviewed by: jhb
|
95771 |
30-Apr-2002 |
jeff |
Add a new UMA debugging facility. This will overwrite freed memory with 0xdeadc0de and then check for it just before memory is handed off as part of a new request. This will catch any post free/pre alloc modification of memory, as well as introduce errors for anything that tries to dereference it as a pointer.
This code takes the form of special init, fini, ctor and dtor routines that are specifically used by malloc. It is in a separate file because additional debugging aids will want to live here as well.
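A minimal user-space sketch of the trashing scheme, assuming illustrative names rather than the actual uma_dbg routines: freed items are filled with a poison pattern and re-checked just before reuse.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative sketch (not the real uma_dbg code): freed items are
 * overwritten with a poison word and verified again just before the
 * memory is handed out, catching writes between free and realloc.
 */
#define UMA_POISON 0xdeadc0deU

/* The "fini"/"dtor" step: poison the item on free. */
void trash_fini(void *mem, size_t size)
{
    uint32_t *p = mem;
    for (size_t i = 0; i < size / sizeof(uint32_t); i++)
        p[i] = UMA_POISON;
}

/* The "init"/"ctor" step: 0 if the poison survived, -1 if modified. */
int trash_check(const void *mem, size_t size)
{
    const uint32_t *p = mem;
    for (size_t i = 0; i < size / sizeof(uint32_t); i++)
        if (p[i] != UMA_POISON)
            return -1;
    return 0;
}
```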
|
95766 |
30-Apr-2002 |
jeff |
Move the implementation of M_ZERO into UMA so that it can be passed to uma_zalloc and friends. Remove this functionality from the malloc wrapper.
Document this change in uma.h and adjust variable names in uma_core.
|
95764 |
30-Apr-2002 |
alc |
o Revert vm_fault1() to its original name vm_fault(), eliminating the wrapper that took its place for the purposes of acquiring and releasing Giant.
|
95758 |
29-Apr-2002 |
jeff |
Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in its own mutex class. Currently this is only used for kmapentzone because kmapents are potentially allocated when freeing memory. This is not dangerous though because no other allocations will be done while holding the kmapentzone lock.
|
95710 |
29-Apr-2002 |
peter |
Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)
|
95701 |
29-Apr-2002 |
alc |
Document three synchronization issues in vm_fault().
|
95686 |
28-Apr-2002 |
alc |
Pass the caller's file name and line number to the vm_map locking functions.
|
95610 |
28-Apr-2002 |
alc |
o Introduce and use vm_map_trylock() to replace several direct uses of lockmgr(). o Add missing synchronization to vmspace_swap_count(): Obtain a read lock on the vm_map before traversing it.
|
95598 |
28-Apr-2002 |
peter |
We do not necessarily need to map/unmap pages to zero parts of them. On systems where physical memory is also direct mapped (alpha, sparc, ia64 etc) this is slightly harmful.
|
95589 |
27-Apr-2002 |
alc |
o Begin documenting the (existing) locking protocol on the vm_map in the same style as sys/proc.h. o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map. (Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c revision 1.206, de-inlining these methods increased the kernel's size.)
|
95532 |
26-Apr-2002 |
alc |
o Control access to the vm_page_buckets with a mutex. o Fix some style(9) bugs.
|
95432 |
25-Apr-2002 |
arr |
- Fix a round down bogon in uma_zone_set_max().
Submitted by: jeff@
|
95112 |
20-Apr-2002 |
alc |
Reintroduce locking on accesses to vm_object_list.
|
95021 |
19-Apr-2002 |
alc |
o Move the acquisition of Giant from vm_fault() to the point after initialization in vm_fault1(). o Fix some style problems in vm_fault1().
|
94981 |
18-Apr-2002 |
alc |
Add a comment documenting a race condition in vm_fault(): Specifically, a modification is made to the vm_map while only a read lock is held.
|
94977 |
18-Apr-2002 |
alc |
o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed due to conditions that suggest the possible need for stack growth. This has two beneficial effects: (1) we can now remove calls to vm_map_growstack() from the MD trap handlers and (2) simple page faults are faster because we no longer unnecessarily perform vm_map_growstack() on every page fault. o Remove vm_map_growstack() from the i386's trap_pfault(). o Remove the acquisition and release of Giant from i386's trap_pfault(). (vm_fault() still acquires it.)
|
94921 |
17-Apr-2002 |
peter |
Do not free the vmspace until p->p_vmspace is set to null. Otherwise statclock can access it in the tail end of statclock_process() at an unfortunate time. This bit me several times on an SMP alpha (UP2000) and the problem went away with this change. I'm not sure why it doesn't break x86 as well. Maybe it's because the clocks are much faster on alpha (HZ=1024 by default).
|
94912 |
17-Apr-2002 |
alc |
Remove an unused option, VM_FAULT_HOLD, to vm_fault().
|
94777 |
15-Apr-2002 |
peter |
Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibility.)
Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
|
94653 |
14-Apr-2002 |
jeff |
Fix a witness warning when expanding a hash table. We were allocating the new hash while holding the lock on a zone. Fix this by doing the allocation separately from the actual hash expansion.
The lock is dropped before the allocation and reacquired before the expansion. The expansion code checks to see if we lost the race and frees the new hash if we do. We really never will lose this race because the hash expansion is single threaded via the timeout mechanism.
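The drop-allocate-reacquire-recheck pattern can be sketched as follows (illustrative structure and names; the real zone lock is reduced to no-op macros so the sketch runs stand-alone):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the real zone lock; no-ops so the sketch is runnable. */
#define ZONE_LOCK(z)   ((void)(z))
#define ZONE_UNLOCK(z) ((void)(z))

struct zone_sketch {
    void *hash;
    size_t hash_size;
};

/*
 * Sketch of the pattern (not the actual hash_expand code): allocate the
 * new table with the lock dropped, then reacquire and re-check before
 * installing, discarding our table if another thread won the race.
 * Returns 1 if our table was installed, 0 if we lost the race.
 */
int hash_expand_sketch(struct zone_sketch *z, size_t new_size)
{
    size_t old_size;
    void *new_hash;
    int installed = 0;

    ZONE_LOCK(z);
    old_size = z->hash_size;
    ZONE_UNLOCK(z);                    /* never allocate under the lock */

    new_hash = calloc(new_size, sizeof(void *));

    ZONE_LOCK(z);
    if (z->hash_size == old_size) {    /* still the table we observed? */
        free(z->hash);
        z->hash = new_hash;
        z->hash_size = new_size;
        installed = 1;
    }
    ZONE_UNLOCK(z);
    if (!installed)
        free(new_hash);                /* lost the race: free ours */
    return installed;
}
```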
|
94651 |
14-Apr-2002 |
jeff |
Protect the initial list traversal in sysctl_vm_zone() with the uma_mtx.
|
94631 |
14-Apr-2002 |
jeff |
Fix the calculation that determines uz_maxpages. It was off for large zones. Fortunately we have no large zones with maximums specified yet, so it wasn't breaking anything.
Implement blocking when a zone exceeds the maximum and M_WAITOK is specified. Previously this just failed like the old zone allocator did. The old zone allocator didn't support WAITOK/NOWAIT though so we should do what we advertise.
While I was in there I cleaned up some more zalloc logic to further simplify that code path and reduce redundant code. This was needed to make the blocking work properly anyway.
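The M_WAITOK-versus-M_NOWAIT behaviour at the limit can be sketched like this (hypothetical structure, flag values and return codes; the real code sleeps on the zone rather than returning a placeholder):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flags and zone; not the real UMA structures or values. */
#define M_NOWAIT 0x1
#define M_WAITOK 0x2

struct zone_limit_sketch {
    int pages;       /* pages currently allocated to the zone */
    int maxpages;    /* enforced limit (uz_maxpages in UMA) */
    int sleepers;    /* threads blocked waiting for a free item */
};

/*
 * Sketch of the advertised behaviour: at the limit, M_NOWAIT fails
 * immediately while M_WAITOK records a sleeper that would block until
 * an item is freed (the actual sleep/wakeup is elided here).
 */
int zone_alloc_page(struct zone_limit_sketch *z, int flags)
{
    if (z->pages >= z->maxpages) {
        if (flags & M_NOWAIT)
            return -1;               /* old behaviour: just fail */
        z->sleepers++;               /* real code: sleep on the zone */
        return -2;                   /* placeholder for "blocked" */
    }
    z->pages++;
    return 0;
}
```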
|
94329 |
10-Apr-2002 |
jeff |
Remember to unlock the zone if the fill count is too high.
Pointed out by: pete, jake, jhb
|
94240 |
08-Apr-2002 |
jeff |
Quiet witness warnings about acquiring several zone locks. In the case that this happens it is OK.
|
94165 |
08-Apr-2002 |
jeff |
Add a mechanism to disable buckets when the v_free_count drops below v_free_min. This should help performance in memory starved situations.
|
94163 |
08-Apr-2002 |
jeff |
Don't release the zone lock until after the dtor has been called. As far as I can tell this could not have caused any problems yet because UMA is still called with giant.
Pointy hat to: jeff Noticed by: jake
|
94161 |
08-Apr-2002 |
jeff |
Implement uma_zdestroy(). Its prototype changed slightly. I decided that I didn't like the wait argument and that if you were removing a zone it had better be empty.
Also, I broke out part of hash_expand and made a separate hash_free() for use in uma_zdestroy.
|
94159 |
08-Apr-2002 |
jeff |
Rework most of the bucket allocation and free code so that per cpu locks are never held across blocking operations. Also, fix two other lock order reversals that were exposed by jhb's witness change.
The free path previously had a bug that would cause it to skip the free bucket list in some cases and go straight to allocating a new bucket. This has been fixed as well.
These changes made the bucket handling code much cleaner and removed quite a few lock operations. This should be marginally faster now.
It is now possible to call malloc w/o Giant and avoid any witness warnings. This still isn't entirely safe though because malloc_type statistics are not protected by any lock.
|
94157 |
07-Apr-2002 |
jeff |
Spelling correction; s/seperate/separate/g
Submitted by: eric
|
94156 |
07-Apr-2002 |
jeff |
There should be no remaining references to these two files in the tree. If there are, it is an error. vm_zone has been superseded by uma.
|
94155 |
07-Apr-2002 |
jeff |
This fixes a bug where isitem never got set to 1 if a certain chain of events relating to extreme low memory situations occurred. This was only ever seen on the port build cluster, so many thanks to kris for helping me debug this.
Tested by: kris
|
93847 |
05-Apr-2002 |
alc |
o Eliminate the use of grow_stack() and useracc() from sendsig(), osendsig(), and osf1_sendsig(). o Eliminate the prototype for the MD grow_stack() now that it has been removed from all platforms.
|
93823 |
04-Apr-2002 |
dillon |
Embed a struct vmmeter in the per-cpu structure and add a macro, PCPU_LAZY_INC() which increments elements in it for cases where we can afford the occasional inaccuracy. Use of per-cpu stats counters avoids significant cache stalls in various critical paths that would otherwise severely limit our cpu scalability.
Adjust all sysctl's accessing cnt.* elements to now use a procedure which aggregates the requested field for all cpus and for the global vmmeter.
The global vmmeter is retained, since some stats counters, like v_free_min, cannot be made per-cpu. Also, this allows us to convert counters from the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so have at it!
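The aggregation the adjusted sysctls perform can be sketched as follows (a one-field vmmeter and a fixed cpu count, both illustrative):

```c
#include <assert.h>

#define MAXCPU_SKETCH 4   /* illustrative cpu count */

/* A cut-down vmmeter with one counter; the real struct has many. */
struct vmmeter_sketch {
    unsigned v_some_count;
};

struct vmmeter_sketch pcpu_cnt[MAXCPU_SKETCH];   /* per-cpu copies */
struct vmmeter_sketch global_cnt;                /* non-per-cpu residue */

/*
 * Sketch of the sysctl aggregation: a requested field is summed across
 * every cpu's vmmeter plus the retained global vmmeter before being
 * reported.  Field and function names here are illustrative.
 */
unsigned vcnt_sum(void)
{
    unsigned total = global_cnt.v_some_count;
    for (int i = 0; i < MAXCPU_SKETCH; i++)
        total += pcpu_cnt[i].v_some_count;
    return total;
}
```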
|
93818 |
04-Apr-2002 |
jhb |
Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
|
93716 |
03-Apr-2002 |
jake |
Fix a long standing 32bit-ism. Don't assume that the size of a chunk of memory in phys_avail will fit in 'int', use vm_size_t. This fixes booting on sparc64 machines with more than 2 gigs of ram.
Thanks to Jan Chrillesen for providing me with access to a 4 gig machine.
|
93697 |
02-Apr-2002 |
alfred |
fix comment typo, s/neccisary/necessary/g
|
93593 |
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
93273 |
27-Mar-2002 |
jeff |
Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares.
Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone.
Approved by: jhb
|
93194 |
26-Mar-2002 |
alc |
Remove an unused prototype.
|
93089 |
24-Mar-2002 |
jeff |
Reset the cachefree statistics after draining the cache. This fixes a bug where a sysctl within 20 seconds of a cache_drain could yield negative "USED" counts.
Also, grab the uma_mtx while in the sysctl handler. This hadn't caused problems yet because Giant is held all the time.
Reported by: kkenn
|
92758 |
20-Mar-2002 |
jeff |
Add uma_zone_set_max() to add enforced limits to non vm obj backed zones.
|
92748 |
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
|
92727 |
19-Mar-2002 |
alfred |
Remove __P.
|
92692 |
19-Mar-2002 |
jeff |
Quiet a warning introduced by UMA. This only occurs on machines where vm_size_t != unsigned long.
Reviewed by: phk
|
92666 |
19-Mar-2002 |
peter |
Fix a gcc-3.1+ warning. warning: deprecated use of label at end of compound statement
ie: you cannot do this anymore: switch(foo) { ....
default: }
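For reference, a minimal example of the accepted form: the warning is avoided by ending the default case with a statement rather than leaving the label last in the block.

```c
#include <assert.h>

/*
 * gcc 3.1+ rejects a label (including "default:") as the last thing in
 * a compound statement, so the case must end with a statement; a break
 * (or an empty ";") suffices.
 */
int classify(int foo)
{
    int r = 0;
    switch (foo) {
    case 1:
        r = 1;
        break;
    default:
        break;      /* was: "default:" followed by "}" -> warning */
    }
    return r;
}
```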
|
92654 |
19-Mar-2002 |
jeff |
This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator.
Reviewed by: arch@
|
92588 |
18-Mar-2002 |
green |
Back out the modification of vm_map locks from lockmgr to sx locks. The best path forward now is likely to change the lockmgr locks to simple sleep mutexes, then see if any extra contention it generates is greater than the removed overhead of managing local locking state information, cost of extra calls into lockmgr, etc.
Additionally, making the vm_map lock a mutex and respecting it properly will put us much closer to not needing Giant magic in vm.
|
92511 |
17-Mar-2002 |
alc |
Remove vm_object_count: It's unused, incorrectly maintained and duplicates information maintained by the zone allocator.
|
92475 |
17-Mar-2002 |
alc |
Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(), the motivation for saving and restoring the map->hint in useracc() is gone. (The same tests that motivated this change in revision 1.57 now show that there is no performance loss from removing it.) This was really a hack and some day we would have had to add new synchronization here on map->hint to maintain it.
|
92466 |
17-Mar-2002 |
alc |
Acquire a read lock on the map inside of vm_map_check_protection() rather than expecting the caller to do so. This (1) eliminates duplicated code in kernacc() and useracc() and (2) fixes missing synchronization in munmap().
|
92461 |
17-Mar-2002 |
jake |
Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/ pmap_qremove. pmap_kenter is not safe to use in MI code because it is not guaranteed to flush the mapping from the tlb on all cpus. If the process in question is preempted and migrates cpus between the call to pmap_kenter and pmap_kremove, the original cpu will be left with stale mappings in its tlb. This is currently not a problem for i386 because we do not use PG_G on SMP, and thus all mappings are flushed from the tlb on context switches, not just user mappings. This is not the case on all architectures, and if PG_G is to be used with SMP on i386 it will be a problem. This was committed by peter earlier as part of his fine grained tlb shootdown work for i386, which was backed out for other reasons.
Reviewed by: peter
|
92363 |
15-Mar-2002 |
mckusick |
Introduce the new 64-bit size disk block, daddr64_t. Change the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.
|
92256 |
14-Mar-2002 |
green |
Document faultstate.lookup_still_valid more than none.
Requested by: alfred
|
92246 |
13-Mar-2002 |
green |
Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate. While doing this, move it earlier in the sysinit boot process so that the VM system can use it.
After that, the system is now able to use sx locks instead of lockmgr locks in the VM system. To accomplish this, some of the more questionable uses of the locks (such as testing whether they are owned or not, as well as allowing shared+exclusive recursion) are removed, and simpler logic throughout is used so locks should also be easier to understand.
This has been tested on my laptop for months, and has not shown any problems on SMP systems, either, so appears quite safe. One more user of lockmgr down, many more to go :)
|
92029 |
10-Mar-2002 |
eivind |
- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
|
91946 |
09-Mar-2002 |
tegge |
Revert change in revision 1.53 and add a small comment to protect the revived code.
vm pages newly allocated are marked busy (PG_BUSY), thus calling vm_page_delete before the pages have been freed or unbusied will cause a deadlock since vm_page_object_page_remove will wait for the busy flag to be cleared. This can be triggered by calling malloc with size > PAGE_SIZE and the M_NOWAIT flag on systems low on physical free memory.
A kernel module that reproduces the problem, written by Logan Gabriel <logan@mail.2cactus.com>, can be found in the freebsd-hackers mail archive (12 Apr 2001). The problem was recently noticed again by Archie Cobbs <archie@dellroad.org>.
Reviewed by: dillon
|
91777 |
07-Mar-2002 |
dillon |
Fix a bug in the vm_map_clean() procedure. msync()ing an area of memory that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been accessed will panic the machine.
MFC after: 1 day
|
91724 |
06-Mar-2002 |
dillon |
Add a sequential iteration optimization to vm_object_page_clean(). This moderately improves msync's and VM object flushing for objects containing randomly dirtied pages (fsync(), msync(), filesystem update daemon), and improves cpu use for small-ranged sequential msync()s in the face of very large mmap()ings from O(N) to O(1) as might be performed by a database.
A sysctl, vm.msync_flush_flag, has been added and defaults to 3 (the two committed optimizations are turned on by default). 0 will turn off both optimizations.
This code has already been tested under stable and is one in a series of memq / vp->v_dirtyblkhd / fsync optimizations to remove O(N^2) restart conditions that will be coming down the pipe.
MFC after: 3 days
|
91700 |
05-Mar-2002 |
eivind |
* Move bswlist declaration and initialization from kern/vfs_bio.c to vm/vm_pager.c, which is the only place it is used. * Make the QUEUE_* definitions and bufqueues local to vfs_bio.c. * constify buf_wmesg.
|
91641 |
04-Mar-2002 |
alc |
o Create vm_pageq_enqueue() to encapsulate code that is duplicated time and again in vm_page.c and vm_pageq.c. o Delete unused prototypes. (Mainly a result of the earlier renaming of various functions from vm_page_*() to vm_pageq_*().)
|
91605 |
03-Mar-2002 |
alc |
Call vm_pageq_remove_nowakeup() rather than duplicating it.
|
91569 |
02-Mar-2002 |
alc |
Remove some long dead code.
|
91420 |
27-Feb-2002 |
jhb |
Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.
|
91406 |
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
91403 |
27-Feb-2002 |
silby |
Fix a horribly suboptimal algorithm in the vm_daemon.
In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times.
Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change.
Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began.
Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals.
PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week
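The per-page cap can be sketched as follows (the constant and field names are illustrative, not the actual kernel identifiers):

```c
#include <assert.h>

#define PAGE_CHECK_LIMIT 16   /* the cap described above; name is illustrative */

/* Per-page bookkeeping: how many times this scan has tested the page. */
struct page_sketch {
    int act_checks;
};

/*
 * Sketch of the fix: with N processes sharing a page, the old scan
 * tested the page's reference bits N times per pass (O(N^2) tlb
 * invalidations overall); the new one stops after a fixed limit.
 * Returns 1 if the page should still be examined for this process.
 */
int page_should_check(struct page_sketch *p)
{
    if (p->act_checks >= PAGE_CHECK_LIMIT)
        return 0;
    p->act_checks++;
    return 1;
}
```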
|
91367 |
27-Feb-2002 |
peter |
Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.
|
91344 |
27-Feb-2002 |
peter |
Jake further reduced IPI shootdowns on sparc64 in loops by using ranged shootdowns in a couple of key places. Do the same for i386. This also hides some physical addresses from higher levels and has it use the generic vm_page_t's instead. This will help for PAE down the road.
Obtained from: jake (MI code, suggestions for MD part)
|
91263 |
26-Feb-2002 |
peter |
Remove unused variable (td)
|
91063 |
22-Feb-2002 |
phk |
GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.
|
90944 |
19-Feb-2002 |
tegge |
Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold count that would otherwise be on one of the free queues. This eliminates a panic when broken programs unmap memory that still has pending IO from raw devices.
Reviewed by: dillon, alc
|
90937 |
19-Feb-2002 |
silby |
Add one more comment to the OOM changes so that future readers of the code may better understand the code.
Suggested by: dillon MFC after: 1 week
|
90935 |
19-Feb-2002 |
silby |
Changes to make the OOM killer much more effective:
- Allow the OOM killer to target processes currently locked in memory. These very often are the ones doing the memory hogging. - Drop the wakeup priority of processes currently sleeping while waiting for their page fault to complete. In order for the OOM killer to work well, the killed process and other system processes waiting on memory must be allowed to wakeup first.
Reviewed by: dillon MFC after: 1 week
|
90702 |
15-Feb-2002 |
bde |
Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED, DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET and SIMPLELOCK_DEBUG.
|
90538 |
11-Feb-2002 |
julian |
In a threaded world, different priorities become properties of different entities. Make it so.
Reviewed by: jhb@freebsd.org (john baldwin)
|
90361 |
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there, but remove all the code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
90263 |
05-Feb-2002 |
alfred |
Fix a race with free'ing vmspaces at process exit when vmspaces are shared.
Also introduce vm_endcopy instead of using pointer tricks when initializing new vmspaces.
The race occurred because of how the reference was utilized: test vmspace reference, possibly block, decrement reference
When sharing a vmspace between multiple processes it was possible for two processes exiting at the same time to test the reference count, possibly block and neither one free because they wouldn't see the other's update.
Submitted by: green
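The fix amounts to decrementing first and letting exactly the thread that drops the count to zero do the free; a C11-atomics sketch with illustrative names:

```c
#include <assert.h>
#include <stdatomic.h>

/* Cut-down vmspace with just a reference count; names are illustrative. */
struct vmspace_sketch {
    atomic_int vm_refcnt;
    int freed;
};

/*
 * Sketch of the fixed pattern: instead of "read the count, maybe block,
 * then decrement" (two exiting processes can both read >1 and neither
 * frees), decrement atomically and free only when the old value was 1.
 */
void vmspace_release(struct vmspace_sketch *vm)
{
    if (atomic_fetch_sub(&vm->vm_refcnt, 1) == 1)
        vm->freed = 1;       /* stand-in for the actual free */
}
```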
|
90033 |
31-Jan-2002 |
dillon |
GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer cache lockups for over a year now.
MFC after: 0 days
|
89802 |
25-Jan-2002 |
dwmalone |
Remove a parameter name from a prototype.
|
89464 |
17-Jan-2002 |
bde |
Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.
Fixed some style bugs.
|
89319 |
14-Jan-2002 |
alfred |
Replace ffind_* with fget calls.
Make fget MPsafe.
Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.
Push giant down in fpathconf().
|
89306 |
13-Jan-2002 |
alfred |
SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and adapting it for KSE.
Locks:
1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked.
1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp); /* increments reference count on a file */
struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */
struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
|
88900 |
05-Jan-2002 |
jhb |
Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed:
The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively, when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.)
I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly.
Reviewed by: peter Tested on: i386, alpha
|
88318 |
20-Dec-2001 |
dillon |
Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout.
Hopefully MFC: before the 4.5 release
|
87834 |
14-Dec-2001 |
dillon |
This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS.
* An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems)
THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641
* The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side)
* Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could affect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF.
* When flushing a dirty buffer due to B_CACHE getting cleared, we were accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS)
* We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentative, I may be able to remove it due to the second bug fix. (applicable to NFS client side)
* vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side)
* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk).
Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week
|
87157 |
01-Dec-2001 |
luigi |
vm/vm_kern.c: rate limit (to once per second) diagnostic printf when you run out of mbuf address space.
kern/subr_mbuf.c: print a warning message when mb_alloc fails, again rate-limited to at most once per second. This covers other cases of mbuf allocation failures. Probably it also overlaps the one handled in vm/vm_kern.c, so maybe the latter should go away.
This warning will let us gradually remove the printfs that are scattered across most network drivers to report mbuf allocation failures. Those are potentially dangerous, in that they are not rate-limited and can easily cause systems to panic.
Unless there is disagreement (which does not seem to be the case judging from the discussion on -net so far), and because this is sort of a safety bugfix, I plan to commit a similar change to STABLE during the weekend (it affects kern/uipc_mbuf.c there).
Discussed-with: jlemon, silby and -net
|
86475 |
17-Nov-2001 |
jlemon |
When laying out objects in a ZONE_INTERRUPT zone, allow them to cross a page boundary, since we've already allocated all our contiguous kva space up front. This eliminates some memory wastage, and allows us to actually reach the # of objects that were specified in the zinit() call.
Reviewed by: peter, dillon
|
86236 |
09-Nov-2001 |
dillon |
Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count on a vnode-backed object must be incremented *after* obtaining the vnode lock. If it is bumped before obtaining the vnode lock we can deadlock against vtruncbuf().
Submitted by: peter, ps MFC after: 3 days
|
86092 |
05-Nov-2001 |
dillon |
Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the file EOF. This works around a bug in the ISOFS (CDRom) BMAP code which returns bogus values for requests beyond the file EOF rather than returning an error, resulting in either corrupt data being mmap()'d beyond the file EOF or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of CDRom files).
Reported by: peter / Yahoo MFC after: 3 days
|
85762 |
31-Oct-2001 |
dillon |
Don't let pmap_object_init_pt() exhaust all available free pages (allocating pv entries w/ zalloci) when called in a loop due to an madvise(). It is possible to completely exhaust the free page list and cause a system panic when an expected allocation fails.
|
85541 |
26-Oct-2001 |
dillon |
Move recently added procedure which was incorrectly placed within an #ifdef DDB block.
|
85517 |
26-Oct-2001 |
dillon |
Implement kern.maxvnodes. Adjusting kern.maxvnodes now actually has a real effect.
Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%.
Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%.
(more optimization work is needed on top of these fixes)
MFC after: 1 week
|
85272 |
21-Oct-2001 |
dillon |
Syntax cleanup and documentation, no operational changes.
MFC after: 1 day
|
85227 |
20-Oct-2001 |
iedowse |
Move the code that computes the system load average from vm_meter.c to kern_synch.c in preparation for adding some jitter to the inter-sample time.
Note that the "vm.loadavg" sysctl still lives in vm_meter.c which isn't the right place, but it is appropriate for the current (bad) name of that sysctl.
Suggested by: jhb (some time ago) Reviewed by: bde
|
85070 |
17-Oct-2001 |
dillon |
contigmalloc1() could cause the vm_page_zero_count to become incorrect. Properly track the count.
Submitted by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
|
85016 |
15-Oct-2001 |
tegge |
Don't use an uninitialized field reserved for callers in the bio structure passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage.
Reviewed by: dillon
|
84933 |
14-Oct-2001 |
tegge |
Don't remove all mappings of a swapped out process if the vm map contained wired entries. vm_fault_unwire() depends on the mapping being intact.
Reviewed by: dillon
|
84932 |
14-Oct-2001 |
tegge |
Fix locking violations during page wiring:
- vm map entries are not valid after the map has been unlocked.
- An exclusive lock on the map is needed before calling vm_map_simplify_entry().
Fix cleanup after page wiring failure to unwire all pages that had been successfully wired before the failure was detected.
Reviewed by: dillon
|
84869 |
13-Oct-2001 |
dillon |
Make contigmalloc[1]() create the vm_map / underlying wired pages in the kernel map and object in a manner that contigfree() is actually able to free. Previously contigfree() freed up the KVA space but could not unwire & free the underlying VM pages due to mismatched pageability between the map entry and the VM pages.
Submitted by: Thomas Moestl <tmoestl@gmx.net> Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu> MFC after: 3 days
|
84854 |
12-Oct-2001 |
dillon |
Finally fix the VM bug where a file whose EOF occurs in the middle of a page would sometimes prevent a dirty page from being cleaned, even when synced, resulting in the dirty page being re-flushed to disk every 30-60 seconds or so, forever. The problem is that when the filesystem flushes a page to its backing file it typically does not clear dirty bits representing areas of the page that are beyond the file EOF. If the file is also mmap()'d and a fault is taken, vm_fault (properly, as it is required to) sets the vm_page_t->dirty bits to VM_PAGE_BITS_ALL. This combination could leave us with an uncleanable, unfreeable page.
The solution is to have the vnode_pager detect the edge case and manually clear the dirty bits representing areas beyond the file EOF. The filesystem does the rest and the page comes up clean after the write completes.
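The edge case reduces to masking off the dirty bits that lie past EOF. A minimal userland sketch, with page_bits() modeled on the kernel's vm_page_bits() helper; the function names here are illustrative, not the actual vnode_pager code:

```c
#include <stdint.h>

#define PAGE_SIZE 4096
#define DEV_BSIZE 512   /* one dirty bit per DEV_BSIZE chunk -> 8 bits/page */

/* Mask covering byte range [base, base+size) of a page, rounded out to
 * DEV_BSIZE chunks; modeled on the kernel's vm_page_bits(). */
static uint8_t
page_bits(int base, int size)
{
        int first = base / DEV_BSIZE;
        int last = (base + size + DEV_BSIZE - 1) / DEV_BSIZE;

        return (uint8_t)(((1u << last) - 1) & ~((1u << first) - 1));
}

/* Clear the dirty bits that lie entirely beyond the file EOF, so a page
 * straddling EOF can come up clean once its valid part has been written. */
static uint8_t
clip_dirty_past_eof(uint8_t dirty, uint64_t eof, uint64_t page_off)
{
        if (eof >= page_off + PAGE_SIZE)
                return dirty;           /* page wholly below EOF: untouched */
        if (eof <= page_off)
                return 0;               /* page wholly beyond EOF */
        return dirty & page_bits(0, (int)(eof - page_off));
}
```

With this clipping in the pager, the dirty bits set by vm_fault for the area past EOF no longer survive the writeback.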
MFC after: 3 days
|
84827 |
11-Oct-2001 |
jhb |
Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
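A minimal userland model of the revised API; the ucred fields shown are representative stand-ins, not the real struct:

```c
/* Minimal model of the reworked ucred API. */
struct ucred {
        int cr_ref;             /* reference count */
        int cr_uid;             /* stand-in for the real credential data */
};

/* crhold() bumps the refcount and now returns the held credential. */
static struct ucred *
crhold(struct ucred *cr)
{
        cr->cr_ref++;
        return (cr);
}

/* crshared() is true iff the credential has more than one reference. */
static int
crshared(struct ucred *cr)
{
        return (cr->cr_ref > 1);
}

/* crcopy() now simply copies credential data; no return value. */
static void
crcopy(struct ucred *dst, const struct ucred *src)
{
        int ref = dst->cr_ref;  /* the destination keeps its own refcount */

        *dst = *src;
        dst->cr_ref = ref;
}
```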
|
84812 |
11-Oct-2001 |
jhb |
Add missing includes of sys/ktr.h.
|
84783 |
10-Oct-2001 |
ps |
Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable.
Reviewed by: peter MFC after: 2 weeks
|
84488 |
04-Oct-2001 |
iedowse |
Remove the SSLEEP case from the load average computation. This has been a no-op for as long as our CVS history goes back. Processes in state SSLEEP could only be counted if p_slptime == 0, but immediately before loadav() is called, schedcpu() has just incremented p_slptime on all SSLEEP processes.
|
83986 |
26-Sep-2001 |
rwatson |
o Modify access control checks in mmap() to use securelevel_gt() instead of direct variable access.
Obtained from: TrustedBSD Project
|
83366 |
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
83276 |
10-Sep-2001 |
peter |
Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place.
Reviewed by: jake, tmm, dillon
|
82756 |
01-Sep-2001 |
jhb |
Process priority is locked by the sched_lock, not the proc lock.
|
82699 |
31-Aug-2001 |
dillon |
make swapon() MPSAFE (will adjust syscalls.master later)
|
82697 |
31-Aug-2001 |
dillon |
mark obreak() and ovadvise() as being MPSAFE
|
82612 |
31-Aug-2001 |
dillon |
Cleanup
|
82314 |
25-Aug-2001 |
peter |
Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress.
It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on.
There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we won't preempt otherwise.
This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop.
I cleaned up the includes a fair bit here too.
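The yield-every-few-pages hack can be sketched in userland C; ZIDLE_BATCH and the helper names below are made up for illustration, not taken from the committed code:

```c
#include <string.h>

#define PAGE_SIZE   4096
#define ZIDLE_BATCH 8           /* illustrative: yield after this many pages */

static int nyields;             /* counts the explicit yields */

static void
zero_yield(void)
{
        nyields++;              /* stand-in for actually yielding the CPU */
}

/* Zero 'npages' pages, yielding every ZIDLE_BATCH pages since, without
 * preemption, the idle-priority zeroing thread would otherwise hog the CPU. */
static void
zero_idle_pages(unsigned char pages[][PAGE_SIZE], int npages)
{
        int done = 0;

        while (done < npages) {
                memset(pages[done], 0, PAGE_SIZE);
                if (++done % ZIDLE_BATCH == 0)
                        zero_yield();
        }
}
```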
|
82290 |
24-Aug-2001 |
dillon |
Remove support for the badly broken MAP_INHERIT (from -current only).
|
82127 |
22-Aug-2001 |
dillon |
Move most of the kernel submap initialization code, including the timeout callwheel and buffer cache, out of the platform-specific areas and into the machine-independent area. i386 and alpha adjusted here. Other CPUs can be fixed piecemeal.
Reviewed by: freebsd-smp, jake
|
82126 |
22-Aug-2001 |
dillon |
KASSERT if vm_page_t->wire_count overflows.
|
81933 |
20-Aug-2001 |
dillon |
Limit the amount of KVM reserved for the buffer cache and for swap-meta information. The default limits only affect machines with > 1GB of RAM and can be overridden with two new kernel conf variables VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid trip-ups where a sysadmin adds memory to a machine and then sees the kernel panic on boot due to running out of KVM.
Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures so we still stay somewhat conservative here.
|
81399 |
10-Aug-2001 |
jhb |
- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused.
Reviewed by: jasone, peter
|
81397 |
10-Aug-2001 |
jhb |
- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused.
Reviewed by: jasone, peter
|
81148 |
05-Aug-2001 |
tmm |
Add a missing semicolon to unbreak the kernel build with INVARIANTS (which was unfortunately turned off in the configuration I used for the last test build).
Spotted by: jake Pointy hat to: tmm
|
81140 |
04-Aug-2001 |
jhb |
Whitespace fixes.
|
81136 |
04-Aug-2001 |
tmm |
Add a zdestroy() function to the zone allocator. This is needed for the unload case of modules that use their own zones. It has been tested with the nfs module.
|
81029 |
02-Aug-2001 |
alfred |
Fixups for the initial allocation by dillon: 1) allocate fewer buckets 2) when failing to allocate swap zone, keep reducing the zone by a third rather than a half in order to reduce the chance of allocating way too little.
I also moved around some code for readability.
Suggested by: dillon Reviewed by: dillon
|
80705 |
31-Jul-2001 |
jake |
Oops. Last commit to vm_object.c should have got these files too.
Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous.
Discussed with: dillon
|
80704 |
31-Jul-2001 |
jake |
Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous.
Discussed with: dillon
|
80517 |
28-Jul-2001 |
iedowse |
Permit direct swapping to NFS regular files using swapon(2). We already allow this for NFS swap configured via BOOTP, so it is known to work fine.
For many diskless configurations it is more flexible to have the client set up swapping itself; it can recreate a sparse swap file to save on server space for example, and it works with a non-NFS root filesystem such as an in-kernel filesystem image.
|
80204 |
23-Jul-2001 |
assar |
make vm_page_select_cache static
Requested by: bde
|
80089 |
21-Jul-2001 |
assar |
(vm_page_select_cache): add prototype
|
79744 |
15-Jul-2001 |
benno |
The i386-specific includes in this file were "fixed" by bracketing them with #ifndef __alpha__. Fix this for the rest of the world by turning it into #ifdef __i386__.
Reviewed by: obrien
|
79443 |
09-Jul-2001 |
des |
Fix missing newline and terminator at the end of the vm.zone sysctl.
|
79273 |
05-Jul-2001 |
mjacob |
Apply field bandages to the includes so compiles happen on alpha.
|
79265 |
05-Jul-2001 |
dillon |
Move vm_page_zero_idle() from machine-dependent sections to a machine-independent source file, vm/vm_zeroidle.c. It was exactly the same for all platforms and updating them all was getting annoying.
|
79263 |
04-Jul-2001 |
dillon |
Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system.
vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and acquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)
|
79248 |
04-Jul-2001 |
dillon |
Change inlines back into mainline code in preparation for mutexing. Also, most of these inlines had been bloated in -current far beyond their original intent. Normalize prototypes and function declarations to be ANSI only (half already were). And do some general cleanup.
(kernel size also reduced by 50-100K, but that isn't the prime intent)
|
79242 |
04-Jul-2001 |
dillon |
whitespace / register cleanup
|
79224 |
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
79127 |
03-Jul-2001 |
jhb |
Fix a XXX comment by moving the initialization of the number of pbuf's for the vnode pager to a new vnode pager init method instead of making it a hack in getpages().
|
78622 |
22-Jun-2001 |
jhb |
- Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex. - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate said variables.
|
78592 |
22-Jun-2001 |
bmilekic |
Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages:
o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks.
Additional things changed with this addition:
- Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not deprecated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All info is fetched via sysctl.
TODO (in order of priority):
- Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and avoid the need to allocate a counter. - Look into introducing resource freeing possibly from a kproc.
Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/
|
78521 |
20-Jun-2001 |
jhb |
Don't lock around swap_pager_swap_init() that is only called once during the pagedaemon's startup code since it calls malloc which results in lock order reversals.
|
78481 |
20-Jun-2001 |
jhb |
Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for now. The proc locking isn't actually safe yet and won't be until the proc locking is finished.
|
78099 |
11-Jun-2001 |
dillon |
Cleanup the tabbing
|
77948 |
09-Jun-2001 |
dillon |
Two fixes to the out-of-swap process termination code. First, start killing processes a little earlier to avoid a deadlock. Second, when calculating the 'largest process' do not just count RSS. Instead count the RSS + SWAP used by the process. Without this the code tended to kill small inconsequential processes like, oh, sshd, rather than one of the many 'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on -stable and somewhat tested on -current and will be MFCd in a few days.
Shamed into fixing this by: ps
|
77604 |
01-Jun-2001 |
tmm |
Change the way information about swap devices is exported to be more canonical: define a versioned struct xswdev, and add a sysctl node handler that allows the user to get this structure for a certain device index by specifying this index as last element of the MIB. This new node handler, vm.swap_info, replaces the old vm.nswapdev and vm.swapdevX.* (where X was the index) sysctls.
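A sketch of the idea in plain C, with the struct abridged and the handler modeled as a simple indexed lookup; the real vm.swap_info handler also deals with MIB parsing and SYSCTL_OUT, and the exact field list here is illustrative:

```c
#include <errno.h>

/* Versioned export structure, after the spirit of struct xswdev. */
#define XSWDEV_VERSION  1

struct xswdev {
        unsigned xsw_version;   /* set to XSWDEV_VERSION by the kernel */
        int      xsw_nblks;     /* size of the device in blocks */
        int      xsw_used;      /* blocks currently in use */
};

static struct xswdev swdevs[] = {       /* fake device table for illustration */
        { XSWDEV_VERSION, 1024, 100 },
        { XSWDEV_VERSION, 2048, 0 },
};
static const int nswdev = 2;

/* Model of the handler: the device index arrives as the last element of
 * the MIB, and an out-of-range index is reported as an error. */
static int
swap_info(int index, struct xswdev *out)
{
        if (index < 0 || index >= nswdev)
                return (ENOENT);
        *out = swdevs[index];
        return (0);
}
```

Because the struct carries a version field, userland tools can detect layout changes instead of breaking silently, which is what made the per-device vm.swapdevX.* sysctls dispensable.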
|
77582 |
01-Jun-2001 |
tmm |
Clean up the code exporting interrupt statistics via sysctl a bit: - move the sysctl code to kern_intr.c - do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine the length of the intrcnt array - move the declarations of intrnames, eintrnames, intrcnt and eintrcnt from machine-dependent include files to sys/interrupt.h - remove the hw.nintr sysctl, it is not needed. - fix various style bugs
Requested by: bde Reviewed by: bde (some time ago)
|
77398 |
29-May-2001 |
jhb |
Don't hold the VM lock across VOP's and other things that can sleep.
|
77139 |
24-May-2001 |
jhb |
Stick VM syscalls back under Giant if the BLEED option is not defined.
|
77115 |
24-May-2001 |
dillon |
This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece.
Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency.
I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet.
Submitted by: tegge, dillon
|
77094 |
23-May-2001 |
jhb |
- Assert Giant is held in the vnode pager methods. - Lock the VM while walking down a vm_object's backing_object list in vnode_pager_lock().
|
77093 |
23-May-2001 |
jhb |
- Add in several asserts of vm_mtx. - Assert Giant in vm_pageout_scan() for the vnode hacking that it does. - Don't hold vm_mtx around vget() or vput(). - Lock Giant when calling vm_pageout_scan() from the pagedaemon. Also, lock curproc while setting the P_BUFEXHAUST flag. - For now we still hold Giant for all of the vm_daemon. When process limits are locked we will only need Giant for swapout_procs().
|
77091 |
23-May-2001 |
jhb |
- Assert that the vm lock is held for all of _vm_object_allocate(). - Restore the previous order of setting up a new vm_object. The previous order had a small bug where we zero'd out the flags after we set the OBJ_ONEMAPPING flag. - Add several asserts of vm_mtx. - Assert Giant is held rather than locking and unlocking it in a few places. - Add in some #ifdef objlocks code to lock individual vm objects when vm objects each have their own lock someday. - Don't bother acquiring the allproc lock for a ddb command. If DDB blocked on the lock, that would be worse than having an inconsistent allproc list.
|
77090 |
23-May-2001 |
jhb |
- Add lots of vm_mtx assertions. - Add a few KTR tracepoints to track the addition and removal of vm_map_entry's and the creation and freeing of vmspace's. - Adjust a few portions of code so that we update the process' vmspace pointer to its new vmspace before freeing the old vmspace.
|
77089 |
23-May-2001 |
jhb |
- Lock the VM around the pmap_swapin_proc() call in faultin(). - Don't lock Giant in the scheduler() function except for when calling faultin(). - In swapout_procs(), lock the VM before the process to avoid a lock order violation. - In swapout_procs(), release the allproc lock before calling swapout(). We restart the process scan after swapping out a process. - In swapout_procs(), un #if 0 the code to bump the vmspace reference count and lock the process' vm structures. This bug was introduced by me and could result in the vmspace being free'd out from under a running process. - Fix an old bug where the vmspace reference count was not free'd if we failed the swap_idle_threshold2 test.
|
77088 |
23-May-2001 |
jhb |
- Fix the sw_alloc_interlock to actually lock itself when the lock is acquired. - Assert Giant is held in the strategy, getpages, and putpages methods and the getchainbuf, flushchainbuf, and waitchainbuf functions. - Always call flushchainbuf() w/o the VM lock.
|
77087 |
23-May-2001 |
jhb |
Assert Giant is held for the device pager alloc and getpages methods since we call the mmap method of the cdevsw of the device we are mmap'ing.
|
77083 |
23-May-2001 |
jhb |
- Obtain Giant in mmap() syscall while messing with file descriptors and vnodes. - Fix an old bug that would leak a reference to a fd if the vnode being mmap'd wasn't of type VREG or VCHR. - Lock Giant in vm_mmap() around calls into the VM that can call into pager routines that need Giant or into other VM routines that need Giant. - Replace code that used a goto to jump around the else branch of a test to use an else branch instead.
|
77080 |
23-May-2001 |
jhb |
Acquire Giant around vm_map_remove() inside of the obreak() syscall for vm_object_terminate().
|
77077 |
23-May-2001 |
jhb |
Take a more conservative approach and still lock Giant around VM faults for now.
|
77062 |
23-May-2001 |
jhb |
Set the phys_pager_alloc_lock to 1 when it is acquired so that it is actually locked.
|
77036 |
23-May-2001 |
alfred |
Acquire Giant when playing with the buffer cache and doing I/O. Use msleep against the vm mutex while waiting for a page I/O to complete.
|
77010 |
22-May-2001 |
alfred |
Acquire the vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone with the mutex held.
|
76981 |
22-May-2001 |
jhb |
Remove duplicate include and sort includes.
|
76978 |
22-May-2001 |
jhb |
Sort includes.
|
76974 |
22-May-2001 |
jhb |
Unlock the VM lock at the end of munlock() instead of locking it again.
|
76973 |
22-May-2001 |
jhb |
Sort includes from previous commit.
|
76949 |
22-May-2001 |
jhb |
Sort includes.
|
76827 |
19-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
76778 |
18-May-2001 |
jhb |
- Use a timeout for the tsleep in scheduler() instead of having vmmeter() wakeup proc0 by hand to enforce the timeout. - When swapping out a process, keep the process locked via the proc lock from the first checks up until we clear PS_INMEM and set PS_SWAPPING in swapout(). The swapout() function now must be called with the proc lock held and releases it before returning. - Comment out the code to attempt to lock a process' VM structures before swapping out. It is broken in that it releases the lock after obtaining it. If it does grab the lock, it needs to hand it off to swapout() instead of releasing it. This can be revisited when the VM is locked as this is a valid test to perform. It also causes a lock order reversal for the time being, which is the immediate cause for temporarily disabling it.
|
76773 |
17-May-2001 |
jhb |
During the code to pick a process to kill when memory is exhausted, keep the process in question locked as soon as we find it and determine it to be eligible until we actually kill it. To avoid deadlock, we don't block on the process lock but skip any process that is already locked during our search.
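The skip-if-locked scan can be modeled in userland with pthread mutexes standing in for proc locks; pick_victim() and its fields are illustrative, not the kernel code:

```c
#include <stddef.h>
#include <pthread.h>

struct proc {
        pthread_mutex_t p_mtx;
        int p_size;             /* stand-in for the "largest process" metric */
};

/*
 * Scan for the biggest eligible process, taking each candidate's lock
 * with trylock so a process already locked elsewhere is skipped rather
 * than deadlocked on.  The winner is returned still locked, so it stays
 * eligible until it is actually killed.
 */
static struct proc *
pick_victim(struct proc procs[], int n)
{
        struct proc *big = NULL;
        int i;

        for (i = 0; i < n; i++) {
                if (pthread_mutex_trylock(&procs[i].p_mtx) != 0)
                        continue;       /* already locked: skip it */
                if (big == NULL || procs[i].p_size > big->p_size) {
                        if (big != NULL)
                                pthread_mutex_unlock(&big->p_mtx);
                        big = &procs[i];        /* new leader stays locked */
                } else
                        pthread_mutex_unlock(&procs[i].p_mtx);
        }
        return (big);
}
```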
|
76641 |
15-May-2001 |
jhb |
- Use PROC_LOCK_ASSERT instead of a direct mtx_assert. - Don't hold Giant in the swapper daemon while we walk the list of processes looking for a process to swap back in. - Don't bother grabbing the sched_lock while checking a process' sleep time in swapout_procs() to ensure that a process has been idle for at least swap_idle_threshold2 before swapping it out. If we lose the race we just let a process stay in memory until the next call of swapout_procs(). - Remove some unneeded spl's, sched_lock does all the locking needed in this case.
|
76322 |
06-May-2001 |
phk |
biofinish(struct bio *, struct devstat *, int error) is actually more general than bioerror().
Most of this patch is generated by scripts.
|
76244 |
03-May-2001 |
markm |
Putting sys/lockmgr.h in here allows us to depollute userland includes a bit. OK'ed by: bde
|
76166 |
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
76117 |
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
76084 |
27-Apr-2001 |
alfred |
Address a number of problems with sysctl_vm_zone().
The zone allocator's locks should be leaf locks, meaning that they should never be held when entering into another subsystem. However, the sysctl grabs the zone global mutex and individual zone mutexes, and while holding them it calls SYSCTL_OUT, which recurses into the VM subsystem in order to wire user memory to do a safe copy. This can block and cause lock order reversals.
To fix this: lock the zone global mutex; get a count of the number of zones; unlock the global; allocate temporary storage; format and SYSCTL_OUT the banner; relock the global; traverse the list, making sure we haven't looped more than the initial count taken, to avoid overflowing the allocated buffer; lock each node; read values and format them into the buffer; unlock the individual node; unlock the global; format and SYSCTL_OUT the rest of the data; free storage; return.
Other problems included not checking for errors when doing sysctl out of the column header. Fixed.
Inconsistent termination of the copied string. Fixed.
Objected to by: des (for not using sbuf)
Since the output is not variable length and I'm actually over-allocating significantly, and I'd like to get this fixed now, I'll work on the sbuf conversion at a later date. I would not object to someone else taking it upon themselves to convert it to sbuf. I hold no MAINTAINER rights to this code (for now).
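The count-then-bounded-traverse pattern described above can be sketched in userland C with stub locks; snapshot_zones() is illustrative, not the actual sysctl_vm_zone():

```c
#include <stdlib.h>

/* Toy zone list; the lock calls are stand-ins for the real mutexes. */
struct zone {
        int z_used;
        struct zone *z_next;
};

static int zone_global_locked;
static void zone_global_lock(void)   { zone_global_locked = 1; }
static void zone_global_unlock(void) { zone_global_locked = 0; }

/*
 * The reworked shape: take a count under the global lock, drop the lock
 * to allocate (which may sleep), then re-take it and never write more
 * entries than the initial count, even if the list grew in between.
 */
static int
snapshot_zones(struct zone *head, int **out)
{
        struct zone *z;
        int count = 0, i = 0;
        int *buf;

        zone_global_lock();
        for (z = head; z != NULL; z = z->z_next)
                count++;
        zone_global_unlock();

        buf = malloc(sizeof(*buf) * (count ? count : 1));
        if (buf == NULL)
                return (-1);

        zone_global_lock();
        for (z = head; z != NULL && i < count; z = z->z_next)
                buf[i++] = z->z_used;   /* bounded: cannot overflow buf */
        zone_global_unlock();

        *out = buf;
        return (i);
}
```

The key point is that no lock is held across the (potentially sleeping) allocation and no SYSCTL_OUT-style copy happens with a zone mutex held.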
|
75858 |
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
75692 |
19-Apr-2001 |
alfred |
vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
|
75675 |
18-Apr-2001 |
alfred |
Protect pager object creation with sx locks.
Protect pager object list manipulation with a mutex.
It doesn't look possible to combine them under a single sx lock because creation may block and we can't have the object list manipulation block on anything other than a mutex because of interrupt requests.
|
75644 |
18-Apr-2001 |
alfred |
Fix the botched rev 1.59 where I made it such that without INVARIANTS the map is never locked.
Submitted by: tegge
|
75580 |
17-Apr-2001 |
phk |
This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing.
This patch takes a more general approach and adds a bp->b_op vector where more methods can be added.
The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
|
75523 |
15-Apr-2001 |
alfred |
use TAILQ_FOREACH, fix a comment's location
|
75477 |
13-Apr-2001 |
alfred |
if/panic -> KASSERT
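The conversion pattern, sketched with userland stand-ins; note the real KASSERT takes parenthesized printf-style arguments and compiles away unless the kernel is built with INVARIANTS:

```c
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-in for the kernel's panic(). */
static void
panic(const char *msg)
{
        fprintf(stderr, "panic: %s\n", msg);
        abort();
}

/* Simplified KASSERT: the kernel version takes (exp, (fmt, ...)). */
#define KASSERT(exp, msg) do {          \
        if (!(exp))                     \
                panic(msg);             \
} while (0)

/* Before: "if (wire_count < 0) panic(...);"  After: one self-describing
 * assertion that documents the invariant where it is checked. */
static int
wire_count_check(int wire_count)
{
        KASSERT(wire_count >= 0, "negative wire_count");
        return (wire_count);
}
```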
|
75474 |
13-Apr-2001 |
alfred |
protect pbufs and associated counts with a mutex
|
75473 |
13-Apr-2001 |
alfred |
use %p for pointer printf, include sys/systm.h for printf proto
|
75462 |
13-Apr-2001 |
alfred |
Use a macro wrapper over printf along with KASSERT to reduce the amount of code here.
|
75452 |
12-Apr-2001 |
alfred |
Remove truncated part from comment.
|
74927 |
28-Mar-2001 |
jhb |
Convert the allproc and proctree locks from lockmgr locks to sx locks.
|
74914 |
28-Mar-2001 |
jhb |
Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
|
74670 |
23-Mar-2001 |
tmm |
Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and hw.intrcnt).
Approved by: rwatson
|
74237 |
14-Mar-2001 |
dillon |
Fix a lock reversal problem in the VM subsystem related to threaded programs. There is a case during a fork() which can cause a deadlock.
From Tor - The workaround that consists of setting a flag in the vm map that indicates that a fork is in progress and using that mark in the page fault handling to force a revalidation failure. That change will only affect (pessimize) page fault handling during fork for threaded (linuxthreads style) applications and applications using aio_*().
Submitted by: tegge
|
74235 |
14-Mar-2001 |
dillon |
Temporarily remove the vm_map_simplify() call from vm_map_insert(). The call is correct, but it interferes with the massive hack called vm_map_growstack(). The call will be restored after our stack handling code is fixed.
Reported by: tegge
|
74042 |
09-Mar-2001 |
iedowse |
When creating a shadow vm_object in vmspace_fork(), only one reference count was transferred to the new object, but both the new and the old map entries had pointers to the new object. Correct this by transferring the second reference.
This fixes a panic that can occur when mmap(2) is used with the MAP_INHERIT flag.
PR: i386/25603 Reviewed by: dillon, alc
|
73936 |
07-Mar-2001 |
jhb |
Unrevert the pmap_map() changes. They weren't broken on x86.
Sense beaten into me by: peter
|
73903 |
07-Mar-2001 |
jhb |
Back out the pmap_map() change for now, it isn't completely stable on the i386.
|
73862 |
06-Mar-2001 |
jhb |
- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete.
Submitted by: dfr Debugging help: peter Tested by: me
|
73534 |
04-Mar-2001 |
alfred |
Simplify vm_object_deallocate(), by decrementing the refcount first. This allows some of the conditionals to be combined.
|
73282 |
01-Mar-2001 |
gallatin |
Allocate vm_page_array and vm_page_buckets from the end of the biggest chunk of memory, rather than from the start.
This fixes problems allocating bouncebuffers on alphas where there is only 1 chunk of memory (unlike PCs where there is generally at least one small chunk and a large chunk). Having 1 chunk had been fatal, because these structures take over 13MB on a machine with 1GB of ram. This doesn't leave much room for other structures and bounce buffers if they're at the front.
Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>
|
73212 |
28-Feb-2001 |
dillon |
If we intend to make the page writable without requiring another fault, make sure that PG_NOSYNC is properly set. Previously we only set it for a write-fault, but this can occur on a read-fault too. (will be MFCd prior to 4.3 freeze)
|
72949 |
23-Feb-2001 |
rwatson |
Introduce per-swap area accounting in the VM system, and export this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem.
Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit
|
72888 |
22-Feb-2001 |
des |
Fix formatting bugs introduced in sysctl_vm_zone() by the previous commit. Also, if SYSCTL_OUT() returns a non-zero value, stop at once.
|
72376 |
12-Feb-2001 |
jake |
Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propagated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propagation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propagate_priority(). It should now have the desired effect without needing to also propagate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propagate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propagate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at appropriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields.
It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
|
72200 |
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
Similarly, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
71999 |
04-Feb-2001 |
phk |
Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details.
Created with: sed(1) Reviewed by: md5(1)
|
71983 |
04-Feb-2001 |
dillon |
This commit represents work mainly submitted by Tor and slightly modified by myself. It solves a serious vm_map corruption problem that can occur with the buffer cache when block sizes > 64K are used. This code has been heavily tested in -stable but only tested somewhat on -current. An MFC will occur in a few days. My additions include the vm_map_simplify_entry() and minor buffer cache boundary case fix.
Make the buffer cache use a system map for buffer cache KVM rather than a normal map.
Ensure that VM objects are not allocated for system maps. There were cases where a buffer map could wind up with a backing VM object -- normally harmless, but this could also result in the buffer cache blocking in places where it assumes no blocking will occur, possibly resulting in corrupted maps.
Fix a minor boundary case when the buffer cache size limit is reached that could result in non-optimal code.
Add vm_map_simplify_entry() calls to prevent 'creeping proliferation' of vm_map_entry's in the buffer cache's vm_map. Previously only a simple linear optimization was made. (The buffer vm_map typically has only a handful of vm_map_entry's. This stabilizes it at that level permanently).
PR: 20609 Submitted by: (Tor Egge) tegge
|
71610 |
25-Jan-2001 |
jhb |
- Doh, lock faultin() with proc lock in scheduler(). - Lock p_swtime with sched_lock in scheduler() as well.
|
71576 |
24-Jan-2001 |
jasone |
Convert all simplelocks to mutexes and remove the simplelock implementations.
|
71574 |
24-Jan-2001 |
jhb |
Argh, I didn't get this test right when I converted it. Break this up into two separate if's instead of nested if's. Also, reorder things slightly to avoid unnecessary mutex operations.
|
71572 |
24-Jan-2001 |
jhb |
- Catch up to proc flag changes. - Minimal proc locking. - Use queue macros.
|
71571 |
24-Jan-2001 |
jhb |
Add mtx_assert()'s to verify that kmem_alloc() and kmem_free() are called with Giant held.
|
71570 |
24-Jan-2001 |
jhb |
- Catch up to proc flag changes. - Proc locking in a few places. - faultin() now must be called with the proc lock held. - Split up swappable() into a couple of tests so that it can be locked in swapout_procs(). - Use queue macros.
|
71569 |
24-Jan-2001 |
jhb |
- Catch up to proc flag changes.
|
71512 |
24-Jan-2001 |
jhb |
Add missing include.
|
71429 |
23-Jan-2001 |
ume |
Add mibs to hold the number of forks since boot. New mibs are:
vm.stats.vm.v_forks
vm.stats.vm.v_vforks
vm.stats.vm.v_rforks
vm.stats.vm.v_kthreads
vm.stats.vm.v_forkpages
vm.stats.vm.v_vforkpages
vm.stats.vm.v_rforkpages
vm.stats.vm.v_kthreadpages
Submitted by: Paul Herman <pherman@frenchfries.net> Reviewed by: alfred
|
71408 |
23-Jan-2001 |
jake |
Sigh. atomic_add_int takes a pointer, not an integer.
Pointy-hat-to: des
|
71406 |
23-Jan-2001 |
des |
Use atomic operations to update the stat counters.
|
71362 |
22-Jan-2001 |
des |
Call vm_zone_init() at the appropriate time.
Reviewed by: jasone, jhb
|
71361 |
22-Jan-2001 |
des |
Give this code a major facelift:
- replace the simplelock in struct vm_zone with a mutex.
- use a proper SLIST rather than a hand-rolled job for the zone list.
- add a subsystem lock that protects the zone list and the statistics counters.
- merge _zalloc() into zalloc() and _zfree() into zfree(), and move them below _zget() so there's no need for a prototype.
- add two initialization functions: one which initializes the subsystem mutex and the zone list, and one that currently doesn't do anything.
- zap zerror(); use KASSERTs instead.
- dike out half of sysctl_vm_zone(), which was mostly trying to do manually what the snprintf() call could do better.
Reviewed by: jhb, jasone
|
71350 |
21-Jan-2001 |
des |
First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant.
Reviewed by: bmilekic, jasone
|
70480 |
29-Dec-2000 |
alfred |
fix comment which was outdated 3 years ago; remove useless assignment; purge entire file of 'register' keyword
|
70478 |
29-Dec-2000 |
alfred |
clean up kmem_suballoc(): remove useless assignment, remove 'register' variables
|
70390 |
27-Dec-2000 |
assar |
Make zalloc and zfree non-inline functions. This avoids having to have the code calling these be compiled with the same setting for INVARIANTS and SMP.
Reviewed by: dillon
|
70374 |
26-Dec-2000 |
dillon |
This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient, successive passes will be run without any limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, among other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
|
70160 |
18-Dec-2000 |
phk |
Fix floppy drives on machines with lots of RAM.
The fix works by reversing the ordering of free memory so that the chances of contig_malloc() succeeding increase.
PR: 23291 Submitted by: Andrew Atrens <atrens@nortel.ca>
|
69972 |
13-Dec-2000 |
tanimura |
- If swap metadata does not fit into the KVM, reduce the number of struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits.
- Reject swapon(2) upon failure of swap_zone allocation.
This is just a temporary fix. Better solutions include: (suggested by: dillon)
o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.
Reviewed by: alfred, dillon
|
69947 |
13-Dec-2000 |
jake |
- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags passed to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
|
69847 |
11-Dec-2000 |
dillon |
Be less conservative with a recently added KASSERT. Certain edge cases with file fragments and read-write mmap's can lead to a situation where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by an mmap and only the fragment (representing a non-page-aligned end of file) synced via a filesystem buffer. A correct solution that guarantees consistent m->dirty for the file EOF case is being worked on. In the mean time we can't be so conservative in the KASSERT.
|
69781 |
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
69687 |
06-Dec-2000 |
alfred |
Really fix phys_pager:
Backout the previous delta (rev 1.4), it didn't make any difference.
If the requested handle is NULL then don't add it to the list of objects to be found by handle.
The problem is that when asking for a NULL handle you are implying you want a new object. Because objects with NULL handles were being added to the list, any further requests for phys backed objects with NULL handles would return a reference to the initial NULL handle object after finding it on the list.
Basically one couldn't have more than one phys backed object without a handle in the entire system without this fix. If you did more than one shared memory allocation using the phys pager it would give you your initial allocation again.
|
69641 |
05-Dec-2000 |
alfred |
Adjust the allocation size to properly deal with non-PAGE_SIZE allocations; specifically, allocations smaller than PAGE_SIZE, which the code previously did not handle properly.
|
69517 |
02-Dec-2000 |
bde |
Backed out previous commit. Don't depend on namespace pollution in <sys/buf.h>.
|
69516 |
02-Dec-2000 |
jhb |
Protect p_stat with sched_lock.
|
69509 |
02-Dec-2000 |
jhb |
Protect p_stat with sched_lock.
|
69399 |
30-Nov-2000 |
alfred |
remove unneeded sys/ucred.h includes
|
69022 |
22-Nov-2000 |
jake |
Protect the following with a lockmgr lock:
allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid
Reviewed by: jhb Obtained from: BSD/OS and netbsd
|
68921 |
20-Nov-2000 |
rwatson |
o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. This removes a reason that systat requires setgid kmem. More to come.
|
68885 |
18-Nov-2000 |
dillon |
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather than block, they continue to operate and then return resources to the memory pool instead of caching them or leaving them wired.
Code has been added to stall in a low-memory situation prior to a vnode being locked.
Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmetic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather than continue to corrupt the filesystem. The problem does not occur very often; it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
|
68884 |
18-Nov-2000 |
dillon |
Add the splvm()'s suggested in PR 20609 to protect vm_pager_page_unswapped(). The remainder of the PR is still open.
PR: kern/20609 (partial fix)
|
68883 |
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
68261 |
02-Nov-2000 |
tegge |
Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries. PR: 2840
|
67885 |
29-Oct-2000 |
phk |
Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing the offending inline function (BUF_KERNPROC) on it being #included already.
I'm not sure BUF_KERNPROC() is even the right thing to do or in the right place or implemented the right way (inline vs normal function).
Remove consequently unneeded #includes of <sys/proc.h>
|
67536 |
25-Oct-2000 |
jhb |
- Catch a machine/mutex.h -> sys/mutex.h I somehow missed. - Close a small race condition. The sched_lock mutex protects p->p_stat as well as the run queues. Another CPU could change p_stat of the process while we are waiting for the lock, and we would end up scheduling a process that isn't runnable.
|
67247 |
17-Oct-2000 |
ps |
Implement write combining for crashdumps. This is useful when write caching is disabled on both SCSI and IDE disks where large memory dumps could take up to an hour to complete.
Taking an i386 scsi based system with 512MB of ram and timing (in seconds) how long it took to complete a dump, the following results were obtained:
Before:                 After:
WCE    TIME             WCE    TIME
------------------      ------------------
1      141.820972       1      15.600111
0      797.265072       0      65.480465
Obtained from: Yahoo! Reviewed by: peter
|
67082 |
13-Oct-2000 |
dillon |
The swap bitmap allocator was not calculating the bitmap size properly in the face of non-stripe-aligned swap areas. The bug could cause a panic during boot.
Refuse to configure a swap area that is too large (67 GB or so)
Properly document the power-of-2 requirement for SWB_NPAGES.
The patch is slightly different from the one Tor enclosed in the PR, but accomplishes the same thing.
PR: kern/20273 Submitted by: Tor.Egge@fast.no
|
67046 |
12-Oct-2000 |
jasone |
For lockmgr mutex protection, use an array of mutexes that are allocated and initialized during boot. This avoids bloating sizeof(struct lock). As a side effect, it is no longer necessary to enforce the assumption that lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been removed.
Idea taken from: BSD/OS.
|
66748 |
06-Oct-2000 |
dwmalone |
If a process is over its resource limit for datasize, still allow it to lower its memory usage. This was mentioned on the mailing lists ages ago, and I've lost the name of the person who brought it up.
Reviewed by: alc
|
66615 |
04-Oct-2000 |
jasone |
Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
|
65904 |
15-Sep-2000 |
jhb |
- Add a new process flag P_NOLOAD that marks a process that should be ignored during load average calculations. - Set this flag for the idle processes and the softinterrupt process.
|
65770 |
12-Sep-2000 |
bp |
Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency.
Reviewed in general by: mckusick, dillon
|
65557 |
07-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
65103 |
26-Aug-2000 |
obrien |
Make the arguments match the functionality of the functions.
|
63973 |
28-Jul-2000 |
peter |
Minor cleanups: - remove unused variables (fix warnings) - use a more consistent ansi style rather than a mixture - remove dead #if 0 code and declarations
|
63897 |
26-Jul-2000 |
mckusick |
Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken.
Submitted by: Robert Watson <rwatson@FreeBSD.org>
|
62976 |
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness it is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
62941 |
11-Jul-2000 |
alfred |
#elsif -> #elif
Noticed by: green
|
62622 |
05-Jul-2000 |
jhb |
Support for unsigned integer and long sysctl variables. Update the SYSCTL_LONG macro to be consistent with other integer sysctl variables and require an initial value instead of assuming 0. Update several sysctl variables to use the unsigned types.
PR: 15251 Submitted by: Kelly Yancey <kbyanc@posi.net>
|
62573 |
04-Jul-2000 |
phk |
Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by: bde
|
62568 |
04-Jul-2000 |
jhb |
Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you set equal to the number of kilobytes in your cache. The old options are still supported for backwards compatibility.
Submitted by: Kelly Yancey <kbyanc@posi.net>
|
62552 |
04-Jul-2000 |
mckusick |
Simplify and rationalise the management of the vnode free list (preparing the code to add snapshots).
|
62454 |
03-Jul-2000 |
phk |
Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our sources:
-sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
|
62067 |
25-Jun-2000 |
markm |
Nifty idea from Jeroen van Gelderen; don't call a routine to check if we are using the /dev/zero device, just check a flag (supplied by /dev/zero). Reviewed by: dfr
|
61272 |
05-Jun-2000 |
hsu |
Add missing increment of allocation counter.
|
61081 |
29-May-2000 |
dillon |
This is a cleanup patch to Peter's new OBJT_PHYS VM object type and sysv shared memory support for it. It implements a new PG_UNMANAGED flag that has slightly different characteristics from PG_FICTITIOUS.
A new sysctl, kern.ipc.shm_use_phys has been added to enable the use of physically-backed sysv shared memory rather than swap-backed. Physically backed shm segments are not tracked with PV entries, allowing programs which use a large shm segment as a rendezvous point to operate without eating an insane amount of KVM in the PV entry management. Read: Oracle.
Peter's OBJT_PHYS object will also allow us to eventually implement page-table sharing and/or 4MB physical page support for such segments. We're half way there.
|
61074 |
29-May-2000 |
dfr |
Brucify the pmap_enter_temporary() changes.
|
61058 |
29-May-2000 |
dillon |
Fix bug in vm_pageout_page_stats() that always resulted in a full scan of the active queue. This fix is not expected to have any noticeable impact on performance.
Noticed by: Rik van Riel <riel@conectiva.com.br>
|
61036 |
28-May-2000 |
dfr |
Add a new pmap entry point, pmap_enter_temporary() to be used during dumps to create temporary page mappings. This replaces the use of CADDR1 which is fairly x86 specific.
Reviewed by: dillon
|
60938 |
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
60833 |
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
60757 |
21-May-2000 |
peter |
Checkpoint of a new physical memory backed object type, that does not have pv_entries. This is intended for very special circumstances, eg: a certain database that has a 1GB shm segment mapped into 300 processes. That would consume 2GB of kvm just to hold the pv_entries alone. This would not be used on systems unless the physical ram was available, as it's not pageable.
This is a work-in-progress, but is a useful and functional checkpoint. Matt has got some more fixes for it that will be committed soon.
Reviewed by: dillon
|
60755 |
21-May-2000 |
peter |
Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or its shadow pv_table entry.
Inspired by: John Dyson's 1998 patches.
Also: Eliminate pv_table as a separate thing and build it into a machine dependent part of vm_page_t. This eliminates having a separate set of structures that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha).
Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits).
Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files.
Reviewed by: dillon
|
60557 |
14-May-2000 |
dillon |
Fixed bug in madvise() / MADV_WILLNEED. When the request is offset from the base of the first map_entry the call to pmap_object_init_pt() uses the wrong start VA. MFC to follow.
PR: i386/18095
|
60041 |
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
59915 |
03-May-2000 |
phk |
Convert the vm_pager_strategy() interface to take a struct bio instead of a struct buf. Don't try to examine B_ASYNC, it is a layering violation to do so. The only current user of this interface is vn(4) which, since it emulates a disk interface, operates on struct bio already.
|
59866 |
01-May-2000 |
phk |
Move and staticize the bufchain functions so they become local to the only piece of code using them. This will ease a rewrite of them.
|
59794 |
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
59496 |
22-Apr-2000 |
wollman |
Implement POSIX.1b shared memory objects. In this implementation, shared memory objects are regular files; the shm_open(3) routine uses fcntl(2) to set a flag on the descriptor which tells mmap(2) to automatically apply MAP_NOSYNC.
Not objected to by: bde, dillon, dufault, jasone
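The shm_open(3) interface described above is the standard POSIX one; a minimal userland sketch (Python's multiprocessing.shared_memory wraps shm_open/shm_unlink, with an auto-generated name where C code would pass one explicitly):

```python
from multiprocessing import shared_memory

# Create a POSIX shared memory object (wraps shm_open(3) under the hood).
shm = shared_memory.SharedMemory(create=True, size=4096)
try:
    shm.buf[:5] = b"hello"      # stores go straight to the shared pages
    data = bytes(shm.buf[:5])   # another process attaching by shm.name sees this
finally:
    shm.close()
    shm.unlink()                # shm_unlink(): remove the name
```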
|
59395 |
19-Apr-2000 |
alc |
vm_object_shadow: Remove an incorrect assertion. In obscure circumstances vm_object_shadow can be called on an object with ref_count > 1 and OBJ_ONEMAPPING set. This isn't really a problem for vm_object_shadow.
|
59368 |
18-Apr-2000 |
phk |
Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks by 924 bytes.
|
59249 |
15-Apr-2000 |
phk |
Complete the bio/buf divorce for all code below devfs::strategy
Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
|
59017 |
04-Apr-2000 |
msmith |
Fix _zget() so that it checks the return from kmem_alloc(), to avoid attempting to bzero NULL when the kernel map fills up. _zget() will now return NULL as it seems it was originally intended to do.
|
58934 |
02-Apr-2000 |
phk |
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
|
58708 |
27-Mar-2000 |
dillon |
Add necessary spl protection for swapper. The problem was located by Alfred while testing his SPLASSERT stuff. This is not a complete fix, more protections are probably needed.
|
58705 |
27-Mar-2000 |
charnier |
Revert spelling mistake I made in the previous commit. Requested by: Alan and Bruce
|
58634 |
26-Mar-2000 |
charnier |
Spelling
|
58462 |
22-Mar-2000 |
phk |
Fix one place which knew that B_WRITE was zero.
Fix a stylistic mistake of mine while here.
Found by: Stephen Hocking <shocking@prth.pgs.com>
|
58349 |
20-Mar-2000 |
phk |
Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
|
58345 |
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
|
58132 |
16-Mar-2000 |
phk |
Eliminate the undocumented, experimental, non-delivering and highly dangerous MAX_PERF option.
|
57975 |
13-Mar-2000 |
phk |
Remove unused 3rd argument from vsunlock() which abused B_WRITE.
|
57550 |
28-Feb-2000 |
ps |
Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2). This feature allows you to specify if mmap'd data is included in an application's corefile.
Change the type of eflags in struct vm_map_entry from u_char to vm_eflags_t (an unsigned int).
Reviewed by: dillon,jdp,alfred Approved by: jkh
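From userland this knob is just an madvise(2) call; a hedged sketch (MADV_NOCORE is FreeBSD-specific, so the constant is probed defensively rather than assumed):

```python
import mmap

m = mmap.mmap(-1, mmap.PAGESIZE)              # anonymous mapping
advice = getattr(mmap, "MADV_NOCORE", None)   # exposed only where the OS has it
if advice is not None:
    m.madvise(advice)                         # exclude these pages from corefiles
supported = advice is not None
m.close()
```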
|
57263 |
16-Feb-2000 |
dillon |
Fix null-pointer dereference crash when the system is intentionally run out of KVM through a mmap()/fork() bomb that allocates hundreds of thousands of vm_map_entry structures.
Add panic to make null-pointer dereference crash a little more verbose.
Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number of mmap()'d spaces (discrete vm_map_entry's in the process). The value defaults to around 9000 for a 128MB machine. The test is scaled for the number of processes sharing a vmspace (aka linux threads). Setting the value to 0 disables the feature.
PR: kern/16573 Approved by: jkh
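The "mmap()'d spaces" being counted are discrete mappings, one per non-coalescable mmap() call; a sketch (the count 64 is arbitrary, purely illustrative of how an mmap bomb multiplies map entries):

```python
import mmap

# Each separate anonymous mmap() here produces a discrete mapping in the
# process address space (on FreeBSD, a vm_map_entry); vm.max_proc_mmap
# caps roughly this count to stop a mmap()/fork() bomb exhausting KVM.
maps = [mmap.mmap(-1, mmap.PAGESIZE) for _ in range(64)]
count = len(maps)
for m in maps:
    m.close()
```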
|
56599 |
25-Jan-2000 |
dillon |
The swapdev_vp changes made to rip out the swap specfs interaction also broke diskless swapping. Moving the swapdev_vp initialization to more commonly run code solves the problem.
PR: kern/16165 Additional testing by: David Gilbert <dgilbert@velocet.ca>
|
56378 |
21-Jan-2000 |
dillon |
Fix a deadlock between msync(..., MS_INVALIDATE) and vm_fault. The invalidation code cannot wait for paging to complete while holding a vnode lock, so we don't wait. Instead we simply allow the lower-level code to block on any busy pages it encounters. I think Yahoo may be the only entity in the entire world that actually uses this msync feature :-).
Bug reported by: Paul Saab <paul@mu.org>
|
55756 |
10-Jan-2000 |
phk |
Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by: bde
|
55351 |
03-Jan-2000 |
guido |
Use MAP_NOSYNC for vnodes without any links in their filesystem.
This is necessary for vmware: it does not use an anonymous mmap for the memory of the virtual system. Instead it creates a temp file and unlinks it. For a 50 MB file, this results in a lot of syncing every 30 seconds.
Reviewed by: Matthew Dillon <dillon@backplane.com>
|
55206 |
29-Dec-1999 |
peter |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSD's who made this change quite some time ago. More commits to come.
|
55175 |
28-Dec-1999 |
peter |
Fix the swap backed vn case - this was broken by my rev 1.128 to swap_pager.c and related commits.
Essentially swap_pager.c is backed out to before the changes, but swapdev_vp is converted into a real vnode with just VOP_STRATEGY(). It no longer abuses specfs vnops and no longer needs a dev_t and /dev/drum (or /dev/swapdev) for the intermediate layer.
This essentially restores the vnode interface as the interface to the bottom of the swap pager, and vm_swap.c provides a clean vnode interface.
This will need to be revisited when we swap to files (vnodes) - which is the other reason for keeping the vnode interface between the swap pager and the swap devices.
OK'ed by: dillon
|
54655 |
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
54467 |
12-Dec-1999 |
dillon |
Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise().
This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis.
MAP_NOSYNC allows one to use mmap() to share memory between processes without incurring any significant filesystem overhead, putting it in the same performance category as SysV shared memory and anonymous memory.
Reviewed by: julian, alc, dg
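MAP_NOSYNC itself is FreeBSD-only, but the behavior it modifies can be sketched portably: dirty a page of a shared file mapping, then write it back explicitly. The flush() below stands in for the msync() that still works under MAP_NOSYNC; the temp file is a stand-in for any mapped file-backed region:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
m[:4] = b"data"   # dirty a page; under MAP_NOSYNC the update daemon
                  # would skip it, but the mapping stays coherent
m.flush()         # explicit msync()-style write-back still works
m.close()
with open(path, "rb") as f:
    first = f.read(4)
os.close(fd)
os.unlink(path)
```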
|
54444 |
11-Dec-1999 |
eivind |
Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked.
This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk Reviewed by: peter
|
54188 |
06-Dec-1999 |
luoqi |
User ldt sharing.
|
53899 |
29-Nov-1999 |
phk |
Report swapdevices as cdevs rather than bdevs.
Remove unused dev2budev() function.
|
53701 |
25-Nov-1999 |
alc |
Remove nonsensical vm_map_{clear,set}_recursive() calls from vm_map_pageable(). At the point they are called, vm_map_pageable() holds a read (or shared) lock on the map. The purpose of vm_map_{clear,set}_recursive() is to disable/enable repeated write (or exclusive) lock requests by the same process.
|
53627 |
23-Nov-1999 |
alc |
Correct the following error: vm_map_pageable() on a COW'ed (post-fork) vm_map always failed because vm_map_lookup() looked at "vm_map_entry->wired_count" instead of "(vm_map_entry->eflags & MAP_ENTRY_USER_WIRED)". The effect was that many page wiring operations by sysctl were (silently) failing.
|
53594 |
22-Nov-1999 |
phk |
Isolate the swapdev_vp "not quite" vnode in the only source file which needs it now that /dev/drum is gone.
Reviewed by: eivind, peter
|
53338 |
18-Nov-1999 |
peter |
Remove the non-functional "swap device" userland front-end to the multiplexed underlying swap devices (/dev/drum). The only thing it did was to allow root to open /dev/drum, but not do anything with it. Various utilities used to grovel around in here, but Matt has written a much nicer (and clean) front-end to this for libkvm, and nothing uses the old system any more.
The VM system was calling VOP_STRATEGY() on the vp of the first underlying swap device (not the /dev/drum one, the first real device), and using the VOP system to indirectly (and only) call swstrategy() to choose an underlying device and enqueue it on that device. I have changed it to avoid diverting through the VOP system and to call the only possible target directly, saving a little bit of time and some complexity.
In all, nothing much changes, except some scaffolding to support the roundabout way of calling swstrategy() is gone.
Matt gave me the ok to do this some time ago, and I apologize for taking so long to get around to it.
|
53074 |
10-Nov-1999 |
alc |
Two changes: (1) Use vm_page_unqueue_nowakeup in vm_page_alloc instead of duplicating the code. (2) If a wired page is passed to vm_page_free_toq, panic instead of printing a friendly warning. (If we don't panic here, we'll just panic later in vm_page_unwire obscuring the problem.)
|
52974 |
08-Nov-1999 |
alc |
Remove unused declarations.
|
52973 |
07-Nov-1999 |
alc |
Remove unused #include's.
Submitted by: phk
|
52960 |
07-Nov-1999 |
alc |
The functions declared by this header file no longer exist.
Submitted by: phk (in part)
|
52649 |
30-Oct-1999 |
alc |
Reverse the sense of the test in the KASSERT's from the last commit.
|
52647 |
30-Oct-1999 |
alc |
The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others.
Reviewed by: dillon
|
52644 |
30-Oct-1999 |
phk |
Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the "rw" argument, rather than hijacking B_{READ|WRITE}.
Fix two bugs (physio & cam) resulting by the confusion caused by this.
Submitted by: Tor.Egge@fast.no Reviewed by: alc, ken (partly)
|
52635 |
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial, bordering-on-the-silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
52617 |
29-Oct-1999 |
alc |
Remove the last vestiges of "vm_map_t phys_map". It's been unused since i386/i386/machdep.c rev 1.45 (or 1994 :-) ).
|
52568 |
27-Oct-1999 |
alc |
Shrink "struct vm_object" by not spending a full 32 bits on "objtype_t".
|
52035 |
08-Oct-1999 |
phk |
Fix a panic(8) implementation: 'hexdump -C < /dev/drum' caused a panic. Fix by simply refusing to do I/O from userland. I'm not sure we even need /dev/drum anymore; it seems to have been broken this way for a long time.
|
51930 |
04-Oct-1999 |
phk |
Introduce swopen to prevent blockdevice opens and insist on minor==0.
|
51928 |
04-Oct-1999 |
phk |
Give the swap device a D_DISK flag against my better judgement.
TODO: add an open routine which fails for bdev opens.
|
51810 |
30-Sep-1999 |
dt |
Plug an accounting leak: count pages in ZONE_INTERRUPT zones as wired.
|
51658 |
25-Sep-1999 |
phk |
Remove five now unused fields from struct cdevsw. They should never have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
|
51493 |
21-Sep-1999 |
dillon |
cleanup madvise code, add a few more sanity checks.
Reviewed by: Alan Cox <alc@cs.rice.edu>, dg@root.com
|
51488 |
21-Sep-1999 |
dillon |
Final commit to remove vnode->v_lastr. vm_fault now handles read clustering issues (replacing code that used to be in ufs/ufs/ufs_readwrite.c). vm_fault also now uses the new VM page counter inlines.
This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr for VM, and fp->f_nextread and fp->f_seqcount (which have been in the tree for a while). Determination of the I/O strategy (sequential, random, and so forth) is now handled on a descriptor-by-descriptor basis for base I/O calls, and on a memory-region-by-memory-region and process-by-process basis for VM faults.
Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
|
51474 |
20-Sep-1999 |
dillon |
Fix bug in pipe code relating to writes of mmap'd but illegal address spaces which cross a segment boundary in the page table. pmap_kextract() is not designed for access to the user space portion of the page table and cannot handle the null-page-directory-entry case.
The fix is to have vm_fault_quick() return a success or failure which is then used to avoid calling pmap_kextract().
|
51343 |
17-Sep-1999 |
dillon |
Remove inappropriate VOP_FSYNC from vm_object_page_clean(). The fsync syncs the entire underlying file rather then just the requested range, resulting in huge inefficiencies when the VM system is articulated in a certain way. The VOP_FSYNC was also found to massively reduce NFS performance in certain cases.
Change MADV_DONTNEED and MADV_FREE to call vm_page_dontneed() instead of vm_page_deactivate(). Using vm_page_deactivate() causes all inactive and cache pages to be recycled before the dontneed/free page is recycled, effectively flushing our entire VM inactive & cache queues continuously even if only a few pages are being actively MADV free'd and reused (such as occurs with a sequential scan of a memory-mapped file).
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
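The userland side of this is just madvise(2); a sketch issuing MADV_DONTNEED on an anonymous region (the constant is widely available, though the exact reclaim semantics differ between FreeBSD's dontneed and other systems'):

```python
import mmap

m = mmap.mmap(-1, 4 * mmap.PAGESIZE)
m[:3] = b"abc"                   # touch a page so there is something resident
m.madvise(mmap.MADV_DONTNEED)    # hint: these pages won't be needed soon, so
                                 # the VM may reclaim them cheaply
size = len(m)                    # the mapping itself remains valid
m.close()
```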
|
51342 |
17-Sep-1999 |
dillon |
Add 'lastr' field to vm_map_entry in preparation for its removal from the vnode. (The changeover is undergoing final testing and will be committed soon).
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
51340 |
17-Sep-1999 |
dillon |
The vnode pager (used when you do file-backed mmaps) must use the underlying physical sector size when aligning I/O transfer sizes. It cannot assume 512 bytes.
We assume the underlying sector size is a power of 2. If it isn't, mmap() will break badly anyway (in the same way mmap broke with NFS when NFS tried to cache piecemeal write ranges in buffers, before we enforced read-buffer-before-write-piecemeal for NFS).
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
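The power-of-two assumption is what makes sector alignment pure mask arithmetic; a sketch (the helper names are mine, not the pager's):

```python
def align_down(x, sector):
    # The pager assumes the sector size is a power of two, so masking works.
    assert sector & (sector - 1) == 0
    return x & ~(sector - 1)

def align_up(x, sector):
    assert sector & (sector - 1) == 0
    return (x + sector - 1) & ~(sector - 1)
```

For example, a transfer covering bytes 1300..1304 on 512-byte sectors must be widened to the enclosing sector boundaries, 1024 and 1536.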
|
51339 |
17-Sep-1999 |
dillon |
Fix a number of spl bugs related to reserving and freeing swap space. Swap space can be freed from an interrupt and so swap reservation and freeing must occur at splvm.
Add swap_pager_reserve() code to support a new swap pre-reservation capability for the VN device.
Generally cleanup the swap code by simplifying the swp_pager_meta_build() static function and consolidating the SWAPBLK_NONE test from a bit test to an absolute compare. The bit test was left over from a rejected swap allocation scheme that was not ultimately committed. A few other minor cleanups were also made.
Reorganize the swap strategy code, again for VN support, to not reallocate swap when writing as this messes up pre-reservation and can fragment I/O unnecessarily as VN-based disk is messed around with.
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
51338 |
17-Sep-1999 |
dillon |
Add required BUF_KERNPROC to flushchainbuf() to disassociate the current process from the exclusive lock prior to initiating I/O.
This fixes a panic related to swap-backed VN disks
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
51337 |
17-Sep-1999 |
dillon |
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup.
Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed().
add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.
|
50477 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50405 |
26-Aug-1999 |
phk |
Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag.
vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED;
|
50301 |
24-Aug-1999 |
green |
When the SYSINIT() was removed, it was replaced with a make_dev on-demand creation of /dev/drum via calling swapon. However, the make_dev has a bogus (insofar as it hasn't been added yet) cdevsw, so later we end up crashing with a null pointer dereference on the swap vp's specinfo. The specinfo points to a dev_t with a major of 254 (uninitialized), and we get a crash on its d_strategy being called.
The simple solution to this is to call cdevsw_add before the make_dev is ever used. This fixes the panic which occurred upon swapping.
|
50269 |
23-Aug-1999 |
bde |
Use devtoname to print dev_t's instead of casting them to u_long for misprinting with %lx.
Cast pointers to intptr_t instead of casting them to long. Cosmetic.
|
50254 |
23-Aug-1999 |
phk |
Convert DEVFS hooks in (most) drivers to make_dev().
Diskslice/label code not yet handled.
Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)
Add the correct hook for devfs to kern_conf.c
The net result of this exercise is that a lot less files depends on DEVFS, and devtoname() gets more sensible output in many cases.
A few drivers had minor additional cleanups performed relating to cdevsw registration.
A few drivers don't register a cdevsw{} anymore, but only use make_dev().
|
50248 |
23-Aug-1999 |
alc |
Correct the inconsistent formatting in struct vm_map.
Addendum to rev 1.47: submitted by dillon.
|
50247 |
23-Aug-1999 |
alc |
struct vm_map: The lock structure cannot be the first element of the vm_map because this can result in livelock between two or more system processes trying to kmem_alloc_wait.
|
50136 |
22-Aug-1999 |
alc |
Remove two unused variable declarations.
|
50075 |
20-Aug-1999 |
alc |
vm_page_alloc and contigmalloc1: Verify that free pages are not dirty.
Submitted by: dillon
|
50034 |
19-Aug-1999 |
peter |
Update for run queue code.
|
49998 |
18-Aug-1999 |
mjacob |
Fix breakage - an extra brace got inserted where DIAGNOSTIC was defined but MAP_LOCK_DIAGNOSTIC wasn't.
|
49991 |
17-Aug-1999 |
green |
Unbreak the nfs KLD_MODULE. It needs a bit more of vm_page.h than was exported (notably vm_page_undirty()). Also, let vm_page_dirty() work in a KLD.
|
49979 |
17-Aug-1999 |
alc |
vm_page_free_toq: Update the comment to reflect the demise of PQ_ZERO and remove a (now) useless test.
|
49949 |
17-Aug-1999 |
alc |
Correct an accidental omission of one "vm_page_undirty" replacement from the previous commit.
|
49948 |
17-Aug-1999 |
alc |
vm_page_free_toq: Clear the dirty bit mask (vm_page_undirty) before adding the page to the free page queue.
Submitted by: dillon
|
49945 |
17-Aug-1999 |
alc |
Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page.
Use it.
Submitted by: dillon
|
49937 |
17-Aug-1999 |
alc |
vm_pageout_clean: Remove dead code.
Submitted by: dillon
|
49900 |
16-Aug-1999 |
alc |
vm_map_lock*: Remove semicolons or add "do { } while (0)" as necessary to enable the use of these macros in arbitrary statements. (There are no functional changes.)
Submitted by: dillon
|
49858 |
15-Aug-1999 |
alc |
Remove the declarations for "vm_map_t io_map". It's been unused since i386/i386/machdep rev 1.310, i.e., the demise of BOUNCE_BUFFERS.
|
49852 |
15-Aug-1999 |
alc |
Remove the declarations for "vm_map_t u_map". It's been unused since i386/i386/pmap rev 1.190. (The alpha never used it.)
|
49819 |
15-Aug-1999 |
alc |
contigmalloc1 (currently) depends on PQ_FREE and PQ_CACHE not being 0 to tell a valid "struct vm_page" from an invalid one in the vm_page_array. This isn't a very robust method.
|
49813 |
15-Aug-1999 |
mjacob |
Add back in old definitions if we're compiling for alpha.
|
49720 |
14-Aug-1999 |
alc |
Don't create a "struct vpgqueues" for PQ_NONE.
|
49697 |
13-Aug-1999 |
alc |
vm_map_madvise: A complete rewrite by dillon and myself to separate the implementation of behaviors that affect the vm_map_entry from those that affect the vm_object.
A result of this change is that madvise(..., MADV_FREE); is much cheaper.
|
49679 |
13-Aug-1999 |
phk |
The bdevsw() and cdevsw() are now identical, so kill the former.
|
49666 |
12-Aug-1999 |
alc |
Make the default page coloring parameters match a (non-Xeon) Pentium II/III.
This setting is also acceptable for Celerons and Pentium Pros with less than 1MB L2 caches.
Note: PQ_L2_SIZE is a misnomer. The correct number of colors is a function of the cache's degree of associativity as well as its size.
Submitted by: bde and alc
|
49655 |
12-Aug-1999 |
alc |
vm_object_madvise: Update the comments to match the implementation.
Submitted by: dillon
|
49654 |
12-Aug-1999 |
alc |
vm_object_madvise: Support MADV_DONTNEED and MADV_WILLNEED on object types besides OBJT_DEFAULT and OBJT_SWAP.
Submitted by: dillon
|
49618 |
11-Aug-1999 |
alc |
contigmalloc1: If a page is found in the wrong queue, panic instead of silently ignoring the problem.
|
49615 |
10-Aug-1999 |
peter |
Add a contigfree() as a corollary to contigmalloc() as it's not clear which free routine to use and people are tempted to use free() (which doesn't work).
|
49592 |
10-Aug-1999 |
alc |
vm_map_madvise: Now that behaviors are stored in the vm_map_entry rather than the vm_object, it's no longer necessary to instantiate a vm_object just to hold the behavior.
Reviewed by: dillon
|
49558 |
09-Aug-1999 |
phk |
Merge the cons.c and cons.h to the best of my ability. alpha may or may not compile, I can't test it.
|
49535 |
08-Aug-1999 |
phk |
Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>.
Add a few fields to struct specinfo, paving the way for the fun part.
|
49338 |
01-Aug-1999 |
alc |
Move the memory access behavior information provided by madvise from the vm_object to the vm_map.
Submitted by: dillon
|
49326 |
31-Jul-1999 |
alc |
Change the type of vpgqueues::lcnt from "int *" to "int". The indirection served no purpose.
|
49305 |
31-Jul-1999 |
alc |
vm_page_queue_init: Remove the initialization of PQ_NONE's cnt and lcnt. They aren't used.
vm_page_insert: Remove an unnecessary dereference.
vm_page_wire: Remove the one and only (and thus pointless) reference to PQ_NONE's lcnt.
|
48974 |
22-Jul-1999 |
alc |
Reduce the number of "magic constants" used for page coloring by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same thing at different places in the kernel. Drop PQ_PRIME3.
|
48963 |
21-Jul-1999 |
alc |
Fix the following problem:
When creating new processes (or performing exec), the new page directory is initialized too early. The kernel might grow before p_vmspace is initialized for the new process. Since pmap_growkernel doesn't yet know about the new page directory, it isn't updated, and subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel from stomping on memory, and (2) to defer part of the initialization of new page directories until p_vmspace is initialized.
PR: kern/12378 Submitted by: tegge Reviewed by: dfr
|
48948 |
20-Jul-1999 |
green |
Make a dev2budev() function, and use it. This refixes pstat (working, broken, working, broken, working) and savecore (working, working, broken, working, working).
Sorta Reviewed by: phk
|
48922 |
20-Jul-1999 |
alc |
Convert a "page not busy" warning to an assertion.
Submitted by: dillon@backplane.com
|
48866 |
17-Jul-1999 |
phk |
Add a field to struct swdevt to avoid a bogus udev2dev() call.
|
48859 |
17-Jul-1999 |
phk |
I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
|
48833 |
16-Jul-1999 |
alc |
Remove vm_object::last_read. It is used by the old swap pager, but not by the new one, i.e., vm/swap_pager.c rev 1.108.
Reviewed by: dillon@backplane.com
|
48757 |
11-Jul-1999 |
alc |
Cleanup OBJ_ONEMAPPING management.
vm_map.c: Don't set OBJ_ONEMAPPING on arbitrary vm objects. Only default and swap type vm objects should have it set. vm_object_deallocate already handles these cases.
vm_object.c: If OBJ_ONEMAPPING isn't already clear in vm_object_shadow, we are in trouble. Instead of clearing it, make it an assertion that it is already clear.
|
48738 |
10-Jul-1999 |
alc |
Change the data type used to represent page color in the vm_object to be the same as that used in the vm_page. (This change also shrinks the vm_object.)
|
48736 |
10-Jul-1999 |
alc |
Remove unused function prototypes.
|
48658 |
07-Jul-1999 |
ache |
Add an unused argument to udev2dev() to make the kernel compile.
|
48652 |
07-Jul-1999 |
msmith |
Reinstate the previous fix for the broken export of a dev_t in sw_dev, convert back to a dev_t when the value is actually used.
|
48651 |
07-Jul-1999 |
green |
Back out previous commit. It was wrong, and caused panics.
|
48647 |
06-Jul-1999 |
msmith |
swdevt should contain a udev_t, not a dev_t. This resulted in bogus swap device name reporting.
Submitted by: Bill Swingle <unfurl@freebsd.org>
|
48590 |
05-Jul-1999 |
mckay |
Reformat previous fix to remove an uglier than average goto.
Looked OK to: dg
|
48544 |
04-Jul-1999 |
mckusick |
The buffer queue mechanism has been reformulated. Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we now have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truly empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache).
Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block.
The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occurred when write_behind was turned off in the system.
The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks.
A small race condition was fixed in getpbuf() in vm/vm_pager.c.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
|
48409 |
01-Jul-1999 |
peter |
Fix some int/long printf problems for the Alpha
|
48391 |
01-Jul-1999 |
peter |
Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.
|
48289 |
27-Jun-1999 |
peter |
Kirk missed a required BUF_KERNPROC(). Even though this is a non-async transfer, the b_iodone hook causes biodone() to release it from interrupt context.
|
48274 |
27-Jun-1999 |
peter |
Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
|
48252 |
26-Jun-1999 |
peter |
There isn't much point waking up a daemon that hasn't existed since softupdates came in. Try calling speedup_syncer() instead..
|
48225 |
26-Jun-1999 |
mckusick |
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
|
48099 |
22-Jun-1999 |
alc |
Remove (1) "extern" declarations for variables that were previously made "static" and (2) initialized but unused variables.
|
48059 |
20-Jun-1999 |
alc |
Remove vm_object::cache_count and vm_object::wired_count. They are not used. (Nor is there any planned use by John who introduced them.)
Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>
|
48045 |
20-Jun-1999 |
alc |
Set cnt.v_page_size to PAGE_SIZE rather than DEFAULT_PAGE_SIZE so that "vmstat -s" reports the correct value on the Alpha.
Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
|
48022 |
19-Jun-1999 |
alc |
Remove some unused function and variable declarations.
|
47986 |
17-Jun-1999 |
alc |
vm_map_growstack uses vmspace::vm_ssize as though it contained the stack size in bytes when in fact it is the stack size in pages.
|
47968 |
17-Jun-1999 |
alc |
vm_map_insert sometimes extends an existing vm_map entry, rather than creating a new entry. vm_map_stack and vm_map_growstack can panic when a new entry isn't created. Fixed vm_map_stack and vm_map_growstack.
Also, when extending the stack, always set the protection to VM_PROT_ALL.
|
47966 |
17-Jun-1999 |
alc |
Move vm_map_stack and vm_map_growstack after the definition of the vm_map_clip_end macro. (The next commit will modify vm_map_stack and vm_map_growstack to use vm_map_clip_end.)
|
47965 |
17-Jun-1999 |
alc |
Remove some unused declarations and duplicate initialization.
|
47888 |
12-Jun-1999 |
alc |
vm_map_protect: The wrong vm_map_entry is used to determine if writes must not be allowed due to COW.
|
47841 |
08-Jun-1999 |
dt |
Add a function kmem_alloc_nofault() - same as kmem_alloc_pageable(), but create a nofault entry. It will be used to allocate kmem for upages.
(I am not too happy with all this, but it's better than nothing).
|
47765 |
05-Jun-1999 |
alc |
vm_mmap: Ensure that device mappings get MAP_PREFAULT(_PARTIAL) set, so that 4M page mappings are used when possible.
Reviewed by: Luoqi Chen <luoqi@watermarkgroup.com>
|
47673 |
01-Jun-1999 |
phk |
Shorten a detour around dev_t to get a udev_t created.
|
47640 |
31-May-1999 |
phk |
Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing.
cdevsw_add() will print a message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.
|
47625 |
30-May-1999 |
phk |
This commit should be an extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
|
47607 |
30-May-1999 |
alc |
Addendum to 1.155. Verify the existence of the object before checking its reference count.
|
47568 |
28-May-1999 |
alc |
Avoid the creation of unnecessary shadow objects.
|
47290 |
18-May-1999 |
alc |
vm_map_insert: General cleanup. Eliminate coalescing checks that are duplicated by vm_object_coalesce.
|
47258 |
17-May-1999 |
alc |
Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert, eliminating the need for the pmap_object_init_pt calls in imgact_* and mmap.
Reviewed by: David Greenman <dg@root.com>
|
47243 |
16-May-1999 |
alc |
Remove prototypes for functions that don't exist anymore (vm_map.h).
Remove a useless argument from vm_map_madvise's interface (vm_map.c, vm_map.h, and vm_mmap.c).
Remove a redundant test in vm_uiomove (vm_map.c).
Make two changes to vm_object_coalesce:
1. Determine whether the new range of pages actually overlaps the existing object's range of pages before calling vm_object_page_remove. (Prior to this change almost 90% of the calls to vm_object_page_remove were to remove pages that were beyond the end of the object.)
2. Free any swap space allocated to removed pages.
|
47239 |
15-May-1999 |
dt |
Fix confusion of size of transfer with size of the pager.
PR: 11658 Broken in: 1.89 (1998/03/07)
|
47207 |
14-May-1999 |
alc |
Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.
It never makes sense to specify MAP_COPY_NEEDED without also specifying MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices.
Reviewed by: David Greenman <dg@root.com>
|
47111 |
13-May-1999 |
bde |
Casting handles from void * to uintptr_t on the way to dev_t became especially bogus when dev_t became a pointer.
|
47094 |
13-May-1999 |
luoqi |
Device pager's handle is dev_t not udev_t.
|
47064 |
12-May-1999 |
phk |
Fix a udev_t/dev_t mismatch which prevented paging from working.
|
47028 |
11-May-1999 |
phk |
Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev()
For now they're functions, they will become in-line functions after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to an actual device.
In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few housekeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that.
Without DEVT_FASCIST I believe this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
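A minimal user-space sketch of the new accessor trio introduced above. The 8-bit-major/8-bit-minor packing here is purely illustrative; the kernel's actual udev_t layout differs:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative udev_t: a plain major|minor bitmap, assuming an 8-bit
 * major in the high byte and an 8-bit minor in the low byte. */
typedef uint32_t udev_t;

static udev_t umakedev(int major, int minor) {
    return (udev_t)(((major & 0xff) << 8) | (minor & 0xff));
}

static int umajor(udev_t dev) { return (dev >> 8) & 0xff; }
static int uminor(udev_t dev) { return dev & 0xff; }
```

The point of the split is that values shaped like this can live in inodes and vattr even when no driver or device exists, while dev_t stays a live reference.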
|
46816 |
09-May-1999 |
phk |
No point in swapdev being a static global when used only locally.
|
46676 |
08-May-1999 |
phk |
I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.
Changed the BDEV variant to this format as well: bdevsw(dev_t dev)
DEVFS will eventually benefit from this change too.
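The devsw() idea can be sketched in miniature; the table size, struct fields, and registration helper below are illustrative stand-ins, not the kernel's. Note how it also gives the NULL-for-invalid check mentioned in r47640:

```c
#include <assert.h>
#include <stddef.h>

#define NUMCDEVSW 256   /* illustrative table size */

struct cdevsw { int d_maj; };

static struct cdevsw *cdevsw_tab[NUMCDEVSW];

/* One inline replaces every open-coded cdevsw[major(foo)] lookup and
 * doubles as a validity check: NULL means no such driver. */
static struct cdevsw *devsw(int maj) {
    if (maj < 0 || maj >= NUMCDEVSW)
        return NULL;
    return cdevsw_tab[maj];   /* NULL if no driver registered */
}

/* Hypothetical registration of one driver, for the demo. */
static struct cdevsw foo_cdevsw = { 12 };
static void cdevsw_demo_register(void) { cdevsw_tab[12] = &foo_cdevsw; }
```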
|
46635 |
07-May-1999 |
phk |
Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function.
Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!)
Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!)
(Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
|
46625 |
07-May-1999 |
phk |
Introduce two functions: physread() and physwrite() and use these directly in *devsw[] rather than the 46 local copies of the same functions.
(grog will do the same for vinum when he has time)
|
46592 |
06-May-1999 |
peter |
Add brackets to silence egcs and help clarity.
|
46580 |
06-May-1999 |
phk |
remove b_proc from struct buf, it's (now) unused.
Reviewed by: dillon, bde
|
46538 |
06-May-1999 |
luoqi |
Don't ignore mmap() address hint below the text section.
|
46381 |
03-May-1999 |
billf |
Add sysctl descriptions to many SYSCTL_XXXs
PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
|
46349 |
02-May-1999 |
alc |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
|
46153 |
28-Apr-1999 |
dt |
s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/. (Edited automatically)
|
46112 |
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
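The API change can be sketched as follows; the struct layouts and the ASU flag below are simplified stand-ins for the real kernel types:

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's credential and proc structures. */
struct ucred { int cr_uid; };
struct proc  { struct ucred p_ucred; int p_acflag; };

#define ASU 1  /* "used super-user powers" accounting flag */

/* Old-style interface: caller digs out the cred and acflag. */
static int suser_xxx(struct ucred *cred, int *acflag) {
    if (cred->cr_uid != 0)
        return 1;              /* EPERM-style failure */
    if (acflag)
        *acflag |= ASU;
    return 0;
}

/* New-style suser(struct proc *), as in steps 2 and 3 of the commit:
 * the wrapper finds the cred/acflag itself. */
static int suser(struct proc *p) {
    return suser_xxx(&p->p_ucred, &p->p_acflag);
}

static struct proc proc_root = { {0},    0 };
static struct proc proc_user = { {1000}, 0 };
```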
|
45960 |
23-Apr-1999 |
dt |
Make pmap_collect() an official pmap interface.
|
45821 |
19-Apr-1999 |
peter |
unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
|
45665 |
13-Apr-1999 |
peter |
Move the declaration of faultin() from the vm headers to proc.h, since it is now referenced from a macro there (PHOLD()).
|
45567 |
11-Apr-1999 |
eivind |
Staticize
|
45561 |
10-Apr-1999 |
dt |
Convert usage of vm_page_bits() to the new convention ("Inputs are required to range within a page").
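Under the new convention, vm_page_bits() might look roughly like the sketch below, assuming 4K pages and 512-byte DEV_BSIZE chunks (one valid/dirty bit per chunk); the real kernel function differs in detail:

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define DEV_BSIZE 512   /* 8 chunks per page -> 8 valid/dirty bits */

/* Return the bitmask covering byte range [base, base+size) of a page.
 * Per the new convention, the range must lie within a single page. */
static int vm_page_bits(int base, int size) {
    assert(base >= 0 && size >= 0 && base + size <= PAGE_SIZE);
    if (size == 0)
        return 0;
    int first = base / DEV_BSIZE;                         /* inclusive */
    int last  = (base + size + DEV_BSIZE - 1) / DEV_BSIZE; /* exclusive */
    return ((1 << last) - 1) & ~((1 << first) - 1);
}
```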
|
45550 |
10-Apr-1999 |
eivind |
Lock vnode correctly for VOP_OPEN.
Discussed with: alc, dillon
|
45365 |
06-Apr-1999 |
peter |
Don't forcibly kill processes that are locked in-core via PHOLD - it was just checking P_NOSWAP before.
|
45363 |
06-Apr-1999 |
peter |
Only use p->p_lock (manage by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no longer set.
|
45347 |
05-Apr-1999 |
julian |
Catch a case spotted by Tor where mmapped files could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
|
45293 |
04-Apr-1999 |
alc |
Two changes to vm_map_delete:
1. Don't bother checking object->ref_count == 1 in order to set OBJ_ONEMAPPING. It's a waste of time. If object->ref_count == 1, vm_map_entry_delete will "run-down" the object and its pages.
2. If object->ref_count == 1, ignore OBJ_ONEMAPPING. Wait for vm_map_entry_delete to "run-down" the object and its pages. Otherwise, we're calling two different procedures to delete the object's pages.
Note: "vmstat -s" will once again show a non-zero value for "pages freed by exiting processes".
|
45069 |
27-Mar-1999 |
alc |
Mainly, eliminate the comments about share maps. (We don't have share maps any more.) Also, eliminate an incorrect comment that says that we don't coalesce vm_map_entry's. (We do.)
|
45057 |
27-Mar-1999 |
eivind |
Correct a comment.
|
44928 |
21-Mar-1999 |
alc |
Two changes:
Remove more (redundant) map timestamp increments from properly synchronized routines. (Changed: vm_map_entry_link, vm_map_entry_unlink, and vm_map_pageable.)
Micro-optimize vm_map_entry_link and vm_map_entry_unlink, eliminating unnecessary dereferences. At the same time, converted them from macros to inline functions.
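A hypothetical miniature of the converted inlines, with the vm_map fields pared down to the list linkage and entry count. Per the commit, the timestamp is no longer bumped here:

```c
#include <assert.h>

/* Abridged stand-ins for the real structures. */
struct vm_map_entry {
    struct vm_map_entry *prev, *next;
};

struct vm_map {
    struct vm_map_entry header;   /* circular-list sentinel */
    int nentries;
};

static void vm_map_init_demo(struct vm_map *map) {
    map->header.prev = map->header.next = &map->header;
    map->nentries = 0;
}

/* Insert e after 'after'; one dereference per pointer, no macro games. */
static void vm_map_entry_link(struct vm_map *map,
                              struct vm_map_entry *after,
                              struct vm_map_entry *e) {
    map->nentries++;
    e->prev = after;
    e->next = after->next;
    e->next->prev = e;
    after->next = e;
}

static void vm_map_entry_unlink(struct vm_map *map, struct vm_map_entry *e) {
    e->prev->next = e->next;
    e->next->prev = e->prev;
    map->nentries--;
}

static struct vm_map demo_map;
static struct vm_map_entry demo_a, demo_b;
```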
|
44880 |
19-Mar-1999 |
alc |
Construct the free queue(s) in descending order (by physical address) so that the first 16MB of physical memory is allocated last rather than first. On large-memory machines, this avoids the exhaustion of low physical memory before isa_dmainit has run.
|
44793 |
16-Mar-1999 |
alc |
Correct a problem in kmem_malloc: A kmem_malloc allowing "wait" may block (VM_WAIT) holding the map lock. This is bad. For example, a subsequent kmem_malloc by an interrupt handler on the same map may find the lock held and panic in the lockmgr.
|
44773 |
15-Mar-1999 |
alc |
Two changes:
In general, vm_map_simplify_entry should be performed INSIDE the loop that traverses the map, not outside. (Changed: vm_map_inherit, vm_map_pageable.)
vm_fault_unwire doesn't acquire the map lock (or block holding it). Thus, vm_map_set/clear_recursive shouldn't be called. (Changed: vm_map_user_pageable, vm_map_pageable.)
|
44771 |
15-Mar-1999 |
julian |
Fix breakage in last commit Submitted by: Brian Feldman <green@unixhelp.org>
|
44754 |
14-Mar-1999 |
julian |
A bit of a hack, but allows the vn device to be a module again.
Submitted by: Matt Dillon <dillon@freebsd.org>
|
44739 |
14-Mar-1999 |
julian |
Submitted by: Matt Dillon <dillon@freebsd.org> The old VN device broke in -4.x when the definition of B_PAGING changed. This patch fixes this plus implements additional capabilities. The new VN device can be backed by a file ( as per normal ), or it can be directly backed by swap.
Due to dependencies in VM include files (on opt_xxx options) the new vn device cannot be a module yet. This will be fixed in a later commit. This commit is delimited by tags {PRE,POST}_MATT_VNDEV
|
44733 |
14-Mar-1999 |
alc |
Correct two optimization errors in vm_object_page_remove:
1. The size of vm_object::memq is vm_object::resident_page_count, not vm_object::size.
2. The "size > 4" test sometimes results in the traversal of a ~1000 page memq in order to locate ~10 pages.
|
44682 |
12-Mar-1999 |
alc |
Remove vm_page_frees from kmem_malloc that are performed by vm_map_delete/vm_object_page_remove anyway.
|
44675 |
12-Mar-1999 |
julian |
Stop the mfs from trying to swap out crucial bits of the mfs as this can lead to deadlock. Submitted by: Matt Dillon <dillon@freebsd.org>
|
44597 |
09-Mar-1999 |
alc |
Remove (redundant) map timestamp increments from some properly synchronized routines.
|
44569 |
08-Mar-1999 |
alc |
Remove an unused variable from vmspace_fork.
|
44565 |
07-Mar-1999 |
alc |
Change vm_map_growstack to acquire and hold a read lock (instead of a write lock) until it actually needs to modify the vm_map.
Note: it is legal to modify vm_map::hint without holding a write lock.
Submitted by: "Richard Seaman, Jr." <dick@tar.com> with minor changes by myself.
|
44513 |
06-Mar-1999 |
alc |
Upgrading a map's lock to exclusive status should increment the map's timestamp. In general, whenever an exclusive lock is acquired the timestamp should be incremented.
|
44438 |
02-Mar-1999 |
alc |
To avoid a conflict for the vm_map's lock with vm_fault, release the read lock around the subyte operations in mincore. After the lock is reacquired, use the map's timestamp to determine if we need to restart the scan.
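The timestamp-restart pattern can be sketched in user space; the locking and subyte() copyout are elided, and the mutation flag below stands in for a concurrent writer modifying the map while the lock is dropped:

```c
#include <assert.h>

/* Abridged map: only the generation counter matters for this sketch. */
struct vm_map_demo { unsigned timestamp; };

/* Hypothetical scan step; returns 1 if the scan had to restart because
 * the map changed while the read lock was released. */
static int scan_with_restart(struct vm_map_demo *map, int mutate_during_unlock) {
    int restarted = 0;
    for (;;) {
        unsigned saved = map->timestamp;   /* sampled under the lock */
        /* ...vm_map_unlock_read(map); subyte(...); vm_map_lock_read(map)... */
        if (mutate_during_unlock) {
            map->timestamp++;              /* another thread changed the map */
            mutate_during_unlock = 0;
        }
        if (map->timestamp != saved) {     /* map changed: rescan the range */
            restarted = 1;
            continue;
        }
        break;
    }
    return restarted;
}

static struct vm_map_demo mincore_map;
```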
|
44396 |
02-Mar-1999 |
alc |
Remove the last of the share map code: struct vm_map::is_main_map.
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
|
44379 |
01-Mar-1999 |
alc |
mincore doesn't modify the vm_map. Therefore, it doesn't require an exclusive lock. A read lock will suffice.
|
44321 |
27-Feb-1999 |
alc |
Reviewed by: "John S. Dyson" <dyson@iquest.net> Submitted by: Matthew Dillon <dillon@apollo.backplane.com> To prevent a deadlock, if we are extremely low on memory, force synchronous operation by the VOP_PUTPAGES in vnode_pager_putpages.
|
44250 |
25-Feb-1999 |
alc |
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Corrected the computation of cnt.v_ozfod in vm_fault: vm_fault was counting the number of unoptimized rather than optimized zero-fill faults.
|
44249 |
25-Feb-1999 |
dillon |
Comment swstrategy() routine.
|
44245 |
24-Feb-1999 |
dillon |
Remove unnecessary page protects on map_split and collapse operations. Fix bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do not get set under certain circumstances ( page rename case ).
Reviewed by: Alan Cox <alc@cs.rice.edu>, John Dyson
|
44206 |
22-Feb-1999 |
dillon |
Removed ENOMEM error on swap_pager_full condition which ignored the availability of physical memory. As per original bug report by Bruce.
Reviewed by: Alan Cox <alc@cs.rice.edu>
|
44179 |
21-Feb-1999 |
dillon |
Remove conditional sysctls
Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl.
Reviewed by: Alan Cox <alc@cs.rice.edu>
|
44178 |
21-Feb-1999 |
dillon |
Reviewed by: Alan Cox <alc@cs.rice.edu>
Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.
|
44156 |
19-Feb-1999 |
luoqi |
Eliminate a possible numerical overflow.
|
44146 |
19-Feb-1999 |
luoqi |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillon <dillon@apollo.backplane.com>
|
44135 |
19-Feb-1999 |
dillon |
Submitted by: Alan Cox <alc@cs.rice.edu>
Remove remaining share map garbage from vm_map_lookup() and clean out old #if 0 stuff.
|
44124 |
18-Feb-1999 |
dillon |
Limit the number of simultaneous asynchronous swap pager I/Os that can be in progress at any given moment.
Add two swap tuneables to sysctl:
vm.swap_async_max: 4 vm.swap_cluster_max: 16
Recommended values are a cluster size of 8 or 16 pages. async_max is about right for 1-4 swap devices. Reduce to 2 if swap is eating too much bandwidth, or even 1 if swap is both eating too much bandwidth and sitting on a slow network (10BaseT).
The defaults work well across a broad range of configurations and should normally be left alone.
|
44098 |
17-Feb-1999 |
dillon |
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Unlock vnode before messing with map to avoid deadlock between map and vnode ( e.g. with exec_map and underlying program binary vnode ). Solves a deadlock that most often occurs during a large -j# buildworld reported by three people.
|
44051 |
15-Feb-1999 |
dillon |
Minor reorganization of vm_page_alloc(). No functional changes have been made but the code has been reorganized and documented to make it more readable, reduce the size of the code, and optimize the branch path caching capabilities that most modern processors have.
|
44034 |
15-Feb-1999 |
dillon |
Fix a bug in the new madvise() code that would possibly (improperly) free swap space out from under a busy page. This is not legal because the swap may be reallocated and I/O issued while I/O is still in progress on the same swap page from the madvise()'d object. This bug could only occur under extreme paging conditions but might not cause an error until much later. As a side-benefit, madvise() is now even smaller.
|
43941 |
12-Feb-1999 |
dillon |
Minor optimization to madvise() MADV_FREE to make page as freeable as possible without actually unmapping it from the process.
As of now, I declare madvise() on OBJT_DEFAULT/OBJT_SWAP objects to be 'working and complete'.
|
43923 |
12-Feb-1999 |
dillon |
Fix a non-fatal bug in vm_map_insert() which improperly cleared OBJ_ONEMAPPING in the case where an object is extended and an additional vm_map_entry must be allocated.
In vm_object_madvise(), remove the call to vm_page_cache() in the MADV_FREE case in order to avoid a page fault on page reuse. However, we still mark the page as clean and destroy any swap backing store.
Submitted by: Alan Cox <alc@cs.rice.edu>
|
43795 |
09-Feb-1999 |
dillon |
Addendum to the vm_map coalesce optimization. Also, this was backed out because there was a consensus on -current in regards to leaving bss r+w+x instead of r+w. This is in order to maintain reasonable compatibility with existing JIT compilers (e.g. kaffe) and possibly other programs.
|
43777 |
08-Feb-1999 |
dillon |
Revamp vm_object_[q]collapse(). Despite the complexity of this patch, no major operational changes were made. The three core object->memq loops were moved into a single inline procedure and various operational characteristics of the collapse function were documented.
|
43761 |
08-Feb-1999 |
dillon |
General cleanup. Remove #if 0's and remove useless register qualifiers.
|
43752 |
08-Feb-1999 |
dillon |
Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined with PQ_FREE. There is little operational difference other than the kernel being a few kilobytes smaller and the code being more readable.
* vm_page_select_free() has been *greatly* simplified. * The PQ_ZERO page queue and supporting structures have been removed * vm_page_zero_idle() revamped (see below)
PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guaranteed to remain tracked throughout a page's life ( if it isn't already ).
When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations.
Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.
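The described hysteresis can be sketched as a pure function. The 1/3 and 1/2 watermarks come from the commit message; the function shape and 'running' state argument are illustrative:

```c
#include <assert.h>

/* Should the idle zero-page thread run?  Start zeroing when pre-zeroed
 * pages drop below 1/3 of the free count, stop once they reach 1/2,
 * and in between keep doing whatever we were doing (hysteresis, to
 * avoid silly zeroing in borderline allocation/free situations). */
static int zero_idle_wanted(int free_count, int zero_count, int running) {
    if (zero_count < free_count / 3)
        return 1;               /* below low water: start/keep zeroing */
    if (zero_count >= free_count / 2)
        return 0;               /* at/above high water: stop */
    return running;             /* in the dead band: no change */
}
```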
|
43751 |
08-Feb-1999 |
dillon |
Backed out vm_map coalesce optimization - it resulted in 22% more page faults for reasons unknown ( under investigation ). /usr/bin/time -l make in /usr/src/bin went from 67000 faults to 90000 faults.
|
43748 |
07-Feb-1999 |
dillon |
Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
|
43747 |
07-Feb-1999 |
dillon |
Remove L1 cache coloring optimization ( leave L2 cache coloring opt ).
Rewrite vm_page_list_find() and vm_page_select_free() - make inline out of nominal case.
|
43729 |
07-Feb-1999 |
dillon |
When shadowing objects, adjust the page coloring of the shadowing object such that pages in the combined/shadowed object are consistently colored.
Submitted by: "John S. Dyson" <dyson@iquest.net>
|
43700 |
06-Feb-1999 |
dillon |
Add hysteresis to the 'swap_pager_getswapspace: failed' console message. Also widen the hysteresis levels a little ( these really should be dynamically configured ).
|
43638 |
05-Feb-1999 |
dillon |
The elf loader sets the permissions on bss to VM_PROT_READ|VM_PROT_WRITE rather than VM_PROT_ALL. obreak, on the other hand, uses VM_PROT_ALL. This prevents vm_map_insert() from being able to coalesce the heap and creates an extra map entry. Since current architectures ignore VM_PROT_EXECUTE anyway, and since not having VM_PROT_EXECUTE on data/bss may provide protection in the future, obreak now uses read+write rather than all (r+w+x).
This is an optimization, not a bug fix.
Submitted by: Alan Cox <alc@cs.rice.edu>
|
43616 |
04-Feb-1999 |
dillon |
Fix bug in a KASSERT I introduced in vm_page_qcollapse() rev 1.139.
Since paging is in progress, the page scan in vm_page_qcollapse() must be protected by at least splbio() to prevent pages from being ripped out from under the scan.
|
43547 |
03-Feb-1999 |
dillon |
Submitted by: Alan Cox
The vm_map_insert()/vm_object_coalesce() optimization has been extended to include OBJT_SWAP objects as well as OBJT_DEFAULT objects. This is possible because it costs nothing to extend an OBJT_SWAP object with the new swapper. We can't do this with the old swapper. The old swapper used a linear array that would have had to have been reallocated, costing time as well as a potential low-memory deadlock.
|
43493 |
01-Feb-1999 |
dillon |
This patch eliminates a pointless test from appearing twice in vm_map_simplify_entry. Basically, once you've verified that the objects in the adjacent vm_map_entry's are the same, either NULL or the same vm_object, there's no point in checking that the objects have the same behavior.
Obtained from: Alan Cox <alc@cs.rice.edu>
|
43476 |
31-Jan-1999 |
julian |
Submitted by: Alan Cox <alc@cs.rice.edu> Checked by: "Richard Seaman, Jr." <dick@tar.com> Fix the following problem: As the code stands now, growing any stack, and not just the process's main stack, modifies vm->vm_ssize. This is inconsistent with the code earlier in the same procedure.
|
43311 |
28-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43287 |
27-Jan-1999 |
dillon |
Remove unintended trigraph sequences in comments for -Wall
|
43209 |
26-Jan-1999 |
julian |
Mostly remove the VM_STACK OPTION. This changes the definitions of a few items so that structures are the same whether or not the option itself is enabled. This allows people to enable and disable the option without recompilng the world.
As the author says:
|I ran into a problem pulling out the VM_STACK option. I was aware of this |when I first did the work, but then forgot about it. The VM_STACK stuff |has some code changes in the i386 branch. There need to be corresponding |changes in the alpha branch before it can come out completely.
what is done: | |1) Pull the VM_STACK option out of the header files it appears in. This |really shouldn't affect anything that executes with or without the rest |of the VM_STACK patches. The vm_map_entry will then always have one |extra element (avail_ssize). It just won't be used if the VM_STACK |option is not turned on. | |I've also pulled the option out of vm_map.c. This shouldn't harm anything, |since the routines that are enabled as a result are not called unless |the VM_STACK option is enabled elsewhere. | |2) Add what appears to be appropriate code the the alpha branch, still |protected behind the VM_STACK switch. I don't have an alpha machine, |so we would need to get some testers with alpha machines to try it out. | |Once there is some testing, we can consider making the change permanent |for both i386 and alpha. | [..] | |Once the alpha code is adequately tested, we can pull VM_STACK out |everywhere. |
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
43208 |
26-Jan-1999 |
julian |
Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
43145 |
24-Jan-1999 |
dillon |
Undo last commit - not a bug, just duplicate code. PG_MAPPED and PG_WRITEABLE are already cleared by vm_page_protect().
|
43138 |
24-Jan-1999 |
dillon |
Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL to use the vm_page_dirty() inline.
The inline can thus do sanity checks ( or not ) over all cases.
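A user-space sketch of the inline and the kind of sanity check it centralizes (the PQ_CACHE check was actually added in a nearby commit); the queue constant and struct are stand-ins:

```c
#include <assert.h>

#define VM_PAGE_BITS_ALL 0xff
#define PQ_CACHE 3   /* illustrative queue index */

/* Abridged vm_page: just the fields this sketch needs. */
struct vm_page_demo {
    int queue;
    int dirty;
};

/* One central place to set dirty = VM_PAGE_BITS_ALL, so a sanity check
 * here covers every caller that used to assign the field by hand. */
static void vm_page_dirty(struct vm_page_demo *m) {
    assert(m->queue != PQ_CACHE);   /* dirty pages may not sit on PQ_CACHE */
    m->dirty = VM_PAGE_BITS_ALL;
}

static struct vm_page_demo active_page = { 0, 0 };
```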
|
43136 |
24-Jan-1999 |
dillon |
vm_map_split() used to dirty the page manually after calling vm_page_rename(), but never pulled the page off PQ_CACHE if it was on PQ_CACHE. Dirty pages in PQ_CACHE are not allowed and a KASSERT was added in -4.x to test for this... and got hit.
In -4.x, vm_page_rename() automatically dirties the page. This commit also has it deal with the PQ_CACHE case, deactivating the page in that case.
|
43134 |
24-Jan-1999 |
dillon |
Add vm_page_dirty() inline with PQ_CACHE sanity check
|
43129 |
24-Jan-1999 |
dillon |
vm_pager_put_pages() is passed an rcval array to hold per-page return values. The 'int' return value for the procedure was never used and not well defined in any case when there are mixed errors on pages, so it has been removed. vm_pager_put_pages() and associated vm_pager functions now return void.
|
43128 |
24-Jan-1999 |
dillon |
Clear PG_MAPPED as well as PG_WRITEABLE when a page is moved to the cache.
|
43127 |
24-Jan-1999 |
dillon |
Added warning printf ( needs INVARIANTS ) when busy cache page is found while trying to free memory.
|
43123 |
24-Jan-1999 |
dillon |
It is possible for a page in the cache to be busy. vm_pageout.c was not checking for this condition while it tried to free cache pages. Fixed.
|
43122 |
24-Jan-1999 |
dillon |
Add invariants to vm_page_busy() and vm_page_wakeup() to check for PG_BUSY stupidity.
|
43121 |
24-Jan-1999 |
dillon |
Clear PG_WRITEABLE in vm_page_cache(). This may or may not be a bug, but the bit should definitely be cleared.
|
43120 |
24-Jan-1999 |
dillon |
Deprecate vm_object_pmap_copy() - nobody uses it. Everyone uses vm_object_pmap_copy_1() now, apparently.
|
43119 |
24-Jan-1999 |
dillon |
Get rid of the unused old_m in vm_fault. Add INVARIANTS to test whether the page is still busy after all the hell vm_fault goes through; it is supposed to be, and printf() if it isn't. Don't panic, though.
|
43086 |
23-Jan-1999 |
dillon |
Reenable John Dyson's low-memory VM_WAIT code for page reactivations out of PQ_CACHE. Add comments explaining what it accomplishes and its limitations.
|
42979 |
21-Jan-1999 |
dillon |
Mainly changes to support the new swapper. The big adjustment is that swap blocks are now in PAGE_SIZE'd increments instead of DEV_BSIZE'd increments. We still convert to DEV_BSIZE'd increments for the backing store I/O, but everything else is in PAGE_SIZE increments.
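The unit conversion at the backing-store I/O boundary amounts to a multiply; a sketch, assuming 4K pages and 512-byte device blocks:

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define DEV_BSIZE 512

/* Swap blocks are now PAGE_SIZE'd; convert to DEV_BSIZE'd block numbers
 * only when handing the request to the backing store. */
static long swapblk_to_devblk(long swapblk) {
    return swapblk * (PAGE_SIZE / DEV_BSIZE);   /* 8 device blocks/page */
}
```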
|
42978 |
21-Jan-1999 |
dillon |
Move many of the vm_pager_*() functions from vm_pager.c to inlines in vm_pager.h
|
42977 |
21-Jan-1999 |
dillon |
Move many of the vm_pager_*() functions from vm_pager.c to inlines in vm_pager.h
Added an argument to getpbuf() and relpbuf() to allow each subsystem to specify a different hard limit on the number of simultaneous physical buffers that said subsystem may allocate. Without this feature, one subsystem (e.g. the vfs clustering code) could hog *ALL* the pbufs, causing a deadlock in the pager in a low memory situation.
Same for trypbuf().
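The per-subsystem limit described above can be sketched in user space as a pair of counters: a shared pool count and a private count capped per caller. All names and values here are illustrative, not the kernel's actual getpbuf()/relpbuf() implementation.

```c
#include <assert.h>

/* Buffers remaining in the shared physical-buffer pool (illustrative). */
static int pbuf_free = 64;

/* Try to take one buffer, charged against the subsystem's private count.
 * A subsystem at its own cap fails even if the shared pool has buffers,
 * so no single caller can exhaust the pool and deadlock the pager. */
int try_get_pbuf(int *subsys_count, int subsys_max)
{
    if (*subsys_count >= subsys_max || pbuf_free == 0)
        return 0;                 /* at the cap: caller must wait or fail */
    (*subsys_count)++;
    pbuf_free--;
    return 1;
}

void rel_pbuf(int *subsys_count)
{
    (*subsys_count)--;
    pbuf_free++;
}
```

A subsystem such as the clustering code would pass its own counter and limit on every call, so its worst-case footprint in the pool is bounded.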
|
42976 |
21-Jan-1999 |
dillon |
Reorganized some of the low memory testing code to make it more useful.
Removed call to vm_object_collapse(), which can block. This was being called without the pageout code holding any sort of reference on the vm_object or vm_page_t structures being manipulated. Since this code can block, it was possible for other kernel code to shred the state the pageout code was assuming remained intact.
Fixed potential blocking condition in vm_pageout_page_free() ( which could cause a deadlock in a low-memory situation ).
Currently there is a hack in place to deal with clean filesystem meta-data polluting the inactive page queue. John doesn't like the hack, and neither do I.
Revamped and commented a portion of the pageout loop.
Added protection against potential memory deadlocks with OBJT_VNODE when using VOP_ISLOCKED(). The problem is that vp->v_data can be NULL which causes VOP_ISLOCKED() to return a less informed answer.
Remove vm_pager_sync() -- none of the pagers use it any more (the old swapper used to; the new one does not).
|
42975 |
21-Jan-1999 |
dillon |
The TAILQ hashq has been turned into a singly-linked-list link, reducing the size of vm_page_t.
SWAPBLK_NONE and SWAPBLK_MASK are defined here. These are actually more generalized than their names imply, but their placement is somewhat of a legacy issue from a prior test version of this code that put the swapblk in the vm_page_t structure. That test code was eventually thrown away. The legacy remains.
Added vm_page_flash() inline. Similar to vm_page_wakeup() except that it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ). Used by a number of routines to wakeup waiters.
Collapsed some of the code in inline calls to make other inline calls. GCC will optimize this well and it reduces duplication.
vm_page_free() and vm_page_free_zero() inlines added to convert to the proper vm_page_free_toq() call.
vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has been removed ). This implements a much more optimizable page-waiting function.
|
42974 |
21-Jan-1999 |
dillon |
The hash table used to be a table of doubly-linked list headers ( two pointers per entry ). The table has been changed to a singly linked list of vm_page_t pointers. The table has been doubled in size, but the entries take only half the space, so the change in memory use is net zero.
The hash function has been changed, hopefully for the better. The combination of the larger hash table size and the changed function should keep the chain length down to a reasonable number (0-3, average 1).
vm_object->page_hint has been removed. This 'optimization' was not only never needed, but costs as much as a hash chain link to implement. While having page_hint in vm_object might result in better locality of reference, the cost is not worth the space in vm_object or the extra instructions in my view.
vm_page_alloc*() functions have been inlined and call a generalized non-inlined vm_page_alloc_toq() which combines the standard alloc and zero-page alloc functions together, reducing code size and the L1 cache footprint. Some reordering has been done... not much. The delinking code should be faster ( because unlinking a doubly-linked list requires four memory ops and unlinking a singly linked list only requires two ), and we get a hash consistency check for free.
vm_page_rename() now automatically sets the page's dirty bits.
vm_page_alloc() does not try to manually inline freeing a cache page. Instead, it now properly calls vm_page_free(m) ... vm_page_free() is really too complex to manually inline.
vm_await(), supporting asleep(), has been added.
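The delinking argument above (two memory operations for a singly-linked chain versus four pointer updates for a doubly-linked one) can be seen in a minimal sketch of removal from a singly-linked hash bucket. The struct and function names here are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct page {
    struct page *hnext;   /* singly-linked hash chain link */
    int pindex;
};

/* Unlink 'm' from the chain headed at *bucket; returns 1 if found.
 * Walking with a pointer-to-pointer means the actual delink is a
 * single store into the predecessor's next field. */
int hash_remove(struct page **bucket, struct page *m)
{
    struct page **pp;

    for (pp = bucket; *pp != NULL; pp = &(*pp)->hnext) {
        if (*pp == m) {
            *pp = m->hnext;       /* one store delinks the page */
            m->hnext = NULL;
            return 1;
        }
    }
    return 0;
}
```

The trade-off is that removal requires walking the chain, which is why keeping the average chain length near 1 (via the larger table and better hash function) matters.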
|
42973 |
21-Jan-1999 |
dillon |
The vm_object structure is now somewhat smaller due to the removal of most of the swap-pager-specific fields, the removal of the id, and the removal of paging_offset.
A new inline, vm_object_pip_wakeupn() has been added to subtract an arbitrary number n from the paging_in_progress count and then wakeup waiters as necessary. n may be 0, resulting in a 'flash'.
|
42972 |
21-Jan-1999 |
dillon |
object->id was badly implemented. It has simply been removed.
object->paging_offset has been removed - it was used to optimize a single OBJT_SWAP collapse case yet introduced massive confusion throughout vm_object.c. The optimization was inconsequential except for the claim that it didn't have to allocate any memory. The optimization has been removed.
madvise() has been fixed. The old madvise() could be made to operate on shared objects which is a big no-no. The new one is much more careful in what it modifies. MADV_FREE was totally broken and has now been fixed.
vm_page_rename() now automatically dirties a page, so explicit dirtying of the page prior to calling vm_page_rename() has been removed.
|
42971 |
21-Jan-1999 |
dillon |
Objects associated with raw devices are no longer counted in the VM stats total because they may contain absurd numbers ( like the size of all of physical memory if you mmap() /dev/mem ).
|
42970 |
21-Jan-1999 |
dillon |
General cleanup related to the new pager. We no longer have to worry about conversions of objects to OBJT_SWAP, it is done automatically now.
Replaced manually inserted code with inline calls for busy waiting on pages, which also incidentally fixes a potential PG_BUSY race due to the code not running at splvm().
vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
|
42969 |
21-Jan-1999 |
dillon |
Potential bug fix, do not just clear PG_BUSY... call vm_page_wakeup() instead to properly handle any waiters.
Added comments, added support for M_ASLEEP. Generally treat M_ flags as flags instead of constants to compare against.
|
42968 |
21-Jan-1999 |
dillon |
Removed low-memory blockages at fork. This is the wrong place to put this sort of test. We need to fix the low-memory handling in general.
|
42967 |
21-Jan-1999 |
dillon |
Mainly cleanup. Removed some inappropriate low-memory handling code and added lots of comments. Add tie-in to vm_pager ( and thus the new swapper ) to deallocate backing swap for dirtied pages on the fly.
|
42966 |
21-Jan-1999 |
dillon |
The default_pager's interaction with the swap_pager has been reorganized, and the swap_pager has been completely replaced.
The new swap pager uses the new blist radix-tree based bitmap allocator for low level swap allocation and deallocation. The new allocator is effectively O(5) while the old one was O(N), and the new allocator allocates all required memory at init time rather than allocating memory on the fly at run time.
Swap metadata is allocated in clusters and stored in a hash table, eliminating linearly allocated structures.
Many, many features have been rewritten or added. Swap space is now reallocated on the fly, providing a poor man's auto-defragmentation of swap space. Swap space that is no longer needed is freed on a timely basis, so no garbage collection is necessary.
Swap I/O is marked B_ASYNC and NFS has been fixed to do the right thing with it, so NFS-based paging now has around 10x the performance it had before ( previously NFS enforced synchronous I/O for paging ).
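The unit convention described at the top of this entry (swap managed in PAGE_SIZE increments, converted to DEV_BSIZE only for the backing-store I/O) amounts to a single scaling at the I/O boundary. This is a hedged sketch with illustrative constants (4 KiB pages, 512-byte device blocks) and a hypothetical function name.

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define DEV_BSIZE 512

/* swap block number (in pages) -> device block number for the actual I/O */
long swapblk_to_devblk(long swapblk)
{
    return swapblk * (PAGE_SIZE / DEV_BSIZE);
}
```

Keeping all internal bookkeeping in page units makes the allocator and metadata simpler; only the driver-facing request needs the finer-grained device units.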
|
42957 |
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
42453 |
10-Jan-1999 |
eivind |
KNFize, by bde.
|
42408 |
08-Jan-1999 |
eivind |
Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers.
Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic.
Reviewed by: msmith
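The KASSERT form introduced here takes a parenthesized printf-style message as its second argument, which lets the macro forward the whole argument list to a variadic panic routine. Below is a user-space sketch of how such a macro can be defined; panic() is emulated with printf()+abort() for illustration, and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define INVARIANTS                 /* enable the checks for this sketch */

/* Stand-in for the kernel's panic(): print the message and abort. */
#define panic(...) (printf(__VA_ARGS__), abort())

/* The second argument arrives pre-parenthesized, e.g.
 * KASSERT(x > 0, ("x is %d", x)), so "panic msg" expands to a call. */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do { if (!(exp)) panic msg; } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif
```

When INVARIANTS is not defined the macro compiles away entirely, so the checks cost nothing in production kernels.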
|
42379 |
07-Jan-1999 |
julian |
Changes to the LINUX_THREADS support to only allocate extra memory for shared signal handling when there is shared signal handling being used.
This removes the main objection to making the shared signal handling a standard ability in rfork() and friends and 'unconditionalising' this code. (i.e. the allocation of an extra 328 bytes per process).
Signal handling information remains in the U area until such a time as its reference count would be incremented to > 1. At that point a new struct is malloc'd and maintained in KVM so that it can be shared between the processes (threads) using it.
A function to check the reference count and move the struct back to the U area when it drops back to 1 is also supplied. Signal information is therefore now swappable for all processes that are not sharing that information with other processes. This should address the concerns raised by Garrett and others.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
42360 |
06-Jan-1999 |
julian |
Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
|
42248 |
02-Jan-1999 |
bde |
Ifdefed conditionally used simplock variables.
|
42153 |
29-Dec-1998 |
dt |
Don't free swap in swap_pager_getpages(): this code probably caused the "dying daemons" problem. (I thought this code was introduced in rev.1.80, but it just relaxed the condition.)
Also, kill related "suggest more swap space" warning (also introduced in 1.80). It was confusing, to say the least...
Requested by: msmith Not objected by: dg
|
42026 |
23-Dec-1998 |
dillon |
Update comments to routines in vm_page.c, most especially whether a routine can block or not as part of a general effort to carefully document blocking/non-blocking calls in the kernel.
|
41936 |
19-Dec-1998 |
julian |
Fix two bogons created by 'patch(1)' in my last commit.
|
41931 |
19-Dec-1998 |
julian |
Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-)
Code to allow Linux Threads to run under FreeBSD.
Not enabled by default. This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garrett). This is not yet a 'real' option but will be within some number of hours.
|
41620 |
09-Dec-1998 |
dt |
Don't disable mmap with large file offset.
|
41591 |
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
41514 |
04-Dec-1998 |
archie |
Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for other cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
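The two patterns applied throughout this sweep are easy to show in isolation: bound every formatted write by the destination size, and never rely on strncpy() for NUL termination. The function names below are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* before: sprintf(buf, "device%d", unit); -- overflows a small buf.
 * after: the write is bounded by the destination size. */
void describe_device(char *buf, size_t buflen, int unit)
{
    snprintf(buf, buflen, "device%d", unit);
}

/* strncpy() alone does not guarantee a terminating NUL when the source
 * fills the destination, so terminate explicitly. */
void safe_copy(char *dst, size_t dstlen, const char *src)
{
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';
}
```

Using sizeof() on the destination (rather than a constant like "16") keeps the bound correct if the buffer's size ever changes.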
|
41503 |
04-Dec-1998 |
rvb |
In vnode_pager_input_old, set auio.uio_procp = curproc instead of auio.uio_procp = (struct proc *) 0
|
41322 |
25-Nov-1998 |
dg |
Add missing splvm protection around unqueue call. Without this, the page queues would eventually get corrupted.
|
41250 |
19-Nov-1998 |
bde |
Fixed a null pointer panic in spc_free(). swap_pager_putpages() almost always causes this panic for the curproc != pageproc case. This case apparently doesn't happen in normal operation, but it happens when vm_page_alloc_contig() is called when there is a memory hogging application that hasn't already been paged out.
PR: 8632 Reviewed by: info@opensound.com (Dev Mazumdar), dg Broken in: rev.1.89 (1998/02/23)
|
41093 |
11-Nov-1998 |
dg |
Closed a small race condition between wiring/unwiring pages that involved the page's wire_count.
|
41059 |
10-Nov-1998 |
peter |
add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()
|
41004 |
08-Nov-1998 |
dfr |
* Fix a couple of places in the device pager where an address was truncated to 32 bits. * Change the calling convention of the device mmap entry point to pass a vm_offset_t instead of an int for the offset allowing devices with a larger memory map than (1<<32) to be supported on the alpha (/dev/mem is one such).
These changes are required to allow the X server to mmap the various I/O regions used for device port and memory access on the alpha.
|
40931 |
05-Nov-1998 |
dg |
Implemented zero-copy TCP/IP extensions via sendfile(2) - send a file to a stream socket. sendfile(2) is similar to implementations in HP-UX, Linux, and other systems, but the API is more extensive and addresses many of the complaints that the Apache Group and others have had with those other implementations. Thanks to Marc Slemko of the Apache Group for helping me work out the best API for this. Anyway, this has the "net" result of speeding up sends of files over TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU cycles) when compared to a traditional read/write loop.
|
40794 |
31-Oct-1998 |
peter |
Add John Dyson's SYSCTL descriptions, and an export of more stats to a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present in source, they do not get compiled into the binaries taking up memory.
|
40790 |
31-Oct-1998 |
peter |
Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
|
40701 |
28-Oct-1998 |
dg |
Fixed wrong comments in and about vm_page_deactivate().
|
40700 |
28-Oct-1998 |
dg |
Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.
|
40673 |
27-Oct-1998 |
dg |
Added needed splvm() protection around object page traversal in vm_object_terminate().
|
40650 |
25-Oct-1998 |
bde |
Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV.
Removed a redundant `major(dev) < nblkdev' test instead of updating it.
Don't follow a garbage bdevsw pointer for attempts to swap on empty regular files. This case currently can't happen. Swapping on regular files is ifdefed out in swapon() and isn't attempted for empty files in nfs_mountroot().
|
40648 |
25-Oct-1998 |
phk |
Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
|
40605 |
23-Oct-1998 |
dg |
Oops, revert part of last fix. vm_pager_dealloc() can't be called until after the pages are removed from the object...so fix the problem by not printing the diagnostic for wired fictitious pages (which is normal).
|
40604 |
23-Oct-1998 |
dg |
Fixed two bugs in recent commit: in vm_object_terminate, vm_pager_dealloc needs to be called prior to freeing remaining pages in the object so that the device pager has an opportunity to grab its "fake" pages. Also, in the case of wired pages, the page must be made busy prior to calling vm_page_remove. This is a difference from 2.2.x that I overlooked when I brought these changes forward.
|
40560 |
22-Oct-1998 |
dg |
Make the VM system handle the case where a terminating object contains legitimately wired pages. Currently we print a diagnostic when this happens, but this will be removed soon when it will be common for this to occur with zero-copy TCP/IP buffers.
|
40558 |
22-Oct-1998 |
dg |
Convert fake page allocs to use the zone allocator, thus eliminating the private pool management code in here.
|
40557 |
21-Oct-1998 |
dg |
Set m->object to NULL in dev_pager_getfake().
|
40548 |
21-Oct-1998 |
dg |
Nuked PG_TABLED flag. Replaced with m->object != NULL.
|
40546 |
21-Oct-1998 |
dg |
Add a diagnostic printf for freeing a wired page. This will eventually be turned into a panic, but I want to make sure that all cases of freeing pages with wire_count==1 (which is/was allowed) have first been fixed.
|
40286 |
13-Oct-1998 |
dg |
Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
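The second bug above is worth seeing concretely: masking through a 32-bit type silently discards the high bits of a 64-bit offset. The macros below are an illustrative reconstruction (4 KiB pages assumed), not the actual round_page()/trunc_page() definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK 0xfffULL     /* illustrative: 4 KiB pages */

/* Buggy form: the 32-bit cast scrambles any offset above 4 GiB. */
#define trunc_page_bad(x) ((uint32_t)(x) & ~(uint32_t)PAGE_MASK)

/* Fixed form: stay in the 64-bit offset type throughout. */
#define trunc_page_ok(x)  ((uint64_t)(x) & ~PAGE_MASK)
```

For an offset just above 4 GiB, the buggy macro returns an address near zero, which is exactly the kind of scrambling that corrupted large-file offsets.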
|
40087 |
09-Oct-1998 |
jdp |
Fix a panic on SMP systems, caused by sleeping while holding a simple-lock.
The reviewer raises the following caveat: "I believe these changes open a non-critical race condition when adding memory to the pool for the zone. I think what will happen is that you could have two threads that are simultaneously adding additional memory when the pool runs out. This appears to not be a problem, however, since the re-aquisition of the lock will protect the list pointers." The submitter agrees that the race is non-critical, and points out that it already existed for the non-SMP case. He suggests that perhaps a sleep lock (using the lock manager) should be used to close that race. This might be worth revisiting after 3.0 is released.
Reviewed by: dg (David Greenman) Submitted by: tegge (Tor Egge)
|
39873 |
01-Oct-1998 |
jdp |
Fix a bug in which a page index was used where a byte offset was expected. This bug caused builds of Modula-3 to fail in mysterious ways on SMP kernels. More precisely, such builds failed on systems with kern.fast_vfork equal to 0, the default and only supported value for SMP kernels.
PR: kern/7468 Submitted by: tegge (Tor Egge)
|
39770 |
29-Sep-1998 |
abial |
Make #define NO_SWAPPING a normal kernel config option.
Reviewed by: jkh
|
39739 |
28-Sep-1998 |
rvb |
John Dyson approved of this solution; make vnode_pager_input_old set m->valid
|
39700 |
28-Sep-1998 |
dg |
Be more selective about when we clear p->valid. Submitted by: John Dyson <toor@dyson.iquest.net>
|
39512 |
20-Sep-1998 |
bde |
Removed unused file.
|
38866 |
05-Sep-1998 |
bde |
Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded.
NetBSD uses strcmp() to avoid this ugly global.
|
38799 |
04-Sep-1998 |
dfr |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
38729 |
01-Sep-1998 |
wollman |
Separate wakeup conditions for page I/O count (pg_busy) and lock (PG_BUSY). This is not a complete solution to the deadlock, but the additional wakeups have helped in my observation.
Suggested by: John Dyson
|
38542 |
25-Aug-1998 |
luoqi |
Fix a rounding problem that causes vnode pager to fail to remove the last partially filled page during a truncation.
PR: kern/7422
|
38517 |
24-Aug-1998 |
dfr |
Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptible.
Reviewed by: Bruce Evans <bde@zeta.org.au>
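The problem with a plain `m->flags |= PG_BUSY` is that it compiles to a read-modify-write sequence an interrupt can split on most architectures. A user-space sketch of the fix, using C11 atomics as a stand-in for the kernel's uninterruptible primitives (the flag value and function names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>

#define PG_BUSY 0x0010          /* illustrative flag bit */

/* Set a flag bit in one uninterruptible operation. */
void page_flag_set(atomic_uint *flags, unsigned int bit)
{
    atomic_fetch_or(flags, bit);
}

/* Clear a flag bit in one uninterruptible operation. */
void page_flag_clear(atomic_uint *flags, unsigned int bit)
{
    atomic_fetch_and(flags, ~bit);
}
```

As the r38135 entry below notes, the i386 escaped corruption only because the compiler happened to emit a single memory read-modify-write instruction; other architectures need the operation to be explicitly atomic.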
|
38479 |
22-Aug-1998 |
mckay |
Correct/clarify some comments.
|
38298 |
13-Aug-1998 |
dfr |
Protect all modifications to paging_in_progress with splvm().
|
38135 |
06-Aug-1998 |
dfr |
Protect all modifications to paging_in_progress with splvm(). The i386 managed to avoid corruption of this variable by luck (the compiler used a memory read-modify-write instruction which wasn't interruptible) but other architectures cannot.
With this change, I am now able to 'make buildworld' on the alpha (sfx: the crowd goes wild...)
|
37918 |
28-Jul-1998 |
bde |
Fixed two spl nesting bugs. They caused (at least) the entire pageout daemon to run at splvm() forever after swap_pager_putpages() is called from vm_pageout_scan().
Broken in: rev.1.189 (1998/02/23)
|
37874 |
26-Jul-1998 |
dfr |
Notify pmap when a page is freed on the alpha to allow it to clean up its emulated modified/referenced bits.
|
37843 |
22-Jul-1998 |
dg |
Improved pager input failure message.
|
37821 |
22-Jul-1998 |
phk |
There is a comment in vm_param.h which doesn't correspond to any code still left in there. The macros it describes disappeared sometime since 4.4BSD Lite.
PR: 7246 Reviewed by: phk Submitted by: Stefan Eggers <seggers@semyam.dinoco.de>
|
37653 |
15-Jul-1998 |
bde |
Cast pointers to [u]intptr_t instead of to [unsigned] long.
|
37649 |
15-Jul-1998 |
bde |
Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this change is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
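The point of the uintptr_t cast is portability: uintptr_t is defined to round-trip a pointer losslessly, while long merely happens to be pointer-sized on some platforms. A minimal illustration (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the pointer survives a round trip through an integer.
 * With uintptr_t this is guaranteed by the standard; with long it is
 * only an accident of the ABI (and false on e.g. 64-bit Windows). */
int ptr_roundtrips(void *p)
{
    uintptr_t u = (uintptr_t)p;
    return (void *)u == p;
}
```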
|
37641 |
14-Jul-1998 |
bde |
Print pointers using %p instead of attempting to print them by casting them to long, etc. Fixed some nearby printf bogons (sign errors not warned about by gcc, and style bugs, but not truncation of vm_ooffset_t's).
|
37640 |
14-Jul-1998 |
bde |
Print pointers using %p instead of attempting to print them by casting them to long, etc. Fixed some nearby printf bogons (sign errors not warned about by gcc, and style bugs, but not truncation of vm_ooffset_t's).
Use slightly less bogus casts for passing pointers to ddb command functions.
|
37563 |
11-Jul-1998 |
bde |
Fixed printf format errors.
|
37562 |
11-Jul-1998 |
bde |
Fixed printf format errors.
|
37555 |
11-Jul-1998 |
bde |
Fixed printf format errors.
|
37546 |
10-Jul-1998 |
alex |
Removed no longer valid comment about swb_block being int instead of daddr_t.
PR: 7238 Submitted by: Stefan Eggers <seggers@semyam.dinoco.de>
|
37545 |
10-Jul-1998 |
alex |
Removed unnecessary test from if/else construct.
PR: 7233 Submitted by: Stefan Eggers <seggers@semyam.dinoco.de>
|
37395 |
05-Jul-1998 |
dfr |
Don't truncate the return value of mmap to sizeof(int).
|
37389 |
04-Jul-1998 |
julian |
There is no such thing any more as "struct bdevsw".
There is only cdevsw (which should be renamed in a later edit to deventry or something). cdevsw contains the union of what were in both bdevsw and cdevsw entries. The bdevsw[] table still exists and is a second pointer to the cdevsw entry of the device. Its major is in d_bmaj rather than d_maj. Some cleanup still to happen (e.g. dsopen now gets two pointers to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).
rawread()/rawwrite() went away as part of this though it's not strictly the same patch, just that it involves all the same lines in the drivers.
cdroms no longer have write() entries (they did have rawwrite (?)). tapes no longer have support for bdev operations.
Reviewed by: Eivind Eklund and Mike Smith Changes suggested by eivind.
|
37384 |
04-Jul-1998 |
julian |
VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
|
37282 |
30-Jun-1998 |
jmg |
Document some VM paging options for cache sizes: PQ_NOOPT (no coloring), PQ_LARGECACHE (used for 512K/16K cache), PQ_HUGECACHE (used for 1024K/16K cache).
|
37153 |
25-Jun-1998 |
phk |
Remove bdevsw_add(), change the only two users to use bdevsw_add_generic(). Extend cdevsw to be superset of bdevsw. Remove non-functional bdev lkm support. Teach wcd what the open() args mean.
|
37101 |
21-Jun-1998 |
bde |
Removed unused includes.
|
37094 |
21-Jun-1998 |
bde |
Removed unused includes.
|
36735 |
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
36677 |
05-Jun-1998 |
dg |
Changed the log() of "Out of mbuf clusters - increase maxusers" to a printf() of "Out of mbuf clusters - adjust NMBCLUSTERS or increase maxusers" so that the message is more informative and so that it will appear in the kernel message buffer.
|
36583 |
02-Jun-1998 |
dyson |
Cleanup and remove some dead code from the initialization.
|
36582 |
02-Jun-1998 |
dyson |
Correct sleep priority.
|
36326 |
24-May-1998 |
dyson |
Support a 16K first level cache for 512K 2nd level. Also, add support for 1MB 2nd level cache.
|
36275 |
21-May-1998 |
dyson |
Make flushing dirty pages work correctly on filesystems that unexpectedly do not complete writes even with sync I/O requests. This should help the behavior of mmaped files when using softupdates (and perhaps in other circumstances also.)
|
36177 |
19-May-1998 |
peter |
Make the previous commit compile..
|
36164 |
18-May-1998 |
guido |
Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for append-only and immutable files.
Obtained from: OpenBSD (partly)
|
36112 |
16-May-1998 |
dyson |
An important fix for proper inheritance of backing objects for object splits. Another excellent detective job by Tor. Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
35694 |
04-May-1998 |
dyson |
Fix the shm panic. I mistakenly used the shadow_count to keep the object from being split, and instead added an OBJ_NOSPLIT.
|
35669 |
04-May-1998 |
dyson |
Work around some VM bugs, the worst being an overly aggressive swap space free calculation. More complete fixes will be forthcoming, in a week.
|
35615 |
02-May-1998 |
dyson |
Another minor cleanup of the split code. Make sure that pages are busied during the entire time, so that the waits for pages being unbusy don't make the objects inconsistent.
|
35612 |
02-May-1998 |
peter |
Seatbelts for vm_page_bits() in case a file offset is passed in rather than the page offset. If a large file offset was passed in, a large negative array index could be generated which could cause page faults etc at worst and file corruption at the least. (Pages are allocated within file space on page alignment boundaries, so a file offset being passed in here is harmless to DTRT. The case where this was happening has already been fixed though, this is in case it happens again).
Reviewed by: dyson
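The "seatbelt" idea above (tolerating a file offset where a within-page offset was expected) can be sketched as masking the offset down to its within-page part before computing the DEV_BSIZE-granular valid-bits mask. This is a hedged reconstruction with illustrative constants (4 KiB pages, 512-byte blocks) and a hypothetical function name, not the actual vm_page_bits().

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define DEV_BSIZE 512

/* Return a bitmask with one bit per DEV_BSIZE chunk of the page that
 * the [base, base+size) range covers. */
int vm_page_bits_sketch(int base, int size)
{
    int first, last;

    base &= PAGE_SIZE - 1;         /* seatbelt: drop any page-aligned part
                                    * of a mistakenly passed file offset */
    if (size <= 0)
        return 0;
    if (base + size > PAGE_SIZE)
        size = PAGE_SIZE - base;   /* clamp to the page boundary */

    first = base / DEV_BSIZE;
    last = (base + size - 1) / DEV_BSIZE;
    return (1 << (last + 1)) - (1 << first);
}
```

Without the mask, a large file offset would produce a huge (or, after overflow, negative) chunk index, which is exactly the out-of-bounds array access the commit guards against.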
|
35571 |
01-May-1998 |
dyson |
Fix minor bug with new over used swap fix.
|
35499 |
29-Apr-1998 |
dyson |
Add a needed prototype, and fix a panic problem with the new memory code.
|
35497 |
29-Apr-1998 |
dyson |
Tighten up management of memory and swap space during map allocation, deallocation cycles. This should provide a measurable improvement on swap and memory allocation on loaded systems. It is unlikely a complete solution. Also, provide more map info with procfs. Chuck Cranor spurred on this improvement.
|
35485 |
28-Apr-1998 |
dyson |
Fix a pseudo-swap leak problem. This mitigates "leaks" due to freeing partial objects; previously, not freeing the entire object didn't free any of it. Simple fix to the map code. Reviewed by: dg
|
35447 |
25-Apr-1998 |
dyson |
Correct copyright.
|
35210 |
15-Apr-1998 |
bde |
Support compiling with `gcc -ansi'.
|
34961 |
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime().
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeared time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as an arg.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
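tvtohz() replaces the racy "splfoo(); add current time; call hzto(), which subtracts current time" dance with a direct conversion of a relative timeval to ticks. A user-space sketch with an illustrative clock rate and a local struct to stay self-contained (the rounding mirrors the usual convention of never returning a timeout shorter than requested):

```c
#include <assert.h>

#define HZ 100                      /* illustrative ticks per second */

struct timeval_s { long tv_sec; long tv_usec; };

/* Convert a relative time to a tick count: round the microseconds up,
 * then add one tick for the partially elapsed current tick. */
long tvtohz_sketch(const struct timeval_s *tv)
{
    return tv->tv_sec * HZ
        + (tv->tv_usec * HZ + 999999) / 1000000
        + 1;
}
```

Because no "current time" is read, there is nothing for an interrupt to change between the add and the subtract, so no spl protection is needed around the conversion.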
|
34924 |
28-Mar-1998 |
bde |
Moved some #includes from <sys/param.h> nearer to where they are actually used.
|
34611 |
16-Mar-1998 |
dyson |
Some VM improvements, including elimination of a lot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!!
pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.)
pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances.
vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.)
vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust.
vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed; blocks are now written out ONLY when B_DELWRI is true.
vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.)
vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities.
vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.)
vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliability and performance issue.)
10) Modify FFS to use vtruncbuf.
vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.)
vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, make page invalidation change the object generation count so that we handle generation counts a little more robustly.
vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.)
vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
|
34525 |
12-Mar-1998 |
guido |
Fix for mmap of char devices bug as described in OpenBSD advisory of 1998/02/20 Reviewed by: John Dyson Submitted by: "Cy Schubert" <cschuber@uumail.gov.bc.ca>
|
34403 |
09-Mar-1998 |
msmith |
Complement diagnostic messages about missing per-FS VOP page operations, but don't make their absence fatal. Submitted by: terry
|
34321 |
08-Mar-1998 |
dyson |
Quell unneeded pageout daemon activity.
|
34320 |
08-Mar-1998 |
dyson |
Remove a very ill advised vm_page_protect. This was being called for a non-managed page. That is a big no-no.
|
34236 |
08-Mar-1998 |
dyson |
Remove some cruft left over from my megacommit. A page rotation optimization was a good idea, but can cause instability. That optimization is now removed.
|
34235 |
08-Mar-1998 |
dyson |
Several minor fixes: 1) When freeing pages, it is a good idea to protect them off. (This is probably gratuitous, but good form.) 2) Allow collapsing pages in the backing object that are PQ_CACHE. This will improve memory utilization. 3) Correct the collapse code so that pages that were on the cache queue are moved to the inactive queue. This is done when pages are marked dirty (so that those pages will be properly paged out instead of freed), so that cached pages will not be paradoxically marked dirty.
|
34206 |
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitous buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. 
I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
34202 |
07-Mar-1998 |
dyson |
Make vm_fault much cleaner by removing the evil macro inlines, and put a lot of its context into a data structure. This allows significant shortening of its codepath, and will significantly decrease its cache footprint.
Also, add some stats to vmmeter. Note that you'll have to rebuild/recompile vmstat, systat, etc... Otherwise, you'll get "very interesting" paging stats.
|
34030 |
04-Mar-1998 |
dufault |
Reviewed by: msmith, bde long ago POSIX.4 headers and sysctl variables. Nothing should change unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
|
33936 |
01-Mar-1998 |
dyson |
1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode.
All in all, SMP systems especially should show a significant improvement in "snappyness."
|
33847 |
26-Feb-1998 |
msmith |
In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown for local media FS's.
See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation details for generic *_{get|put}pages for local media FS's. Support is trivial to add for any FS that formerly relied on the default behaviour of the vnode_pager in EOPNOTSUPP cases (just copy the ffs_getpages() code for the FS in question's *_{get|put}pages).
Obviously, it would be better if each local media FS implemented a more optimal method, instead of calling an exported interface from the /sys/vm/vnode_pager.c, but this is a necessary first step in getting the FS's to a point where they can be supplied with better implementations on a case-by-case basis.
Obviously, the cd9660_putpages() can be rather trivial (since it is a read-only FS type 8-)).
A slight (temporary) modification is made to print a diagnostic message in the case where the underlying filesystem attempts to engage in the previous behaviour. Failure is likely to be ungraceful.
Submitted by: terry@freebsd.org (Terry Lambert)
|
33817 |
25-Feb-1998 |
dyson |
Fix page prezeroing for SMP, and fix some potential paging-in-progress hangs. The paging-in-progress diagnosis was a result of Tor Egge's excellent detective work. Submitted by: Partially from Tor Egge.
|
33784 |
24-Feb-1998 |
dyson |
Correct some severe VM tuning problems for small systems (<=16MB), and improve tuning on larger systems. (A couple of the VM tuning params for small systems were so badly chosen that the system could hang under load.)
The broken tuning was originally my fault.
|
33758 |
23-Feb-1998 |
dyson |
Significantly improve the efficiency of the swap pager, which appears to have declined due to code-rot over time. The swap pager rundown code has been cleaned up, and unneeded wakeups removed. Lots of splbio's are changed to splvm's. Also, set the dynamic tunables for the pageout daemon to be more sane for larger systems (thereby decreasing the daemon overhead.)
|
33757 |
23-Feb-1998 |
dyson |
Try to dynamically size the VM_KMEM_SIZE (but it can still be overridden exactly as before.) I had problems with the system properly handling the number of vnodes when there is a lot of system memory, and the default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and "VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems with greater than 128MB.
Add some accounting for vm_zone memory allocations, and provide properly for vm_zone allocations out of the kmem_map. Also move the vm_zone allocation stats to the VM OID tree from the KERN OID tree.
|
33676 |
20-Feb-1998 |
bde |
Removed unused #includes.
|
33622 |
19-Feb-1998 |
msmith |
Move the 'sw' device off block major #1, which is now occupied by 'wfd'.
|
33181 |
09-Feb-1998 |
eivind |
Staticize.
|
33173 |
08-Feb-1998 |
dyson |
Fix an argument to vn_lock. It appears that a lot of the vn_lock usage is a bit undisciplined, and should be checked carefully.
|
33134 |
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
33109 |
05-Feb-1998 |
dyson |
1) Start using a cleaner and more consistent page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec paging. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, so make sure that vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
|
33108 |
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
33058 |
03-Feb-1998 |
bde |
Added #include of <sys/queue.h> so that this file is more "self"-sufficent.
|
33034 |
03-Feb-1998 |
dyson |
This fix should help the panic problems in -current. There were some errors in "interval" management. Due to the clustering mechanism, the code is necessarily complex and error prone.
|
32995 |
01-Feb-1998 |
bde |
Forward declare more structs that are used in prototypes here - don't depend on <sys/types.h> forward declaring common ones.
|
32952 |
01-Feb-1998 |
dyson |
Fix a performance problem caused by an earlier commit.
|
32946 |
31-Jan-1998 |
dyson |
contigalloc doesn't place the allocated page(s) into an object, and now this breaks vm_page_wire (due to wired page accounting per object.)
This should fix a problem as described by Donald Maddox.
|
32937 |
31-Jan-1998 |
dyson |
Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtle "holes."
Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistent scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
|
32751 |
25-Jan-1998 |
eivind |
Turn NSWAPDEV into a new-style option.
|
32726 |
24-Jan-1998 |
eivind |
Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduces an xxxFS_BOOT for each of the rootable filesystems. (Presently not required, but encouraged to allow a smooth move of option *FS to opt_dontuse.h later.)
LFS is temporarily disabled, and will be re-enabled tomorrow.
|
32724 |
24-Jan-1998 |
dyson |
Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.
|
32702 |
22-Jan-1998 |
dyson |
VM level code cleanups.
1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneeded collapse operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
|
32670 |
21-Jan-1998 |
dyson |
Allow gdb to work again.
|
32585 |
17-Jan-1998 |
dyson |
Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management.
The system should be much more stable, but all subtle bugs aren't fixed yet.
|
32454 |
12-Jan-1998 |
dyson |
Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock.
PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
|
32305 |
07-Jan-1998 |
dyson |
Turn off the VTEXT flag when an object is no longer referenced, so that an executable that is no longer running can be written to. Also, clear the OBJ_OPT flag more often, when appropriate.
|
32286 |
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics as the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
32132 |
31-Dec-1997 |
alex |
caddr_t --> void *
|
32072 |
29-Dec-1997 |
dyson |
Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.
|
32071 |
29-Dec-1997 |
dyson |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
31991 |
25-Dec-1997 |
dyson |
The ioopt code is still buggy, but wasn't fully disabled.
|
31970 |
24-Dec-1997 |
dyson |
Support running with inadequate swap space. Additionally, the code will complain with a suggestion of increasing it.
|
31935 |
22-Dec-1997 |
dyson |
Improve my copyright.
|
31857 |
19-Dec-1997 |
dyson |
Change bogus usage of btoc to atop. The incorrect usage of btoc was pointed out by bde.
|
31853 |
19-Dec-1997 |
dyson |
Some performance improvements, and code cleanups (including changing our expensive OFF_TO_IDX to btoc whenever possible.)
|
31778 |
16-Dec-1997 |
eivind |
Make COMPAT_43 and COMPAT_SUNOS new-style options.
|
31729 |
15-Dec-1997 |
dyson |
Fix a recursive kernel_map lock problem in vm_zone allocator. PR: 5298
|
31712 |
14-Dec-1997 |
dyson |
Slight improvement to the vm_zone stats output. Also, some other superficial cleanups.
|
31709 |
14-Dec-1997 |
dyson |
After one of my analysis passes to evaluate methods for SMP TLB mgmt, I noticed some major enhancements available for UP situations. The number of UP TLB flushes is decreased very significantly with these changes. Since a TLB flush appears to cost minimally approx 80 cycles, this is a "nice" enhancement, equivalent to eliminating between 40 and 160 instructions per TLB flush.
Changes include making sure that kernel threads all use the same PTD, and eliminate unneeded PTD switches at context switch time.
|
31667 |
11-Dec-1997 |
dyson |
Fix the prototype for swapout_procs(); Submitted by: dima@best.net
|
31563 |
06-Dec-1997 |
dyson |
Support an optional, sysctl enabled feature of idle process swapout. This is apparently useful for large shell systems, or systems with long running idle processes. To enable the feature:
sysctl -w vm.swap_idle_enabled=1
Please note that some of the other vm sysctl variables have been renamed to be more accurate. Submitted by: Much of it from Matt Dillon <dillon@best.net>
|
31561 |
05-Dec-1997 |
bde |
Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.
|
31550 |
05-Dec-1997 |
dyson |
Add new (very useful) tunable for pageout daemon. The flag changes the maximum pageout rate:
sysctl -w vm.vm_maxlaunder=n
1 < n < inf.
If paging heavily on large systems, it is likely that a performance improvement can be achieved by increasing the parameter. On a large system, the default is 32, but numbers as large as 128 can make a big difference. If paging is expensive, you might try decreasing the number to 1-8.
|
31542 |
04-Dec-1997 |
dyson |
Support applications that need to resist or deny use of swap space.
sysctl -w vm.defer_swap_pageouts=1 Causes the system to resist the use of swap space. In low memory conditions, performance will decrease. sysctl -w vm.disable_swap_pageouts=1 Causes the system to mostly disable the use of swap space. In low memory conditions, the system will likely start killing processes.
|
31493 |
02-Dec-1997 |
phk |
In all such uses of struct buf: 's/b_un.b_addr/b_data/g'
|
31393 |
24-Nov-1997 |
bde |
Removed all traces of P_IDLEPROC. It was tested but never set.
|
31392 |
24-Nov-1997 |
bde |
Don't #define max() to get a version that works with vm_ooffset's. Just use qmax().
This should be fixed more generally using overloaded functions.
|
31252 |
18-Nov-1997 |
bde |
Removed unused #include of <sys/malloc.h>. This file now uses only zalloc(). Many more cases like this are probably obscured by not including <vm/zone.h> explicitly (it is spammed into <sys/malloc.h>).
|
31175 |
14-Nov-1997 |
tegge |
Simplify map entries during user page wire and user page unwire operations in vm_map_user_pageable().
Check return value of vm_map_lock_upgrade() during a user page wire operation.
|
31017 |
07-Nov-1997 |
phk |
Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
|
31016 |
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
30994 |
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need to be recompiled.
|
30989 |
06-Nov-1997 |
dyson |
Fix the "missing page" problem. Also, improve the performance of page allocation in common cases.
|
30813 |
28-Oct-1997 |
bde |
Removed unused #includes.
|
30701 |
25-Oct-1997 |
dyson |
Support garbage collecting the pmap pv entries. The management doesn't happen until the system would have nearly failed anyway, so no significant overhead is added. This helps large systems with lots of processes.
|
30700 |
24-Oct-1997 |
dyson |
Decrease the initial allocation for the zone allocations.
|
30354 |
12-Oct-1997 |
phk |
Last major round (unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
30309 |
11-Oct-1997 |
phk |
Distribute and staticize a lot of the malloc M_* types.
Substantial input from: bde
|
30297 |
11-Oct-1997 |
peter |
Attempt to fix the previous fix to the contigmalloc1 prototype. struct malloc_type isn't defined in all cases (eg: from ddb), and the line wrapping was very badly mangled.
|
30286 |
10-Oct-1997 |
phk |
Fix contigmalloc() and contigmalloc1() arguments.
|
30139 |
06-Oct-1997 |
dyson |
Improve management of pages moving from the inactive to active queue. Additionally, add some much needed comments.
|
30137 |
06-Oct-1997 |
dyson |
Relax the vnode locking for read only operations.
|
29657 |
21-Sep-1997 |
peter |
Fix some style(9) and formatting problems. tabsize 4 formatting doesn't look too great with 'more' etc.
Approved by: dyson (with a minor grumble :-)
|
29653 |
21-Sep-1997 |
dyson |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the usage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
29368 |
14-Sep-1997 |
peter |
Update select -> poll in drivers.
|
29324 |
13-Sep-1997 |
peter |
Print correct function name in panics
|
29316 |
12-Sep-1997 |
jlemon |
Do not consider VM_PROT_OVERRIDE_WRITE to be part of the protection entry when handling a fault. This is set by procfs whenever it wants to write to a page, as a means of overriding `r-x COW' entries, but causes failures in the `rwx' case.
Submitted by: bde
|
29208 |
07-Sep-1997 |
bde |
Removed yet more vestiges of config-time swap configuration and/or cleaned up nearby cruft.
|
28992 |
01-Sep-1997 |
bde |
Removed unused #includes.
|
28991 |
01-Sep-1997 |
bde |
Some staticized variables were still declared to be extern.
|
28990 |
01-Sep-1997 |
bde |
Print a device number in hex instead of decimal.
|
28954 |
31-Aug-1997 |
phk |
Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measurable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b)
Reviewed by: dyson
|
28940 |
30-Aug-1997 |
peter |
Allow non-page-aligned file offset mmap's, provided that the system is allowed to choose the address, or that the MAP_FIXED address has the same remainder modulo PAGE_SIZE as the file offset. Apparently this is posix1003.1b specified behavior. SVR4 and the other *BSD's allow it too. It costs us nothing to support and means we don't get EINVAL on some mmap code that works perfectly elsewhere.
Obtained from: NetBSD
|
28751 |
25-Aug-1997 |
bde |
Fixed type mismatches for functions with args of type vm_prot_t and/or vm_inherit_t. These types are smaller than ints, so the prototypes should have used the promoted type (int) to match the old-style function definitions. They use just vm_prot_t and/or vm_inherit_t. This depends on gcc features to work. I fixed the definitions since this is easiest. The correct fix may be to change the small types to u_int, to optimize for time instead of space.
|
28558 |
22-Aug-1997 |
dyson |
This is a trial improvement for the vnode reference count while on the vnode free list problem. Also, the vnode age flag is no longer used by the vnode pager. (It is actually incorrect to use them.) Constructive feedback welcome -- just be kind.
|
28551 |
21-Aug-1997 |
bde |
#include <machine/limits.h> explicitly in the few places that it is required.
|
28349 |
18-Aug-1997 |
fsmp |
Added includes of smp.h for SMP. This eliminates a bazillion warnings about implicit s_lock & friends.
|
28345 |
18-Aug-1997 |
dyson |
Fix kern_lock so that it will work. Additionally, clean up some of the VM system's usage of the kernel lock (lockmgr) code. This is a first pass implementation, and is expected to evolve as needed. The API for the lock manager code has not changed, but the underlying implementation has changed significantly. This change should not materially affect our current SMP or UP code without non-standard parameters being used.
|
28028 |
10-Aug-1997 |
dyson |
The "cutesy" register parameter passing that I had mistakenly used breaks profiling. Since it doesn't really improve perf much, I have backed it out.
|
27947 |
07-Aug-1997 |
dyson |
More vm_zone cleanup. The sysctl now accounts for items better, and counts the number of allocations.
|
27930 |
06-Aug-1997 |
dyson |
Add exposure of some vm_zone allocation stats by sysctl. Also, change the initialization parameters of some zones in VM map. This contains only optimizations and not bugfixes.
|
27924 |
05-Aug-1997 |
dyson |
Fixed the commit botch that was causing crashes soon after system startup. Due to the error, the initialization of the zone for pv_entries was missing. The system should be usable again.
|
27923 |
05-Aug-1997 |
dyson |
Another attempt at cleaning up the new memory allocator.
|
27922 |
05-Aug-1997 |
dyson |
Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use the new zone code in pmap.c so that we can get rid of the ugly ad-hoc allocations in pmap.c.
|
27905 |
05-Aug-1997 |
dyson |
Modify pmap to use our new memory allocator. Also, change the vm_map_entry allocations to be interrupt safe.
|
27901 |
05-Aug-1997 |
dyson |
A very simple zone allocator.
|
27899 |
05-Aug-1997 |
dyson |
Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of a simple, clean zone type allocator. This new allocator will also be used for machine dependent pmap PV entries.
|
27845 |
02-Aug-1997 |
bde |
Removed unused #includes.
|
27716 |
27-Jul-1997 |
dyson |
Add the ability for the pageout daemon to measure stats on memory usage before the system is out of memory. The daemon does a minimal amount of work that increases as the system becomes more likely to run out of memory and page in/out.
The default tuning is fairly low in background CPU usage, and sysctl variables have been added to enable flexable operation. This is an experimental feature that will likely be changed and improved over time.
|
27715 |
27-Jul-1997 |
dyson |
Fix a very subtle problem that causes unnecessary numbers of objects backing a single logical object. Submitted by: Alan Cox <alc@cs.rice.edu>
|
27464 |
17-Jul-1997 |
dyson |
Add support for 4MB pages. This includes the .text, .data, .data parts of the kernel, and also most of the dynamic parts of the kernel. Additionally, 4MB pages will be allocated for display buffers as appropriate (only.)
The 4MB support for SMP isn't complete, but doesn't interfere with operation either.
|
26851 |
23-Jun-1997 |
tegge |
Don't try upgrading an existing exclusive lock in vm_map_user_pageable. This should close PR kern/3180. Also remove a bogus unconditional call to vm_map_unlock_read in vm_map_lookup.
|
26811 |
22-Jun-1997 |
peter |
Kill some stale leftovers from the earlier attempts at SMP per-cpu pages
|
26780 |
22-Jun-1997 |
dyson |
Remove a window during the rundown of a file vnode. Also, the OBJ_DEAD flag wasn't being respected during vref(), et al. Note that this isn't the eventual fix for the locking problem. Fine grained SMP in the VM and VFS code will require (lots) more work.
|
26668 |
15-Jun-1997 |
dyson |
Correct the return code for the mlock system call. Also add the stubs for mlockall and munlockall.
|
26667 |
15-Jun-1997 |
dyson |
Fix a reference problem with maps. Only appears to manifest itself when sharing address spaces.
|
26258 |
29-May-1997 |
peter |
Update the #include "opt_smpxxx.h" includes - opt_smp.h isn't needed very much in the generic parts of the kernel now.
|
25930 |
19-May-1997 |
dfr |
Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
|
25352 |
01-May-1997 |
dyson |
Check the correct queue for waking up the pageout daemon. Specifically, the pageout daemon wasn't always being woken up appropriately when the (cache + free) queues were depleted. Submitted by: David S. Miller <davem@jenolan.rutgers.edu>
|
25164 |
26-Apr-1997 |
peter |
Man the liferafts! Here comes the long awaited SMP -> -current merge!
There are various options documented in i386/conf/LINT, there is more to come over the next few days.
The kernel should run pretty much "as before" without the options to activate SMP mode.
There are a handful of known "loose ends" that need to be fixed, but have been put off since the SMP kernel is in a moderately good condition at the moment.
This commit is the result of the tinkering and testing over the last 14 months by many people. A special thanks to Steve Passe for implementing the APIC code!
|
25074 |
21-Apr-1997 |
peter |
Send this to the Attic so there's no mixups over which kern_lock.c is in use in -current.
|
24917 |
14-Apr-1997 |
peter |
Unused variable (upobj is now purely handled within pmap)
|
24848 |
13-Apr-1997 |
dyson |
Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)
Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group.
Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes.
Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group.
Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads' address space by an exec, so we now automatically divorce the address space before modifying it.
|
24691 |
07-Apr-1997 |
peter |
The biggie: Get rid of the UPAGES from the top of the per-process address space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode; instead, create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss, since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplify the pmap code to manage process contexts; we no longer have to double map the UPAGES, which simplifies and should measurably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
|
24678 |
06-Apr-1997 |
peter |
Commit a typo fix that's been sitting in my tree for ages, quite forgotten. The typo was detected once upon a time with the -Wunused compile option. The result was that a block of code for implementing madvise(.. MADV_SEQUENTIAL..) behavior was "dead" and unused, probably negating the effect of activating the option.
Reviewed by: dyson
|
24668 |
06-Apr-1997 |
dyson |
Make vm_map_protect be more complete about map simplification. This is useful when a process changes its page range protections very much. Submitted by: Alan Cox <alc@cs.rice.edu>
|
24667 |
06-Apr-1997 |
dyson |
Correction to the prototype for vm_fault.
|
24666 |
06-Apr-1997 |
dyson |
Fix the gdb executable modify problem. Thanks to the detective work by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened due to the lack of vm system support for the operation. I added better support for selective marking of page dirty flags so that vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little so.)
|
24478 |
01-Apr-1997 |
bde |
Removed potentially harmful garbage <vm/lock.h> and fixed bogus use of it. It was actually harmless because the use was null due to fortuitous include orders and identical (wrong) idempotency macros.
|
24437 |
31-Mar-1997 |
dg |
Changed the way that the exec image header is read to be filesystem- centric rather than VM-centric to fix a problem with errors not being detectable when the header is read. Killed exech_map as a result of these changes. There appears to be no performance difference with this change.
|
24131 |
23-Mar-1997 |
bde |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
24130 |
23-Mar-1997 |
dyson |
Fix a significant error in the accounting for pre-zeroed pages. This is a candidate for RELENG_2_2...
|
23502 |
08-Mar-1997 |
dyson |
When IN_RECURSE support was removed during the Lite/2 merge, read/write to/from mmap'ed regions broke. This commit fixes the breakage, and uses the new Lite/2 locking mechanisms.
|
23157 |
27-Feb-1997 |
bde |
Removed a wrong LK_INTERLOCK flag.
|
22975 |
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
22878 |
18-Feb-1997 |
bde |
Removed vestiges of Mach lock types.
vm_map.h: Removed #include of <sys/proc.h>. curproc is only used in some macros and users of the macros already include <sys/proc.h>.
|
22670 |
13-Feb-1997 |
wollman |
Provide an alternative interface to contigmalloc() which allows a specific map to be used when allocating the kernel va (e.g., mb_map). The VM gurus may want to look this over.
|
22521 |
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
22156 |
31-Jan-1997 |
dyson |
Another fix to inheriting shared segments. Do the copy on write thing if needed. Submitted by: Alan Cox <alc@cs.rice.edu>
|
21987 |
24-Jan-1997 |
dg |
Added a check/panic for v_usecount being 0 (no vnode reference) in vnode_pager_alloc().
|
21940 |
22-Jan-1997 |
dyson |
Fix two problems where a NULL object is dereferenced. One problem was in the VM_INHERIT_SHARE case of vmspace_fork, and also in vm_map_madvise. Submitted by: Alan Cox <alc@cs.rice.edu>
|
21881 |
20-Jan-1997 |
dyson |
Make MADV_FREE work better. Specifically, it did not wait for the page to be unbusy, and it caused some algorithmic problems as a result. There were some other problems with it also, so this is a general cleanup of the code. Submitted by: Douglas Crosher <dtc@scrooge.ee.swin.oz.au> and myself.
|
21754 |
16-Jan-1997 |
dyson |
Change the map entry flags from bitfields to bitmasks. Allows for some code simplification.
|
21737 |
15-Jan-1997 |
dg |
Fix bug related to map entry allocations where a sleep might be attempted when allocating memory for network buffers at interrupt time. This is due to inadequate checking for the new mcl_map. Fixed by merging mb_map and mcl_map into a single mb_map.
Reviewed by: wollman
|
21733 |
15-Jan-1997 |
bde |
Removed redundant spl0()'s from kernel processes. They were work-arounds for a bug in fork().
|
21673 |
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
21530 |
11-Jan-1997 |
dyson |
Slightly correct the code that moves pages from the active to the inactive queue. This is only a minor performance improvement, but will not affect perf on machines that don't have ref bits.
|
21529 |
11-Jan-1997 |
dyson |
Prepare better for multi-platform by eliminating another required pmap routine (pmap_is_referenced.) Upper level recoded to use pmap_ts_referenced.
|
21258 |
03-Jan-1997 |
dyson |
Undo the collapse breakage (swap space usage problem.)
|
21157 |
01-Jan-1997 |
dyson |
Guess what? We left a lot of the old collapse code that is not needed anymore with the "full" collapse fix that we added about 1yr ago!!! The code has been removed by optioning it out for now, so we can put it back in ASAP if any problems are found.
|
21134 |
31-Dec-1996 |
dyson |
A very significant improvement in the management of process maps and objects. Previously, "fancy" memory management techniques such as that used by the M3 RTS would have the tendency of chopping up a process's allocated memory into lots of little objects. Alan has come up with some improvements to mitigate the situation to the point where even the M3 RTS only has one object for bss and its managed memory (when running CVSUP.) (There are still cases where the situation isn't improved when the system pages -- but this is much much better for the vast majority of cases.) The system will now be able to much more effectively merge map entries.
Submitted by: Alan Cox <alc@cs.rice.edu>
|
21039 |
30-Dec-1996 |
dyson |
Let the VM system know that on certain arch's that VM_PROT_READ also implies VM_PROT_EXEC. We support it that way for now, since the break system call by default gives VM_PROT_ALL. Now we have a better chance of coalescing map entries when mixing mmap/break type operations. This was contributing to excessive numbers of map entries on the modula-3 runtime system. The problem is still not "solved", but the situation makes more sense.
Eventually, when we work on architectures where VM_PROT_READ is orthogonal to VM_PROT_EXEC, we will have to visit this issue carefully (esp. regarding security issues.)
|
21037 |
30-Dec-1996 |
dyson |
EEEK!!! useracc and kernacc didn't lock their respective maps. Additionally, eliminate the map->hint distortion associated with useracc. That may/may-not be the "right" thing to do -- but time will tell. Submitted by: Partially by Alan Cox <alc@cs.rice.edu>
|
20999 |
29-Dec-1996 |
dyson |
Superficial cleanup of comment.
|
20993 |
28-Dec-1996 |
dyson |
Eliminate the redundancy due to the similarity between the routines vm_map_simplify and vm_map_simplify_entry. Make vm_map_simplify_entry handle wired maps so that we can get rid of vm_map_simplify. Modify the callers of vm_map_simplify to properly use vm_map_simplify_entry. Submitted by: Alan Cox <alc@cs.rice.edu>
|
20991 |
28-Dec-1996 |
dyson |
The code unnecessarily created an object with no handle up-front, which has the negative effect of disabling some map optimizations. This patch defers the creation of the object until it needs to be at fault time. Submitted by: Alan Cox <alc@cs.rice.edu>
|
20821 |
22-Dec-1996 |
joerg |
Make DFLDSIZ and MAXDSIZ fully-supported options.
"Don't forget to do a ``make depend''" :-)
|
20449 |
14-Dec-1996 |
dyson |
Implement closer-to-POSIX mlock semantics. The major difference is that we do allow mlock to span unallocated regions (of course, not mlocking them.) We also allow mlocking of RO regions (which the old code couldn't.) The restriction there is that once a RO region is wired (mlocked), it cannot be debugged (or EVER written to.)
Under normal usage, the new mlock code will be a significant improvement over our old stuff.
|
20189 |
07-Dec-1996 |
dyson |
Expunge inlines...
|
20187 |
07-Dec-1996 |
dyson |
Fix a map entry leak problem found by DG. Also, de-inline a function vm_map_entry_dispose, because it won't help being inlined.
|
20182 |
07-Dec-1996 |
dyson |
Make vm_map_insert much more intelligent in the MAP_NOFAULT case so that map entries are coalesced when appropriate. Also, conditionalize some code that is currently not used in vm_map_insert. This mod has been added to eliminate unnecessary map entries in buffer map.
Additionally, there were some cases where map coalescing could be done when it shouldn't. That problem has been resolved.
|
20054 |
30-Nov-1996 |
dyson |
Implement a new totally dynamic (up to MAXPHYS) buffer kva allocation scheme. Additionally, add the capability for checking for unexpected kernel page faults. The maximum amount of kva space for buffers hasn't been decreased from where it is, but it will now be possible to do so.
This scheme manages the kva space similar to the buffers themselves. If there isn't enough kva space because of usage or fragmentation, buffers will be reclaimed until a buffer allocation is successful. This scheme should be very resistant to fragmentation problems until/if the LFS code is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS is not likely to use such a scheme.
Now there should be NO problem allocating buffers up to MAXPHYS.
|
20007 |
28-Nov-1996 |
dyson |
Make the kernel smaller with at worst a neutral effect on perf by de-inlining some VM calls. (Actually, I measured a small improvement.)
|
19830 |
17-Nov-1996 |
dyson |
Improve the locality of reference for variables in vm_page and vm_kern by moving them from .bss to .data. With this change, there is a measurable perf improvement in fork/exec.
|
19415 |
05-Nov-1996 |
dyson |
Vastly improved contigmalloc routine. It does not solve the problem of allocating contiguous buffer memory in general, but makes it much more likely to work at boot-up time. The best chance for an LKM-type load of a sound driver is immediately after the mount of the root filesystem.
This appears to work for a 64K allocation on an 8MB system.
|
19259 |
29-Oct-1996 |
dyson |
Change mmap to use OBJT_DEFAULT instead of OBJT_SWAP by default for anonymous objects. The system will automatically change the type to SWAP if needed (for size or pageout reasons.)
|
19216 |
27-Oct-1996 |
phk |
The way we get a vnode for swapdev is not quite kosher. In particular, it breaks in the DEVFS_ROOT case. Replicate a bit too much of bdevvp() in here to circumvent the problem. The real problem is the magic that lives in bdevsw[1].
|
19142 |
24-Oct-1996 |
dyson |
Remove a bogus optimization in the mmap code. It is superfluous, and at best is the same speed as the unoptimized code. At worst, it slows down trivial programs.
|
18974 |
17-Oct-1996 |
dyson |
Make processes that have been woken up eligible for immediate swap-in.
|
18973 |
17-Oct-1996 |
dyson |
Clean up the rundown of the object backing a vnode. This should fix NFS problems associated with forcible dismounts.
|
18942 |
15-Oct-1996 |
bde |
Removed nested include of <sys/proc.h> from <vm/vm_object.h> and fixed the one place that depended on it. wakeup() is now prototyped in <sys/systm.h> so that it is normally visible.
Added nested include of <sys/queue.h> in <vm/vm_object.h>. The queue macros are a more fundamental prerequisite for <vm/vm_object.h> than the wakeup prototype and previously happened to be included by namespace pollution from <sys/proc.h> or elsewhere.
|
18937 |
15-Oct-1996 |
dyson |
Move much of the machine dependent code from vm_glue.c into pmap.c. Along with the improved organization, small proc fork performance is now about 5%-10% faster.
|
18908 |
13-Oct-1996 |
phk |
Remove a stale comment.
|
18893 |
12-Oct-1996 |
bde |
Removed __pure's and __pure2's. __pure is a no-op for recent versions of gcc by definition, and __pure2 is a no-op in effect (presumably the compiler can see when an inline function has no side effects).
|
18779 |
06-Oct-1996 |
dyson |
Make the default cache size optimization 256K; the old default was 64K. The change has essentially neutral effect on those machines with little or no cache, and has a positive effect on "normal" machines with 256K or more cache.
|
18768 |
06-Oct-1996 |
dyson |
Fix a problem in the page coloring code where the system would not always be able to use all of the free pages. This can manifest as a panic using DIAGNOSTIC, or as a panic on an indirect memory reference.
|
18542 |
28-Sep-1996 |
bde |
Fixed undeclared variables for the !(PQ_L2_SIZE > 1) case.
Removed redundant #include.
|
18526 |
28-Sep-1996 |
dyson |
Reviewed by: Submitted by: Obtained from:
|
18389 |
19-Sep-1996 |
dg |
Fixed bug with reversed trunc/round_page() in madvise... start must be truncated, end must be rounded.
|
18307 |
15-Sep-1996 |
bde |
Removed iprintf(). It was copied to db_iprintf() in ddb.
|
18298 |
14-Sep-1996 |
bde |
Attached vm ddb commands `show map', `show vmochk', `show object', `show vmopag', `show page' and `show pageq'. Moved all vm ddb stuff to the ends of the vm source files.
Changed printf() to db_printf(), `indent' to db_indent, and iprintf() to db_iprintf() in ddb commands. Moved db_indent and db_iprintf() from vm to ddb.
vm_page.c: Don't use __pure. Staticized.
db_output.c: Reduced page width from 80 to 79 to inhibit double spacing for long lines (there are still some problems if words are printed across column 79).
|
18205 |
10-Sep-1996 |
dyson |
The whole issue of not supporting VOP_LOCK for VBLK devices should be rethought. This fixes YET another problem with unmounting filesystems. The root cause is not fixed here, but at least the problem has gone away.
|
18178 |
08-Sep-1996 |
dyson |
Fixed the use of the wrong variable in vm_map_madvise.
|
18169 |
08-Sep-1996 |
dyson |
Addition of page coloring support. Various levels of coloring are afforded. The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)
|
18163 |
08-Sep-1996 |
dyson |
Improve the scalability of certain pmap operations.
|
17761 |
21-Aug-1996 |
dyson |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistent and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices are handled more simply and correctly.
This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc.
Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility.
Reviewed by: davidg
|
17334 |
30-Jul-1996 |
dyson |
Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
|
17313 |
28-Jul-1996 |
dg |
Slight performance tweak for previous commit.
|
17312 |
28-Jul-1996 |
dyson |
Undo part of the scalability commit. Many of the changes in vm_fault had some performance enhancements not ready for prime time. This commit backs out some of the changes.
|
17301 |
27-Jul-1996 |
dyson |
Allow sequentially created mmap'ed anonymous regions to coalesce. There is little or no reason to create a swap pager for small mmap's. The vm_map_insert code will automatically create a swap pager if the object becomes too large. This fix, per a request from phk.
|
17298 |
27-Jul-1996 |
dyson |
Clean up some lint.
|
17297 |
27-Jul-1996 |
dyson |
Remove experimental header file. My test-build must have picked it up in an unexpected place. Submitted by: jkh
|
17295 |
27-Jul-1996 |
dyson |
Missing (prototype) change from the previous commit.
|
17294 |
27-Jul-1996 |
dyson |
This commit is meant to solve a couple of VM system problems or performance issues.
1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations.
3) Removal of some bogus performance improvements that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 performance (not that people who worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one.
6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(). That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation.
7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
|
17108 |
12-Jul-1996 |
bde |
Don't use NULL in non-pointer contexts.
|
17004 |
08-Jul-1996 |
dyson |
Back-off on the previous commit, specifically remove the look-ahead optimization on the active queue scan. I will do this correctly later.
|
17003 |
08-Jul-1996 |
dyson |
Fix a problem with the pageout daemon RSS limiting, where it degrades performance to LRU or worse when RSS limiting takes effect. Also, make an end condition in the active queue scan more efficient in the case where pages are removed from the active queue as a side effect of a pmap operation.
|
16993 |
07-Jul-1996 |
dg |
In all special cases for spl or page_alloc where kmem_map is checked for, mb_map (a submap of kmem_map) must also be checked. Thanks to wcarchive (err...sort of) for demonstrating this bug.
|
16892 |
02-Jul-1996 |
dyson |
Properly set the PG_MAPPED and PG_WRITEABLE flags. This fixes some potential problems with vm_map_remove/vm_map_delete.
|
16858 |
30-Jun-1996 |
dyson |
Make -current consistent with -stable regarding the time that a process sleeps before being swapped out. The time is increased from 4 secs to 10 secs. Originally I had decreased it from 20 to 4, but that is a bit severe. 20 is too long, though.
|
16834 |
29-Jun-1996 |
dg |
Make sure we have an object in the map entry before trying to trim pages from it.
|
16750 |
26-Jun-1996 |
dyson |
This commit does a couple of things: Re-enables the RSS limiting, and the routine is now tail-recursive, making it much more safe (eliminates the possibility of kernel stack overflow.) Also, the RSS limiting is a little more intelligent about finding the likely objects that are pushing the process over the limit.
Added some sysctls that help with VM system tuning.
New sysctl features: 1) Enable/disable lru pageout algorithm. vm.pageout_algorithm = 0, default algorithm that works well, especially using X windows and heavy memory loading. Can have adverse effects, sometimes slowing down program loading.
vm.pageout_algorithm = 1, close to true LRU. Works much better than clock, etc. Does not work as well as the default algorithm in general. Certain memory "malloc" type benchmarks work a little better with this setting.
Please give me feedback on the performance results associated with these.
2) Enable/disable swapping. vm.swapping_enabled = 1, default.
vm.swapping_enabled = 0, useful for cases where swapping degrades performance.
The config option "NO_SWAPPING" is still operative, and takes precedence over the sysctl. If "NO_SWAPPING" is specified, the sysctl still exists, but "vm.swapping_enabled" is hard-wired to "0".
Each of these can be changed "on the fly."
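On a FreeBSD system, these tunables are driven through sysctl(8); a sketch (FreeBSD-specific, values illustrative):

```sh
# Show the current pageout algorithm and swapping settings.
sysctl vm.pageout_algorithm vm.swapping_enabled

# Try the near-LRU pageout algorithm.
sysctl vm.pageout_algorithm=1

# Disable swapping where it degrades performance
# (hard-wired to 0 if the kernel was built with NO_SWAPPING).
sysctl vm.swapping_enabled=0
```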
|
16679 |
25-Jun-1996 |
dyson |
Fix some serious problems with limits checking in the sbrk(2)/brk(2) code. Reviewed by: bde
|
16664 |
24-Jun-1996 |
dyson |
Remove RSS limiting until I rewrite the code to be non-recursive. The code can overrun the kernel stack under very stressful conditions.
|
16562 |
21-Jun-1996 |
dyson |
Improve algorithm for page hash queue. It was previously about as bad as it could be. This algorithm appears to improve fork performance (barely) measurably.
|
16415 |
17-Jun-1996 |
dyson |
Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup.
2) Create a new entry point into pmap: pmap_ts_referenced, which eliminates the need to scan the pv lists twice in many cases. Perhaps there is a lot more to do here to minimize pv list manipulation.
3) Minor improvements to vm_pageout, including the use of pmap_ts_referenced.
4) Major changes and code improvement to pmap. This code has had several serious bugs in page table page manipulation. In order to simplify the problem, and hopefully solve it once and for all, page table pages are no longer "managed" with the pv list stuff. Page table pages are only (mapped and held/wired) or (free and unused) now. Page table pages are never inactive, active or cached. These changes have probably fixed the hold count problems, but if they haven't, then the code is simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I have taken a first (of probably many) steps. Please tell me if you have any ideas.
|
16409 |
16-Jun-1996 |
dyson |
Various bugfixes/cleanups from me and others:
1) Remove potential race conditions on waking up in vm_page_free_wakeup by making sure that it is at splvm().
2) Fix another bug in vm_map_simplify_entry.
3) Be more complete about converting from default to swap pager when an object grows to be large enough that there can be a problem with data structure allocation under low memory conditions.
4) Make some madvise code more efficient.
5) Added some comments.
|
16377 |
14-Jun-1996 |
dg |
Move a case of PG_MAPPED being set before a pmap_enter(). This will likely make no difference, but it will make it consistent with other uses of PG_MAPPED.
|
16324 |
12-Jun-1996 |
dyson |
Fix a very significant cnt.v_wire_count leak in vm_page.c, and some minor leaks in pmap.c. Bruce Evans made me aware of this problem.
|
16318 |
12-Jun-1996 |
dyson |
Fix some serious errors in vm_map_simplify_entries.
|
16274 |
10-Jun-1996 |
dyson |
Mostly superficial code improvements, add a diagnostic. The code improvements include significant simplification of the reservation of the swap pager control blocks for reads. Add a panic for an inconsistent swap pager control block count.
|
16268 |
10-Jun-1996 |
dyson |
Keep the vm_fault/vm_pageout from getting into an "infinite paging loop", by reserving "cached" pages before waking up the pageout daemon. This will reserve the faulted page, and keep the system from thrashing itself to death given this condition.
|
16197 |
08-Jun-1996 |
dyson |
Adjust the threshold for blocking on movement of pages from the cache queue in vm_fault.
Move the PG_BUSY in vm_fault to the correct place.
Remove redundant/unnecessary code in pmap.c.
Properly block on rundown of page table pages, if they are busy.
I think that the VM system is in pretty good shape now, and the following individuals (among others, in no particular order) have helped with this recent bunch of bugs, thanks! If I left anyone out, I apologize!
Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard, Marc Fournier.
|
16122 |
05-Jun-1996 |
dyson |
Keep page-table pages from ever being sensed as dirty. This should fix some problems with the page-table page management code, since it can't deal with the notion of page-table pages being paged out or in transit. Also, clean up some stylistic issues per some suggestions from Stephen McKay.
|
16058 |
01-Jun-1996 |
dyson |
Disable madvise optimizations for device pager objects (some of the operations don't work with FICTITIOUS pages.) Also, close a window between PG_MANAGED and pmap_enter that can mess up the accounting of the managed flag. This problem could likely cause a hold_count error for page table pages.
|
16026 |
31-May-1996 |
dyson |
This commit is dual-purpose, to fix more of the pageout daemon queue corruption problems, and to apply Gary Palmer's code cleanups. David Greenman helped with these problems also. There is still a hang problem using X in small memory machines.
|
15980 |
29-May-1996 |
dyson |
Correct some unfortunately chosen constants, otherwise, not enough pages are calculated for deferred allocation of swap pager data structures. This is a follow-on to the previous commit to this file.
|
15979 |
29-May-1996 |
dyson |
After careful review by David Greenman and myself, David found a case where blocking can occur, thereby giving other processes a chance to modify the queue where a page resides. This could cause numerous process and system failures.
|
15978 |
29-May-1996 |
dyson |
Make sure that pageout deadlocks cannot occur. There is a problem that the data structures needed to support the swap pager can take enough space to fully deplete system memory, and cause a deadlock. This change keeps large objects from being filled with dirty pages without the appropriate swap pager data structures. Right now, default objects greater than 1/4 the size of available system memory are converted to swap objects, thereby eliminating the risk of deadlock.
|
15905 |
26-May-1996 |
dyson |
Fix a couple of problems in the pageout_scan routine. First, there is a condition when blocking can occur, and the daemon did not check properly for a page remaining on the expected queue. Additionally, the inactive target was being set much too large for small memory machines. It is now being calculated based upon the amount of user memory available on every pageout daemon run. Another problem was that if memory was very low, the pageout daemon could fail repeatedly to traverse the inactive queue.
|
15904 |
26-May-1996 |
dyson |
I think this covers (fixes) the last batch of freeing active/held/busy page problems. BY MISTAKE, the vm_page_unqueue (or equivalent) was removed from the vm_fault code. Really bad things appear to happen if a page is on a queue while it is being faulted.
|
15890 |
24-May-1996 |
dyson |
Add an assert to vm_page_cache. We should never cache a dirty page.
|
15889 |
24-May-1996 |
dyson |
Add apparently needed splvm protection to the active queue, and eliminate an unnecessary test for dirty pages if it is already known to be dirty.
|
15888 |
24-May-1996 |
dyson |
Eliminate inefficient check for dirty pages for pages in the PQ_CACHE queue. Also, modify the MADV_FREE policy (it probably still isn't the final version.)
|
15887 |
24-May-1996 |
dyson |
Make the conversion from the default pager to swap pager more robust in the face of low memory conditions.
|
15876 |
23-May-1996 |
dyson |
Eliminate a vm_page_free busy panic in kern_malloc.
|
15873 |
23-May-1996 |
dyson |
Initial support for MADV_FREE, support for pages whose contents we no longer care about. This gives us a lot of the advantage of freeing individual pages through munmap, but with almost none of the overhead.
|
15841 |
21-May-1996 |
dyson |
After reviewing the previous commit to vm_object, the page protection is never necessary, not just for PG_FICTITIOUS pages.
|
15836 |
21-May-1996 |
dyson |
Don't protect non-managed pages off during object rundown. This fixes a hang that occurs under certain circumstances when exiting X.
|
15819 |
19-May-1996 |
dyson |
Initial support for mincore and madvise. Both are almost fully supported, except madvise does not page in with MADV_WILLNEED, and MADV_DONTNEED doesn't force dirty pages out.
|
15811 |
18-May-1996 |
dyson |
One more file missing from the mega-commit. This inlines some very simple routines in vm_page.c, so that an unnecessary subroutine call is removed.
|
15810 |
18-May-1996 |
dyson |
File mistakenly left out of the previous mega-commit. This provides a global definition for 'exec_map'.
|
15809 |
18-May-1996 |
dyson |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES are NO LONGER in the process's map.
2) PTE's and UPAGES reside in their own objects.
3) TOTAL elimination of recursive page table page faults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer and not an entire entry.
7) Significant cleanup of pmap_protect and pmap_remove.
8) Removed significant amounts of machine-dependent fork code from vm_glue and pushed much of it into the machine-dependent pmap module.
9) More complete support for reusing already-zeroed pages (page table pages and page directories).
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (a combination of splbio and splimp); the VM code now seldom uses splhigh. Improved the speed of, and simplified, kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages for objects with backing objects, in addition to the already-existing condition of having a vnode. (If there is a backing object, there will likely be a COW; with a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorganization of the source to perhaps improve locality of reference.
|
15722 |
10-May-1996 |
wollman |
Allocate mbufs from a separate submap so that NMBCLUSTERS works as expected.
|
15583 |
03-May-1996 |
phk |
Another sweep over the pmap/vm macros, this time with more focus on the usage. I'm not satisfied with the naming, but now at least there is less bogus stuff around.
|
15543 |
02-May-1996 |
phk |
Removed: CLBYTES, PD_SHIFT, PGSHIFT, NBPG, PGOFSET, CLSIZELOG2, CLSIZE, pdei(), ptei(), kvtopte(), ptetov(), ispt(), ptetoav(), &c &c.
New: NPDEPG
Major macro cleanup.
|
15534 |
02-May-1996 |
phk |
KGDB is dead. It may come back one day if somebody does it.
|
15459 |
29-Apr-1996 |
dyson |
Move the map entry allocations from the kmem_map to the kernel_map. As a side effect, correct the associated object offset.
|
15367 |
24-Apr-1996 |
dyson |
This fixes kmem_malloc/kmem_free (and malloc/free of objects of > 8K). A page index was calculated incorrectly in vm_kern, and vm_object_page_remove removed pages that should not have been.
|
15203 |
11-Apr-1996 |
bde |
Fixed a spl hog. The vmdaemon process ran entirely at splhigh. It sometimes disabled clock interrupts for 60 msec or more on a P133. Clock interrupts were lost ...
Reviewed by: dyson
|
15153 |
09-Apr-1996 |
dyson |
Reinstitute the map lock for processes being swapped out. This is needed because of the vm_fault that is used to bring the page table page for the kernel stack (UPAGES) back in. The consequence of the previous, incorrect change was a system hang.
|
15134 |
08-Apr-1996 |
dyson |
Map lock checks not needed anymore for swapping out. We don't use map operations for it anymore. Certain deadlocks should never happen anymore.
|
15117 |
07-Apr-1996 |
bde |
Removed never-used #includes of <machine/cpu.h>. Many were apparently copied from bad examples.
|
15018 |
03-Apr-1996 |
dyson |
Fixed a problem that the UPAGES of a process were being run down in a suboptimal manner. I had also noticed some panics that appeared to be at least superficially caused by this problem. Also, included are some minor mods to support more general handling of page table page faulting. More details in a future commit.
|
14900 |
29-Mar-1996 |
dg |
Revert to previous calculation of vm_object_cache_max: it simply works better in most real-world cases.
|
14882 |
28-Mar-1996 |
bde |
Undid last revision. It duplicated part of second last revision.
|
14879 |
28-Mar-1996 |
scrappy |
devfs_add_devsw() -> devfs_add_devswf() modifications
Reviewed by: julian@freebsd.org
|
14866 |
28-Mar-1996 |
dyson |
Add a function prototype for pmap_prefault.
|
14865 |
28-Mar-1996 |
dyson |
VM performance improvements, and reorder some operations in VM fault in anticipation of a fix in pmap that will allow the mlock system call to work without panicing the system.
|
14864 |
28-Mar-1996 |
dyson |
More map_simplify fixes from Alan Cox. This very significantly improves performance when the map has been chopped up. The map simplify operations really work now.
Reviewed by: dyson
Submitted by: Alan Cox <alc@cs.rice.edu>
|
14854 |
27-Mar-1996 |
bde |
Added drum device.
Submitted by: partly by "Marc G. Fournier" <scrappy@ki.net>
|
14693 |
19-Mar-1996 |
dyson |
Fix a reference count problem seen when unmounting filesystems that are backed by a VMIO device. We mark the underlying object non-persistent, and account for the reference count that the VM system maintains for the special device close. This should fix the removable-device problem.
|
14638 |
16-Mar-1996 |
dg |
Force device mappings to always be shared. It doesn't make sense for them to ever be COW, and we need the mappings to be shared for backward compatibility.
Reviewed by: dyson
|
14610 |
13-Mar-1996 |
dyson |
This commit is as a result of a comment by Alan Cox (alc@cs.rice.edu) regarding the "real" problem with maps that we have been having over the last few weeks. He noted that the first_free pointer was left dangling in certain circumstances -- and he was right!!! This should fix the map problems that we were having, and also give us the advantage of being able to simplify maps more aggressively.
|
14589 |
12-Mar-1996 |
dyson |
Fix the map corruption problem that appears as a u_map allocation error.
|
14574 |
12-Mar-1996 |
dyson |
Allow mmap'ed devices to work correctly across forks. The sanest solution appeared to be to allow the child to maintain the same mapping as the parent.
|
14531 |
11-Mar-1996 |
hsu |
For Lite2: proc LIST changes. Reviewed by: davidg & bde
|
14432 |
09-Mar-1996 |
dyson |
Delay forking a process until there are more pages available. It was possible to deadlock with the low threshold that we had used.
|
14431 |
09-Mar-1996 |
dyson |
Modify a threshold for waking up the pageout daemon. Also, add a consistency check for making sure that held pages aren't freed (DG).
|
14430 |
09-Mar-1996 |
dyson |
Add a missing initialization of the hold_count for device pager fictitious pages.
|
14429 |
09-Mar-1996 |
dyson |
Fix a calculation for a paging parameter.
|
14428 |
09-Mar-1996 |
dyson |
Fix two problems: The pmap_remove in vm_map_clean incorrectly unmapped the entire map entry. The new vm_map_simplify_entry code had an error (the offset of the combined map entry was not set correctly.) Submitted by: Alan Cox <alc@cs.rice.edu>
|
14427 |
09-Mar-1996 |
dyson |
Set the page valid bits in fewer places, as opposed to being scattered in various places.
|
14396 |
06-Mar-1996 |
dyson |
Fix a problem in the swap pager that caused some of the pages that were paged in under low-swap-space conditions to lose both their backing store and their dirty bits. This would cause pages to be demand-zeroed under certain low-VM-space conditions, with consequential sig-11's or sig-10's. This situation was made worse lately when the swap space reclaim threshold was increased.
|
14366 |
04-Mar-1996 |
dyson |
Fix a problem that pages in a mapped region were not always properly invalidated. Now we traverse the object shadow chain properly.
|
14364 |
03-Mar-1996 |
dyson |
In order to fix some concurrency problems with the swap pager early on in the FreeBSD development, I had made a global lock around the rlist code. This was bogus, and now the lock is maintained on a per resource list basis. This now allows the rlist code to be used for almost any non-interrupt level application.
|
14360 |
03-Mar-1996 |
peter |
Remove the #ifdef notyet from the prototype of vm_map_simplify. John re-enabled the function but missed the prototype, causing a warning.
|
14325 |
02-Mar-1996 |
peter |
Oops.. I nearly forgot the actual core of the length/rounding/etc fixes that Bruce asked for.
These still are not quite perfect, and in particular, it can get upset on extreme boundary cases (addr = 0xfff, len = 0xffffffff, which would end up mapping a single page rather than failing), but this is better code than I committed before.
(note, the VM system does not (apparently) support single mmap segment sizes above 0x80000000 anyway)
|
14316 |
02-Mar-1996 |
dyson |
1) Eliminate unnecessary bzero of UPAGES. 2) Eliminate unnecessary copying of pages during/after forks. 3) Add user map simplification.
|
14221 |
23-Feb-1996 |
peter |
kern_descrip.c: add fdshare()/fdcopy() kern_fork.c: add the tiny bit of code for rfork operation. kern/sysv_*: shmfork() takes one less arg, it was never used. sys/shm.h: drop "isvfork" arg from shmfork() prototype sys/param.h: declare rfork args.. (this is where OpenBSD put it..) sys/filedesc.h: protos for fdshare/fdcopy. vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where it makes sense. vm/*: drop unused isvfork arg.
Note: this rfork() implementation copies the address space mappings, it does not connect the mappings together. I.e.: once the two processes have split, the pages may be shared, but the address space is not. If one does a mmap() etc., it does not appear in the other. This makes it not useful for pthreads, but it is useful in its own right for having light-weight threads in a static shared address space.
Obtained from: Original by Ron Minnich, extended by OpenBSD
|
14178 |
22-Feb-1996 |
dg |
Add a "NO_SWAPPING" option to disable swapping. This was originally done to help diagnose a problem on wcarchive (where the kernel stack was sometimes not present), but is useful in its own right since swapping actually reduces performance on some systems (such as wcarchive). Note: swapping in this context means making the U pages pageable and has nothing to do with generic VM paging, which is unaffected by this option.
Reviewed by: <dyson>
|
14036 |
11-Feb-1996 |
dyson |
Fixed a really bogus problem with msync ripping pages away from objects before they were written. Also, don't allow processes without write access to remove pages from vm_objects.
|
13909 |
04-Feb-1996 |
dyson |
Changed vm_fault_quick in vm_machdep.c to be global. Needed for new pipe code.
|
13790 |
31-Jan-1996 |
dg |
"out of space" -> "out of swap space".
|
13788 |
31-Jan-1996 |
dg |
Improved killproc() log message and made it and the other similar message tolerant of p_ucred being invalid. Starting using killproc() where appropriate.
|
13786 |
31-Jan-1996 |
dg |
Print a more descriptive message when the mb_map is filled (out of mbuf clusters), and tell the operator what to do about it (increase maxusers).
|
13765 |
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
13705 |
29-Jan-1996 |
dg |
Added a check/panic for vm_map_find failing to find space for the page tables/u-pages when forking. This is a "can't happen" case. :-)
|
13642 |
27-Jan-1996 |
bde |
Added a `boundary' arg to vm_page_alloc_contig(). Previously the only way to avoid crossing a 64K DMA boundary was to specify an alignment greater than the size even when the alignment didn't matter, and for sizes larger than a page this reduced the chance of finding enough contiguous pages. E.g., allocations of 8K not crossing a 64K boundary previously had to be allocated on 8K boundaries; now they can be allocated on any 4K boundary except (64 * n + 60)K.
Fixed bugs in vm_page_alloc_contig():
- the last page wasn't allocated for sizes smaller than a page.
- failures of kmem_alloc_pageable() weren't handled.
Mutated vm_page_alloc_contig() to create a more convenient interface named contigmalloc(). This is the same as the one in 1.1.5 except it has `low' and `high' args, and the `alignment' and `boundary' args are multipliers instead of masks.
|
13628 |
25-Jan-1996 |
phk |
Don't use %r, we haven't got it anymore.
Submitted by: bde
|
13490 |
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speedup for vfs_bio: addition of a routine bqrelse to greatly diminish overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
13228 |
04-Jan-1996 |
wollman |
Convert DDB to new-style option.
|
13226 |
04-Jan-1996 |
wollman |
Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
|
13223 |
04-Jan-1996 |
dg |
Increased vm_object_cache_max by about 50% to yield better utilization of memory when lots of small files are cached.
Reviewed by: dyson
|
13122 |
30-Dec-1995 |
peter |
recording cvs-1.6 file death
|
12954 |
21-Dec-1995 |
julian |
i386/i386/conf.c is no longer needed: remove it from files.i386, redistribute a few last routines to better places, and shoot the file.
I haven't actually 'deleted' the file yet, to give people time to have done a config. I.e., they are likely to have done one in a week or so, so I'll remove it then. It's now empty, which makes the question of a USL copyright rather moot.
|
12914 |
17-Dec-1995 |
dyson |
Fix paging from ext2fs (and other fs w/block size < PAGE_SIZE). This should fix kern/900.
|
12905 |
17-Dec-1995 |
bde |
Cleaned up prototypes in pmap headers: removed ones for nonexistent functions; moved misplaced ones; restored most of KNFish formatting from 4.4lite version; removed bogus __BEGIN/END_DECLS.
|
12904 |
17-Dec-1995 |
bde |
Fixed 1TB filesize changes. Some pindexes had bogus names and types but worked because vm_pindex_t is indistinguishable from vm_offset_t.
|
12820 |
14-Dec-1995 |
phk |
Another mega commit to staticize things.
|
12819 |
14-Dec-1995 |
phk |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
12813 |
13-Dec-1995 |
julian |
devsw tables are now arrays of POINTERS to struct [cb]devsw. Seems to work here just fine, though I can't check every file that changed due to limited h/w; however, I've checked enough to be pretty happy with the code.
WARNING: struct lkm[mumble] has changed, so it might be an idea to recompile any lkm-related programs.
|
12808 |
13-Dec-1995 |
dyson |
There was a bug that the size for an msync'ed region was not rounded up. The effect of this was that msync with a size would generally sync 1 page less than it should. This problem was brought to my attention by Darrel Herbst <dherbst@gradin.cis.upenn.edu> and Ron Minnich <rminnich@sarnoff.com>.
|
12779 |
11-Dec-1995 |
dyson |
Some new anti-deadlock code ended up messing up the paging stats. A modified version of the code is now in place, and gausspage performance is back up to where it should be.
|
12778 |
11-Dec-1995 |
dyson |
Some DIAGNOSTIC code was enabled all of the time in error. The diagnostic code is now conditional on #ifdef DIAGNOSTIC again.
|
12767 |
11-Dec-1995 |
dyson |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
12737 |
10-Dec-1995 |
bde |
Replaced nxdump by nodump (if the dump function gets called, then the device must be configured, so ENXIO is a bogus errno).
Replaced zerosize by nopsize. zerosize was a temporary alias.
|
12726 |
10-Dec-1995 |
bde |
Restored used includes of <vm/vm_extern.h>.
|
12710 |
10-Dec-1995 |
bde |
Moved the declaration of boolean_t from <vm/vm_param.h> to <sys/types.h> (if KERNEL is defined). This allows removing bogus dependencies on vm stuff in several places (e.g., ddb) and stops <vm_param.h> from depending on <vm_param.h>
Added declaration of boolean_t to <vm/vm.h> (if KERNEL is not defined). It never belonged in <vm/vm_param.h>. Unfortunately, it is required for some vm headers that are included by applications.
Deleted declarations of TRUE and FALSE from <vm/vm_param.h>. They are defined in <sys/param.h> if KERNEL is defined and we'll soon find out if any applications depend on them being defined in a vm header.
|
12678 |
08-Dec-1995 |
phk |
Julian forgot to make the *devsw structures static.
|
12675 |
08-Dec-1995 |
julian |
Pass 3 of the great devsw changes most devsw referenced functions are now static, as they are in the same file as their devsw structure. I've also added DEVFS support for nearly every device in the system, however many of the devices have 'incorrect' names under DEVFS because I couldn't quickly work out the correct naming conventions. (but devfs won't be coming on line for a month or so anyhow so that doesn't matter)
If you "OWN" a device which would normally have an entry in /dev, then search for the devfs_add_devsw() entries and munge them to make them right. Check out similar devices to see what I might have done in them if you can't see what's going on. For a laugh, compare conf.c and conf.h before and after. :) I have not done DEVFS entries for any DISKSLICE devices yet, as that will be a much more complicated job (pass 5 :)
Pass 4 will be to make the devsw tables of type (cdevsw *) rather than (cdevsw). Seems to work here. Complaints to the usual places. :)
|
12662 |
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
12642 |
05-Dec-1995 |
bde |
Moved the declaration of vm_object_t from <vm/vm.h> to <sys/types.h> (if KERNEL is defined). This allows removing the #includes of vm stuff in vnode_if.h, which will speed up the compilation of LINT by about 5%.
|
12623 |
04-Dec-1995 |
phk |
A major sweep over the sysctl stuff.
Move a lot of variables home to their own code (In good time before xmas :-)
Introduce the string description of format.
Add a couple more functions to poke into these marvels, while I try to decide what the correct interface should look like.
Next is adding vars on the fly, and sysctl looking at them too.
Removed a tiny bit of defunct and #ifdef'ed unused code in swapgeneric.
|
12610 |
03-Dec-1995 |
bde |
Fixed the type mismatch in the check for the bogus mmap function `nullop'. The test should never succeed and should go away. Temporarily print a warning if it does succeed.
|
12591 |
03-Dec-1995 |
bde |
Completed function declarations and/or added prototypes.
Staticized some functions.
__purified some functions. Some functions were bogusly declared as returning `const'. This hasn't done anything since gcc-2.5. For later versions of gcc, the equivalent is __attribute__((const)) at the end of function declarations.
|
12569 |
02-Dec-1995 |
bde |
Finished (?) cleaning up sysinit stuff.
|
12521 |
29-Nov-1995 |
julian |
If you're going to mechanically replicate something in 50 files it's best to not have a (compiles cleanly) typo in it! (sigh)
|
12517 |
29-Nov-1995 |
julian |
OK, that's it.. That's EVERY SINGLE driver that has an entry in conf.c.. my next trick will be to define cdevsw[] and bdevsw[] as empty arrays and remove all those DAMNED defines as well..
Each of these drivers has a SYSINIT linker set entry that comes in very early and asks the driver to add its own entry to the two devsw[] tables.
Some slight reworking of the commits from yesterday (added the SYSINIT stuff and some usually-wrong but token DEVFS entries to all these devices).
BTW, does anyone know where the 'ata' entries in conf.c actually reside? It seems we don't actually have an ataopen(), etc.
If you want to add a new device in conf.c please make sure I know so I can keep it up to date too..
as before, this is all dependent on #if defined(JREMOD) (and #ifdef DEVFS in parts)
|
12453 |
21-Nov-1995 |
bde |
Completed function declarations and/or added prototypes.
|
12423 |
20-Nov-1995 |
phk |
Remove unused vars & funcs, make things static, protoize a little bit.
|
12325 |
16-Nov-1995 |
bde |
Fixed recent staticizations. Some prototypes for static functions were left in headers and not staticized.
|
12300 |
14-Nov-1995 |
phk |
staticize.
|
12286 |
14-Nov-1995 |
phk |
Move all the VM sysctl stuff home where it belongs.
|
12259 |
13-Nov-1995 |
dg |
Fixed up a comment and removed some #if 0'd code.
|
12226 |
12-Nov-1995 |
dg |
Moved vm_map_lock call to inside the splhigh protection in vm_map_find(). This closes a probably rare but nonetheless real window that would result in a process hanging or the system panicing.
Reviewed by: dyson, davidg Submitted by: kato@eclogite.eps.nagoya-u.ac.jp (KATO Takenori)
|
12221 |
12-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
12206 |
11-Nov-1995 |
bde |
Fixed type of obreak(). The args struct member name conflicted with the (better) machine generated one in <sys/sysproto.h>.
|
12128 |
06-Nov-1995 |
dg |
Initialize lock struct entries explicitly rather than calling bzero().
|
12118 |
06-Nov-1995 |
bde |
Replaced bogus macros for dummy devswitch entries by functions. These functions went away:
enosys (hasn't been used for some time) enxio enodev enoioctl (was used only once, actually for a vop)
if_tun.c: Continued cleaning up...
conf.h: Probably fixed the type of d_reset_t. It is hard to tell the correct type because there are no non-dummy device reset functions.
Removed last vestige of ambiguous sleep message strings.
|
12110 |
05-Nov-1995 |
dyson |
Greatly simplify the msync code. Eliminate complications in vm_pageout for msyncing. Remove a bug that manifests itself primarily on NFS (the dirty range on the buffers is not set on msync.)
|
12006 |
02-Nov-1995 |
dg |
Move page fixups (pmap_clear_modify, etc) that happen after paging input completes out of vm_fault and into the pagers. This get rid of some redundancy and improves the architecture.
Reviewed by: John Dyson <dyson>
|
11943 |
30-Oct-1995 |
bde |
Don't pass an extra trailing arg to some functions.
Added the prototypes that found this bug.
|
11709 |
23-Oct-1995 |
dyson |
Get rid of machine-dependent NBPG and replace with PAGE_SIZE.
|
11708 |
23-Oct-1995 |
dyson |
Remove of now unused PG_COPYONWRITE.
|
11705 |
23-Oct-1995 |
dyson |
First phase of removing the PG_COPYONWRITE flag, and an architectural cleanup of mapping files.
|
11701 |
23-Oct-1995 |
dyson |
Finalize GETPAGES layering scheme. Move the device GETPAGES interface into specfs code. No need at this point to modify the PUTPAGES stuff except in the layered-type (NULL/UNION) filesystems.
|
11621 |
21-Oct-1995 |
dyson |
Implement mincore system call.
|
11576 |
19-Oct-1995 |
dg |
Fix initialization of "bsize" in vnode_pager_haspage(). It must happen after the check for the mount point still existing or else the system will panic if someone forcibly unmounted the filesystem.
|
11526 |
16-Oct-1995 |
dyson |
Remove an unnecessary tsleep in the swapin code. This tsleep can defer swapping in processes and is just not the right thing to do.
|
11317 |
07-Oct-1995 |
dg |
Fix argument passing to the "freeer" routine. Added some prototypes. (bde) Moved extern declaration of swap_pager_full into swap_pager.h and out of the various files that reference it. (davidg)
Submitted by: bde & davidg
|
11260 |
06-Oct-1995 |
phk |
Avoid a 64bit divide.
|
11194 |
05-Oct-1995 |
bde |
Fix pollution of application namespace by declarations of kernel functions. The application header <sys/user.h> includes <vm/vm.h> which includes <vm/lock.h>...
vm.h: Don't include <machine/cpufunc.h>. It is already included by <sys/systm.h> in the kernel and isn't designed to be included by applications (the 2.1 version causes a syntax error in C++ and the current version has initializers that are invalid in strict C++).
lock.h: Only declare kernel functions if KERNEL is defined.
|
10989 |
24-Sep-1995 |
dyson |
Perform more checking for proper loading of the UPAGES when a process is swapped in. Also, remove unnecessary map locking/unlocking during selection of processes to be swapped out.
This code might afford proper panics as opposed to spontaneous reboots on certain systems. This should allow us to debug these problems better.
|
10988 |
24-Sep-1995 |
dyson |
Significantly simplify the fault clustering code. After some analysis by David Greenman, it has been determined that the more sophisticated code only made a very minor difference in fault performance. Therefore, this code eliminates some of the complication of the fault code, decreasing the amount of CPU used to scan shadow chains.
|
10984 |
24-Sep-1995 |
dg |
Check that the swap block is valid before including it in a cluster.
Submitted by: John Dyson
|
10835 |
17-Sep-1995 |
dg |
Check the return value from vm_map_pageable() when mapping the process's UPAGES and associated page table page. Panic on error. This is less than optimal and will be fixed in the future, but is better than the old behavior of panicking with a "kernel page directory invalid" in pmap_enter.
|
10728 |
14-Sep-1995 |
dyson |
Fixed a typo in vm_fault_additional_pages.
|
10702 |
12-Sep-1995 |
dyson |
Fix really bogus casting of a block number to a long. Also change the comparison from a "< 0" to "== -1" like it should be.
|
10670 |
11-Sep-1995 |
dyson |
Make sure that the prezero flag is cleared when needed.
|
10669 |
11-Sep-1995 |
dyson |
Fix an error that can cause attempted reading beyond the end of file.
|
10668 |
11-Sep-1995 |
dyson |
Code cleanup and minor performance improvement in the faultin cluster code.
|
10653 |
09-Sep-1995 |
dg |
Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
|
10579 |
06-Sep-1995 |
dyson |
Fixed a sign reversal problem -- might have caused some Sig-11s that people have been seeing.
|
10576 |
06-Sep-1995 |
dyson |
Minor performance improvements, additional prototype for additional exported symbol.
|
10556 |
04-Sep-1995 |
dyson |
Allow the fault code to use additional clustering info from both bmap and the swap pager. Improved fault clustering performance.
|
10551 |
04-Sep-1995 |
dyson |
Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count for VOP_BMAP. Updated affected filesystems...
|
10548 |
03-Sep-1995 |
dyson |
Machine independent changes to support pre-zeroed free pages. This significantly improves demand-zero performance.
|
10544 |
03-Sep-1995 |
dyson |
Added prototype for new routine "vm_page_set_validclean" and initial declarations for the prezeroed pages mechanism.
|
10542 |
03-Sep-1995 |
dyson |
New subroutine "vm_page_set_validclean" for a vfs_bio improvement.
|
10358 |
28-Aug-1995 |
julian |
Reviewed by: julian, with quick glances by bruce and others
Submitted by: terry (Terry Lambert)
This is a composite of 3 patch sets submitted by terry. They are: new low-level init code that supports loadable modules better; some cleanups in the namei code to help terry in 16-bit character support; and some changes to the mount-root code to make it a little more modular.
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases..
Certainly mounting root off disk still works just fine. mfs should work but is untested (tomorrow's task).
The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
|
10345 |
26-Aug-1995 |
bde |
Change vm_object_print() to have the correct number and type of args for a ddb command.
|
10344 |
26-Aug-1995 |
bde |
Change vm_map_print() to have the correct number and type of args for a ddb command.
|
10080 |
16-Aug-1995 |
bde |
Make everything except the unsupported network sources compile cleanly with -Wnested-externs.
|
9759 |
29-Jul-1995 |
bde |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.
|
9582 |
20-Jul-1995 |
dg |
#if 0'd one of the DIAGNOSTIC checks in vm_page_alloc(). It was too expensive for "normal" use.
|
9548 |
16-Jul-1995 |
dg |
1) Merged swpager structure into vm_object. 2) Changed swap_pager internal interfaces to cope w/#1. 3) Eliminated object->copy as we no longer have copy objects. 4) Minor stylistic changes.
|
9514 |
13-Jul-1995 |
dg |
Added a copyright to this file.
|
9513 |
13-Jul-1995 |
dg |
Oops, forgot to add the "default" pager files...
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has essentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, an un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
9507 |
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there was constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has essentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, an un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people revealed that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure essentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non-standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintainability. (As were most all of these changes.)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
9468 |
10-Jul-1995 |
dg |
swapout_threads() -> swapout_procs().
|
9467 |
10-Jul-1995 |
dg |
Increased global RSS limit to total RAM.
|
9456 |
09-Jul-1995 |
dg |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
9411 |
06-Jul-1995 |
dg |
Fixed an object allocation race condition that was causing a "object deallocated too many times" panic when using NFS.
Reviewed by: John Dyson
|
9356 |
28-Jun-1995 |
dg |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
9202 |
11-Jun-1995 |
rgrimes |
Merge RELENG_2_0_5 into HEAD
|
8876 |
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
8743 |
25-May-1995 |
dg |
Removed check for sw_dev == NODEV; this is a normal condition for swap over NFS, and the check was gratuitously panicking when it happened.
Reviewed by: John Dyson Submitted by: Pierre Beyssac via Poul-Henning Kamp
|
8692 |
21-May-1995 |
dg |
Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some cases, this led to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon, paying attention to page modifications that occurred via the mmapping.
Reviewed by: David Greenman Submitted by: John Dyson
|
8624 |
19-May-1995 |
dg |
NFS diskless operation was broken because swapdev_vp wasn't initialized. These changes solve the problem in a general way by moving the initialization out of the individual fs_mountroot's and into swaponvp().
Submitted by: Poul-Henning Kamp
|
8588 |
18-May-1995 |
dg |
Fixed a bug that managed to slip in during Poul's dynamic swap partition changes. The check for nswap was bogus, but the code was so convoluted that it was difficult to tell. It's better now. :-)
Reviewed by: David Greenman (extensively), and John Dyson Submitted by: Poul-Henning Kamp, w/tweaks by me.
|
8585 |
18-May-1995 |
dg |
Accessing pages beyond the end of a mapped file results in internal inconsistencies in the VM system that eventually lead to a panic. These changes fix the behavior to conform to the behavior in SunOS, which is to deny faults to pages beyond the EOF (returning SIGBUS). Internally, this is implemented by requiring faults to be within the object size boundaries. These changes exposed another bug, namely that passing in an offset to mmap when trying to map an unnamed anonymous region also results in internal inconsistencies. In this case, the offset is forced to zero.
Reviewed by: John Dyson and others
|
8504 |
14-May-1995 |
dg |
Changed swap partition handling/allocation so that it doesn't require specific partitions be mentioned in the kernel config file ("swap on foo" is now obsolete).
From Poul-Henning:
The visible effect is this:
As default, unless options "NSWAPDEV=23" is in your config, you will have four swap-devices. You can swapon(2) any block device you feel like, it doesn't have to be in the kernel config.
There is a performance/resource win available by getting the NSWAPDEV right (but only if you have just one swap-device ??), but using that as default would be too restrictive.
The invisible effect is that:
Swap-handling disappears from the $arch part of the kernel. It gets a lot simpler (-145 lines) and cleaner.
Reviewed by: John Dyson, David Greenman Submitted by: Poul-Henning Kamp, with minor changes by me.
|
8464 |
12-May-1995 |
phk |
I'm about to jump on the swap-initialization, and having talked with davidg about it, I hereby kill two undocumented misfeatures: The code to skip a miniroot in the swapdev is not particularly useful, and if we need it we need it to be done properly, ie size the fs and skip all of it, not some hardcoded size, and subtract what we skip from the length in the first place. The SEQSWAP dies too. It's not the way to do it, it doesn't work, and nobody has expressed any great desire for it to work. The way to implement it correctly would be a second argument to swapon(2) to give priority/policy information. Low priority swapdevs can be made so by adding them at a far offset (0x80000000 kind of thing), with almost no modification to the strategy routine (in particular an offset per swapdev). But until the need is obvious, it will not be done.
|
8416 |
10-May-1995 |
dg |
Changed "handle" from type caddr_t to void *; "handle" is several different types of pointers, and "char *" is a bad choice for the type.
|
8319 |
07-May-1995 |
dyson |
Another error in the correction for trimming swap allocation for small objects. (This code needs to be revisited.)
|
8315 |
07-May-1995 |
dyson |
Fixed a calculation that would once in a while cause the swap_pager to emit spurious "page outside of object" messages. It is not a fatal condition anyway, so the message will be omitted for release. Also fixed the code that "clips" the allocation size, which was associated with the above problem.
|
8216 |
02-May-1995 |
dg |
Changed object hash list to be a list rather than a tailq. This saves space for the hash list buckets and is a little faster. The features of tailq aren't needed. Increased the size of the object hash table to improve performance. In the future, this will be changed so that the table is sized dynamically.
|
8059 |
25-Apr-1995 |
dg |
Fixed a "bswbuf" hang caused by the wakeup in relpbuf() waking up the wrong thing.
|
8010 |
23-Apr-1995 |
bde |
inline -> __inline.
Headers should always use `__inline' for inline functions to avoid syntax errors when modules that don't even use the offending functions are compiled with `gcc -ansi'.
|
7968 |
21-Apr-1995 |
dyson |
Fixed a problem in _vm_object_page_clean that could cause an infinite loop.
|
7935 |
19-Apr-1995 |
dg |
New flag: B_PAGING. Added as part of the vn driver hack.
|
7904 |
17-Apr-1995 |
dg |
Fixed a logic bug that caused the vmdaemon to not wake up when intended.
Submitted by: John Dyson
|
7888 |
16-Apr-1995 |
dg |
Removed obsolete/unused variable declarations. Killed externs and included appropriate include files.
|
7887 |
16-Apr-1995 |
dg |
Removed obsolete/unused variable declarations. Removed some extern declarations and included the correct include files.
|
7883 |
16-Apr-1995 |
dg |
Moved some zero-initialized variables into .bss. Made code intended to be called only from DDB #ifdef DDB. Removed some completely unused globals.
|
7879 |
16-Apr-1995 |
dg |
Removed gratuitous m->blah=0 assignments when initializing the vm_page structs in vm_page_startup(). The vm_page structs are already completely zeroed.
|
7873 |
16-Apr-1995 |
dg |
Make "print_page_info" #ifdef DDB.
|
7870 |
16-Apr-1995 |
dg |
Fixed a few bugs in vm_object_page_clean, mostly related to not syncing pages that are in FS buffers. This fixes the (believed to already have been fixed) problem with msync() not doing its job...in other words, the stuff that Andrew has continuously been complaining about.
Submitted by: John Dyson, w/minor changes by me.
|
7695 |
09-Apr-1995 |
dg |
Changes from John Dyson and myself:
Fixed remaining known bugs in the buffer IO and VM system.
vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR).
(various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf().
(various FS sync) Sync out modified vnode_pager backed pages.
ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks.
vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order.
vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previously not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES().
vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted.
vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
|
7430 |
28-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) that I didn't notice when I fixed "all" such warnings before.
|
7427 |
28-Mar-1995 |
dg |
Fixed typo...using wrong variable in page_shortage calculation.
|
7424 |
28-Mar-1995 |
dg |
Fixed "pages freed by daemon" statistic (again).
|
7411 |
27-Mar-1995 |
dg |
Explicitly set page dirty if this is a write fault - reduces calls to pmap_is_modified() later.
|
7400 |
26-Mar-1995 |
dg |
Removed some obsolete flags.
Submitted by: John Dyson
|
7366 |
25-Mar-1995 |
dg |
Fix logic bug I just introduced with the flags to msync().
|
7365 |
25-Mar-1995 |
dg |
Pass syncio flag to vm_object_clean(). It remains unimplemented, however.
|
7364 |
25-Mar-1995 |
dg |
Disallow both MS_ASYNC and MS_INVALIDATE flags being set at the same time in msync().
|
7360 |
25-Mar-1995 |
dg |
Added "flags" argument to msync, and implemented MS_ASYNC and MS_INVALIDATE. The MS_ASYNC flag doesn't currently work, and MS_INVALIDATE will only toss out the pages in the address space (not all pages in the shadow chain).
|
7352 |
25-Mar-1995 |
dg |
Implemented cnt.v_reactivated and moved vm_page_activate() routine to before vm_page_deactivate().
|
7350 |
25-Mar-1995 |
dg |
Removed (almost) meaningless "object cache lookups/hits" statistic. In our framework, these numbers will usually be nearly the same, and not because of any sort of high 'hit rate'.
|
7346 |
25-Mar-1995 |
dg |
Removed cnt.v_nzfod: In our current scheme of things it is not possible to accurately track this. It isn't an indicator of resource consumption anyway. Removed cnt.v_kernel_pages: We don't implement this and doing so accurately would be very difficult (and ambiguous - since process pages are often double mapped in the kernel and the process address spaces).
|
7263 |
23-Mar-1995 |
dg |
Fixed warning caused by returning a value in a void function (introduced in a recent commit by me). Relaxed checks before calling vm_object_remove; a non-internal object always has a pager.
|
7246 |
22-Mar-1995 |
dg |
Removed unused fifth argument to vm_object_page_clean(). Fixed bug with VTEXT not always getting cleared when it is supposed to. Added check to make sure that vm_object_remove() isn't called with a NULL pager or for a pager for an OBJ_INTERNAL object (neither of which will be on the hash list). Clear OBJ_CANPERSIST if we decide to terminate it because of no resident pages.
|
7243 |
22-Mar-1995 |
dg |
Fixed potential sleep/wakeup race condition with splhigh().
Submitted by: John Dyson
|
7240 |
22-Mar-1995 |
dg |
Added a check for wrong object size; print a warning, but deal with it correctly. The warning will tell us that there is a bug somewhere else in sizing the object correctly.
Submitted by: John Dyson
|
7239 |
22-Mar-1995 |
dg |
Fixed bug in vm_mmap() where the object that is created in some cases was the wrong size. This is the likely cause of panics reported by Lars Fredriksen and Paul Richards related to a -1 blkno when paging via the swap_pager.
Submitted by: John Dyson
|
7236 |
21-Mar-1995 |
dg |
Removed unused variable declaration missed in previous commit.
|
7235 |
21-Mar-1995 |
dg |
Removed do-nothing VOP_UPDATE() call.
|
7215 |
21-Mar-1995 |
dg |
Disallow non page-aligned file offsets in vm_mmap(). We don't support this in either the high or low level parts of the VM system. Just return EINVAL in this case, just like SunOS does.
|
7209 |
21-Mar-1995 |
dg |
Fixed bug in the size == 0 case of msync() caused by a bogus return value check..
|
7204 |
21-Mar-1995 |
dg |
Added a new boolean argument to vm_object_page_clean that causes it to only toss out clean pages if TRUE.
|
7187 |
20-Mar-1995 |
dg |
Don't gain/lose an object reference in vnode_pager_setsize(). It will cause vnode locking problems in vm_object_terminate(). Implement proper vnode locking in vm_object_terminate().
|
7185 |
20-Mar-1995 |
dg |
Fixed "objde1" hang. It was caused by a "&" where an "&&" belonged in the expression that decides if a wakeup should occur.
|
7180 |
20-Mar-1995 |
dg |
Removed an unnecessary call to vinvalbuf after the page clean.
|
7178 |
19-Mar-1995 |
dg |
Do proper vnode locking when doing paging I/O. Removed the asynchronous paging capability to facilitate this (we saw little or no measurable improvement with this anyway).
Submitted by: John Dyson
|
7170 |
19-Mar-1995 |
dg |
Removed redundant newlines that were in some panic strings.
|
7162 |
19-Mar-1995 |
dg |
Incorporated 4.4-lite vnode_pager_uncache() and vnode_pager_umount() routines (and merged local changes). The changed vnode_pager_uncache gets rid of the bogosity that you can call the routine without having the vnode locked. The changed vnode_pager_umount properly locks the vnode before calling vnode_pager_uncache.
|
7120 |
18-Mar-1995 |
dg |
In vm_page_alloc_contig: Removed a redundant semicolon and used 'm' instead of &pga[i] in one place.
|
7090 |
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
7066 |
15-Mar-1995 |
dg |
Special cased the handling of mb_map in the M_WAITOK case. kmem_malloc() now returns NULL and sets a global 'mb_map_full' when the map is full. m_clalloc() has further been taught to expect this and do the right thing. This should fix the "mb_map full" panics that several people have reported.
|
7029 |
12-Mar-1995 |
bde |
Move a kernel inline function inside `#ifdef KERNEL' so that including <vm/vm.h> doesn't cause warnings about nonexistent functions called by the inline function. Clean up the formatting of the function.
|
7017 |
12-Mar-1995 |
dg |
Fixed obsolete comment.
|
7016 |
12-Mar-1995 |
dg |
Deleted vm_object_setpager().
|
7015 |
12-Mar-1995 |
dg |
Deleted vm_object_setpager().
|
7014 |
12-Mar-1995 |
dg |
Explicitly set object->flags = OBJ_CANPERSIST.
|
7008 |
11-Mar-1995 |
dg |
Fix completely bogus comment.
|
7007 |
11-Mar-1995 |
dg |
Clear OBJ_INTERNAL flag for device pager objects and named anonymous objects.
|
6947 |
07-Mar-1995 |
dg |
Set VAGE flag when pager is destroyed. This usually happens when an object has fallen off the end of the cached list - this is likely the last reference to the vnode and it should be reused before non file vnodes that are already on the free list (VDIR mostly).
|
6944 |
07-Mar-1995 |
dg |
Fixed object reference count problem that occurred in the MAP_PRIVATE case after we rewrote vm_mmap(). Added some comments to make it easier to follow the reference counts.
|
6943 |
07-Mar-1995 |
dg |
Don't attempt to reverse collapse non OBJ_INTERNAL objects.
|
6897 |
04-Mar-1995 |
jkh |
Remove a gratuitous cast.
|
6816 |
01-Mar-1995 |
dg |
Various changes from John and myself that do the following:
New functions created: vm_object_pip_wakeup and pagedaemon_wakeup, which are used to reduce the actual number of wakeups. New function vm_page_protect, which is used in conjunction with some new page flags to reduce the number of calls to pmap_page_protect. Minor changes to reduce unnecessary spl nesting. Rewrote vm_page_alloc() to improve readability. Various other mostly cosmetic changes.
|
6806 |
01-Mar-1995 |
dg |
Slight change to include file order to accommodate upcoming changes.
|
6709 |
25-Feb-1995 |
bde |
Don't use __P(()) in a function definition.
|
6703 |
25-Feb-1995 |
dg |
Fixed severely broken printf (arguments out of order, no newline).
|
6673 |
23-Feb-1995 |
dg |
Removed redundant HOLDRELE()'s.
|
6626 |
22-Feb-1995 |
dg |
Changed return value from vnode_pager_addr to be in DEV_BSIZE units so that 9 bits aren't lost in the conversion. Changed all callers to expect this. This allows paging on large (>2GB) filesystems.
Submitted by: John Dyson
|
6625 |
22-Feb-1995 |
dg |
vm_page.c: Use request==VM_ALLOC_NORMAL rather than object!=kmem_object in deciding if the caller is "important" in vm_page_alloc(). Also established a new low threshold for non-interrupt allocations via cnt.v_interrupt_free_min.
vm_pageout.c: Various algorithmic cleanup. Some calculations simplified. Initialize cnt.v_interrupt_free_min to 2 pages.
Submitted by: John Dyson
|
6624 |
22-Feb-1995 |
dg |
Just return in the case of a page not on any queue in vm_page_unqueue(). Return VM_PAGE_BITS_ALL even if size > PAGE_SIZE in vm_page_bits().
Submitted by: John Dyson
|
6623 |
22-Feb-1995 |
dg |
Removed object locking code (it was a left over from an abortion that was done a month or so ago).
Submitted by: John Dyson
|
6622 |
22-Feb-1995 |
dg |
Removed bogus copy object collapse check (the idea is right, but the specific check was bogus). Removed old copy of vm_object_page_clean and took out the #if 1 around the remaining one.
Submitted by: John Dyson
|
6618 |
22-Feb-1995 |
dg |
Only do object paging_in_progress wakeups if someone is waiting on this condition.
Submitted by: John Dyson
|
6617 |
22-Feb-1995 |
dg |
Rewrote MAP_PRIVATE case of vm_mmap() - all of the COW portion of this routine was highly convoluted.
Submitted by: John Dyson
|
6601 |
21-Feb-1995 |
dg |
Panic if u_map allocation fails.
|
6587 |
21-Feb-1995 |
dg |
vm_extern.h: removed vm_allocate_with_pager. Removed vm_user.c...it's now completely deprecated.
|
6585 |
21-Feb-1995 |
dg |
Deprecated remaining use of vm_deallocate. Deprecated vm_allocate_with_ pager(). Almost completely rewrote vm_mmap(); when John gets done with the bottom half, it will be a complete rewrite. Deprecated most use of vm_object_setpager(). Removed side effect of setting object persist in vm_object_enter and moved this into the pager(s). A few other cosmetic changes.
|
6584 |
21-Feb-1995 |
dg |
Set page alloced for map entries as valid.
|
6582 |
20-Feb-1995 |
dg |
Removed vm_allocate(), vm_deallocate(), and vm_protect() functions. The only function remaining in this file is vm_allocate_with_pager(), and this will be going RSN. The file will be removed when this happens.
|
6580 |
20-Feb-1995 |
dg |
Moved ACT_MAX, ACT_ADVANCE, and ACT_DECLINE to vm_page.h.
|
6573 |
20-Feb-1995 |
dg |
vm_inherit function has been deprecated.
|
6572 |
20-Feb-1995 |
dg |
Stop using vm_allocate and vm_deallocate.
|
6571 |
20-Feb-1995 |
dg |
VM for the kernel stack and page tables doesn't need to be explicitly deallocated as it isn't inherited across the fork. Use vm_map_find not vm_allocate.
Submitted by: John Dyson
|
6567 |
20-Feb-1995 |
dg |
Panic if object is deallocated too many times. Slight change to reverse collapsing so that vm_object_deallocate doesn't have to be called recursively. Removed half of a previous fix - the renamed page during a collapse doesn't need to be marked dirty because the pager backing store pointers are copied - thus preserving the page's data. This assumes that pages without backing store are always dirty (except perhaps for when they are first zeroed, but this doesn't matter). Switch order of two lines of code so that the correct pager is removed from the hash list. The previous code bogusly passed a NULL pointer to vm_object_remove(). The call to vm_object_remove() should be unnecessary if named anonymous objects were being dealt with correctly. They are currently marked as OBJ_INTERNAL, which really screws up things (such as this).
|
6566 |
20-Feb-1995 |
dg |
Don't allow act_count to exceed ACT_MAX when bumping it up. Small optimization to vm_page_bits().
Submitted by: John Dyson
|
6565 |
20-Feb-1995 |
dg |
Fully initialize pages returned via vm_page_alloc_contig() so that the memory can be later freed.
|
6541 |
18-Feb-1995 |
dg |
1) Added protection against collapsing OBJ_DEAD objects. 2) bump reference counts by 2 instead of 1 so that an object deallocate doesn't try to recursively collapse the object. 3) mark pages renamed during the collapse as dirty so that their contents are preserved.
Submitted by: John and me.
|
6435 |
15-Feb-1995 |
dg |
Don't bother calling pmap_create() when creating the temporary map. The whole COW section of vm_mmap() should be rewritten; the current implementation is highly convoluted.
|
6357 |
14-Feb-1995 |
phk |
YF fix.
|
6356 |
14-Feb-1995 |
phk |
YF Fix.
|
6351 |
14-Feb-1995 |
dg |
Fixed problem with msync causing a panic.
Submitted by: John Dyson
|
6326 |
12-Feb-1995 |
dg |
Carefully choose the value for vm_object_cache_max. The previous calculation was rather bogus in most cases; the new value works very well for both large and small memory machines.
|
6278 |
09-Feb-1995 |
dg |
Killed MACHVMCOMPAT function prototypes as the functions don't exist.
|
6277 |
09-Feb-1995 |
dg |
Killed MACHVMCOMPAT code. It doesn't compile, and in its present state would require some work to make it not a serious security problem. It's non-standard and not very useful anyway.
|
6258 |
09-Feb-1995 |
dg |
Minor algorithmic adjustments that cut the CPU consumption of the pagedaemon in half while not reducing its effectiveness.
Submitted by: me & John
|
6151 |
03-Feb-1995 |
dg |
Fixed bmap run-length brokenness. Use bmap run-length extension when doing clustered paging.
Submitted by: John Dyson
|
6129 |
02-Feb-1995 |
dg |
swap_pager.c: Fixed long standing bug in freeing swap space during object collapses. Fixed 'out of space' messages from printing out too often. Modified to use new kmem_malloc() calling convention. Implemented an additional stat in the swap pager struct to count the amount of space allocated to that pager. This may be removed at some point in the future. Minimized unnecessary wakeups.
vm_fault.c: Don't try to collect fault stats on 'swapped' processes - there aren't any upages to store the stats in. Changed read-ahead policy (again!).
vm_glue.c: Be sure to gain a reference to the process's map before swapping. Be sure to lose it when done.
kern_malloc.c: Added the ability to specify if allocations are at interrupt time or are 'safe'; this affects what types of pages can be allocated.
vm_map.c: Fixed a variety of map lock problems; there's still a lurking bug that will eventually bite.
vm_object.c: Explicitly initialize the object fields rather than bzeroing the struct. Eliminated the 'rcollapse' code and folded its functionality into the "real" collapse routine. Moved an object_unlock() so that the backing_object is protected in the qcollapse routine. Make sure nobody fools with the backing_object when we're destroying it. Added some diagnostic code which can be called from the debugger that looks through all the internal objects and makes certain that they all belong to someone.
vm_page.c: Fixed a rather serious logic bug that would result in random system crashes. Changed pagedaemon wakeup policy (again!).
vm_pageout.c: Removed unnecessary page rotations on the inactive queue. Changed the number of pages to explicitly free to just free_reserved level.
Submitted by: John Dyson
|
5973 |
28-Jan-1995 |
dg |
Completed the fix for attempting to page out pages via the device_pager.
Submitted by: John Dyson
|
5915 |
26-Jan-1995 |
dg |
Use the VM_PAGE_BITS_ALL in a place it can be used. Comment out call to pmap_prefault() until stability problems can be thoroughly analyzed.
|
5903 |
25-Jan-1995 |
dg |
Don't attempt to clean device_pager backed objects at terminate time. There is similar bogusness in the pageout daemon that will be fixed soon. This fixes a panic pointed out to me by Bruce Evans that occurs when /dev/mem is used to map managed memory.
|
5841 |
24-Jan-1995 |
dg |
Added ability to detect sequential faults and DTRT. (swap_pager.c) Added hook for pmap_prefault() and use symbolic constant for new third argument to vm_page_alloc() (vm_fault.c, various) Changed the way that upages and page tables are held. (vm_glue.c) Fixed architectural flaw in allocating pages at interrupt time that was introduced with the merged cache changes. (vm_page.c, various) Adjusted some algorithms to achieve better paging performance and to accommodate the fix for the architectural flaw mentioned above. (vm_pageout.c) Fixed pbuf handling problem, changed policy on handling read-behind page. (vnode_pager.c)
Submitted by: John Dyson
|
5636 |
15-Jan-1995 |
dg |
Moved some splx's down a few lines in vm_page_insert and vm_page_remove to make the locking a bit more clear - this change is currently a NOP as the calls to those routines are already at splhigh().
|
5571 |
13-Jan-1995 |
dg |
Protect a qcollapse call with an object lock before calling. The locks need to be moved into the qcollapse and rcollapse routines, but I don't have time at the moment to make all the required changes...this will do for now.
|
5520 |
11-Jan-1995 |
dg |
Improve my previous change to use the same tests as are used in qcollapse.
|
5519 |
11-Jan-1995 |
dg |
Fixed a panic that Garrett reported to me...the OBJ_INTERNAL flag wasn't being cleared in some cases for vnode backed objects; we now do this in vnode_pager_alloc proper to guarantee it. Also be more careful in the rcollapse code about messing with busy/bmapped pages.
|
5465 |
10-Jan-1995 |
dg |
Kill VM_PAGE_INIT macro as it is only used once and makes the code more difficult to understand. Got rid of unused vm_page flags.
|
5464 |
10-Jan-1995 |
dg |
Fixed some formatting weirdness that I overlooked in the previous commit.
|
5455 |
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. They represent the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a separate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
5404 |
05-Jan-1995 |
dg |
Make sure that the object being collapsed doesn't go away on us...by gaining extra references to it.
Submitted by: John Dyson
|
5348 |
02-Jan-1995 |
ats |
Just a missing newline in a kernel printf added.
Submitted by: Ben Jackson
|
5283 |
30-Dec-1994 |
bde |
Clean up previous commits (format for 80 columns...).
|
5203 |
23-Dec-1994 |
dg |
Do vm_page_rename more conservatively in rcollapse and qcollapse, and change list walk so that it doesn't get stuck in an infinite loop.
Submitted by: John Dyson
|
5202 |
23-Dec-1994 |
dg |
Initialize b_vnbuf.le_next before returning a new buffer in getpbuf and trypbuf. Move a couple of splbio's to be slightly less conservative.
|
5186 |
22-Dec-1994 |
dg |
Fixed a benign off by one error.
|
5166 |
19-Dec-1994 |
dg |
Don't ever clear B_BUSY on a pbuf (or any other flag for that matter). This appears to be the cause of some buffer confusion that leads to a panic during heavy paging.
Submitted by: John Dyson
|
5151 |
18-Dec-1994 |
dg |
Fixed multiple bogons with the map entry handling.
|
5146 |
18-Dec-1994 |
dg |
Fixed bug where statically allocated map entries might be freed to the malloc pool...causing a panic.
Submitted by: John Dyson
|
5145 |
18-Dec-1994 |
dg |
Change swapping policy to be a bit more aggressive about finding a candidate for swapout. Increased default RSS limit to a minimum of 2MB.
|
5114 |
15-Dec-1994 |
dg |
Protect kmem_map modifications with splhigh() to work around a problem with the map being locked at interrupt time.
|
5033 |
11-Dec-1994 |
dg |
Don't put objects that have no parent on the reverse_shadow_list. Problem identified and explained by Gene Stark (thanks Gene!).
Submitted by: John Dyson
|
4810 |
25-Nov-1994 |
dg |
These changes fix a couple of lingering VM problems:
1. The pageout daemon used to block under certain circumstances, and we needed to add new functionality that would cause the pageout daemon to block more often. Now, the pageout daemon mostly just gets rid of pages and kills processes when the system is out of swap. The swapping, rss limiting and object cache trimming have been folded into a new daemon called "vmdaemon". This new daemon does things that need to be done for the VM system, but can block. For example, if the vmdaemon blocks for memory, the pageout daemon can take care of it. If the pageout daemon had blocked for memory, it was difficult to handle the situation correctly (and in some cases, was impossible).
2. The collapse problem has now been entirely fixed. It now appears to be impossible to accumulate unnecessary vm objects. The object collapsing now occurs when ref counts drop to one (where it is more likely to be more simple anyway because less pages would be out on disk.) The original fixes were incomplete in that pathological circumstances could still be contrived to cause uncontrolled growth of swap. Also, the old code still, under steady state conditions, used more swap space than necessary. When using the new code, users will generally notice a significant decrease in swap space usage, and theoretically, the system should be leaving fewer unused pages around competing for memory.
Submitted by: John Dyson
|
4797 |
24-Nov-1994 |
dg |
Don't try to page to a vnode that had its filesystem unmounted.
|
4768 |
22-Nov-1994 |
dg |
Preallocate the first swap block to work around a failure with swap starting at physical block 0. Note that this will show up in pstat -s and swapinfo as space "in use". In reality, the space is simply never made available.
|
4537 |
17-Nov-1994 |
dg |
Don't ever try to kill off process 1 - even if we are out of swap space and it's the candidate pig.
|
4534 |
17-Nov-1994 |
gibbs |
Remove a piece of commented out code that was left over from the early stages of debugging LFS:
  /*
   * if we can't bmap, use old VOP code
   */
! if (/* (vp->v_mount && vp->v_mount->mnt_stat.f_type == MOUNT_LFS) || */
!     VOP_BMAP(vp, foff, &dp, 0, 0)) {
        for (i = 0; i < count; i++) {
            if (i != reqpage) {
                vnode_pager_freepage(m[i]);
--- 804,810 ----
  /*
   * if we can't bmap, use old VOP code
   */
! if (VOP_BMAP(vp, foff, &dp, 0, 0)) {
Reviewed by: gibbs Submitted by: John Dyson
|
4461 |
14-Nov-1994 |
bde |
pmap.h: Disable the bogus declaration of pmap_bootstrap(). Since its arg list is machine-dependent, it must be declared in a machine-dependent header.
vm_page.h: Change `inline' to `__inline' and old-style function parameter lists for inlined functions to new-style.
`inline' and old-style function parameter lists should never be used in system headers, even in very machine-dependent ones, because they cause warnings from gcc -Wreally-all.
|
4447 |
14-Nov-1994 |
dg |
Set laundry flag when transitioning an inactive page from clean to dirty. This fixes a performance bug where pages would sometimes not be paged out when they could be.
Submitted by: John Dyson
|
4446 |
13-Nov-1994 |
dg |
Fixed bug where a read-behind to a negative offset would occur if the fault was at offset 0 in the object. This resulted in more overhead but was otherwise benign. Added incore() check in vnode_pager_has_page() to work around a problem with LFS...other than slightly higher overhead, this change has no effect on UFS.
|
4440 |
13-Nov-1994 |
dg |
Fixed bugs in accounting of swap space that resulted in the pager thinking it was out of space when it really wasn't.
Submitted by: John Dyson
|
4439 |
13-Nov-1994 |
dg |
Implemented swap locking via P_SWAPPING flag. It was possible for a process to be chosen for swap-in while it was being swapped-out. This was BAD.
Submitted by: John Dyson
|
4207 |
06-Nov-1994 |
dg |
Fixed return status from pagers. Ahem...the previous method would manufacture data when it couldn't get it legitimately. :-(
Submitted by: John Dyson
|
4203 |
06-Nov-1994 |
dg |
Added support for starting the experimental "vmdaemon" system process. Enabled via REL2_1.
Added support for doing object collapses "on the fly". Enabled via REL2_1a.
Improved object collapses so that they can happen in more cases. Improved sensing of modified pages to fix an apparent race condition and improve clustered pageout opportunities. Fixed an "oops" with not restarting page scan after a potential block in vm_pageout_clean() (not doing this can result in strange behavior in some cases).
Submitted by: John Dyson & David Greenman
|
3841 |
25-Oct-1994 |
dg |
Improved I/O error reporting.
|
3839 |
25-Oct-1994 |
dg |
#if 0'd out the object cache trimming code - there are multiple ways that the pageout daemon can deadlock otherwise.
Submitted by: John Dyson
|
3815 |
23-Oct-1994 |
dg |
Fixed object cache trimming policy so it actually works.
Submitted by: John Dyson
|
3814 |
23-Oct-1994 |
dg |
Adjusted reserved levels to fix a deadlock condition.
Submitted by: John Dyson
|
3807 |
23-Oct-1994 |
dg |
Changed a thread_sleep into an spl protected tsleep. A deadlock can occur otherwise. Minor efficiency improvement in vm_page_free().
Submitted by: John Dyson
|
3798 |
22-Oct-1994 |
phk |
Contrary to my last commit here: NFS-swap is enabled automatically.
|
3772 |
22-Oct-1994 |
dg |
Fixed a comment from the previous commit.
|
3766 |
22-Oct-1994 |
dg |
Various changes to allow operation without any swapspace configured. Note that this is intended for use only in floppy situations and is done at the sacrifice of performance in that case (in other words, this is not the best solution, but works okay for this exceptional situation).
Submitted by: John Dyson
|
3748 |
21-Oct-1994 |
phk |
ATTENTION!
From now on, >all< swapdevices must be activated with "swapon".
If you haven't got it, add this line to /etc/fstab: /dev/wd0b none swap sw 0 0
Reason: We want our GENERIC* kernels to have a large selection of swap-devices, but on the other hand, we don't want to use a wd0b as swap when we boot off a floppy. This way, we will never use an unexpected swapdevice. Nothing else has changed.
|
3745 |
21-Oct-1994 |
wollman |
Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).
This involves fixing a few things I broke last time.
|
3692 |
18-Oct-1994 |
dg |
Fix the remaining vmmeter counters. They all now work correctly.
|
3660 |
17-Oct-1994 |
dg |
Put sanity check for negative hold count into #ifdef DIAGNOSTIC so that it doesn't consume an extra 3k of kernel text because of gcc's bogus inlining code.
|
3612 |
15-Oct-1994 |
dg |
1) Some of the counters in the vmmeter struct don't fit well into the Mach VM scheme of things, so I've changed them to be more appropriate. Page in/outs are now associated with the pager that did them. Nuked v_fault as the only fault of interest that wouldn't be already counted in v_trap is a VM fault, and this is counted separately. 2) Implemented most of the remaining counters and corrected the counting of some that were done wrong. They are all almost correct now...just a few minor ones left to fix.
|
3611 |
15-Oct-1994 |
dg |
Count vm faults as v_vm_fault, not v_fault.
|
3610 |
15-Oct-1994 |
dg |
Properly count object lookups and hits.
|
3591 |
14-Oct-1994 |
dg |
Got rid of redundant declaration warnings.
|
3587 |
14-Oct-1994 |
jkh |
Add missing )'s to previous midnight changes. :-)
|
3573 |
14-Oct-1994 |
dg |
Fixed bug where page modifications would be lost when swap space was almost depleted.
Reviewed by: John Dyson
|
3572 |
14-Oct-1994 |
dg |
Changed I/O error messages to be somewhat less cryptic. Removed a piece of unused code.
|
3567 |
13-Oct-1994 |
dg |
Fixed an object reference count problem that was caused by a call to vm_object_lookup() being outside of some parens. The bug was introduced via some recently added code.
Reviewed by: John Dyson
|
3451 |
09-Oct-1994 |
dg |
Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
|
3449 |
09-Oct-1994 |
phk |
Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc. Reviewed by: davidg
|
3446 |
09-Oct-1994 |
dg |
Call resetpriority, not setpriority() ...oops.
Submitted by: John Dyson
|
3407 |
07-Oct-1994 |
phk |
Cosmetics. Unused vars and other warnings.
|
3374 |
05-Oct-1994 |
dg |
Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations.
Submitted by: John Dyson
|
3373 |
05-Oct-1994 |
dg |
Fixed minor bug caused by some missing parens that can result in slightly reduced paging performance by missing a clustering opportunity. Found by Poul-Henning Kamp with gcc -Wall.
|
3354 |
04-Oct-1994 |
dg |
John Dyson's work in progress. Not currently used.
|
3347 |
04-Oct-1994 |
dg |
Fixed bug related to proper sensing of page modification that we inadvertently introduced in pre-1.1.5. This could cause page modifications to go unnoticed during certain extreme low memory/high paging rate conditions.
Submitted by: John Dyson and David Greenman
|
3311 |
02-Oct-1994 |
phk |
GCC cleanup.
|
3154 |
27-Sep-1994 |
dg |
Previous commit should have read ...in vm_page_alloc_contig(). ...(this commit): moved initialization of 'start' to make it more clear that it is initialized properly (also in vm_page_alloc_contig).
|
3153 |
27-Sep-1994 |
dg |
Fixed another bug, and cleaned up the code.
|
3147 |
27-Sep-1994 |
dg |
Fixed multiple bugs in previous version of vm_page_alloc_contig.
|
3145 |
27-Sep-1994 |
dg |
1) New "vm_page_alloc_contig" routine by me. 2) Created a new vm_page flag "PG_FREE" to help track free pages. 3) Use PG_FREE flag to detect inconsistencies in a few places.
|
3103 |
25-Sep-1994 |
dg |
Removed unimplemented subr_rmap.c and unused references to it.
|
3083 |
25-Sep-1994 |
dg |
Disabled swap anti-fragmentation code. It reduces swap paging performance by 20% in my tests, and it appears to be the cause of a swap leak.
Submitted by: John Dyson
|
2692 |
12-Sep-1994 |
dg |
Fixed a bug I introduced when fixing the rss limit code. Changed swapout policy to be a bit more selective about what processes get swapped out.
Reviewed by: John Dyson
|
2689 |
12-Sep-1994 |
dg |
Eliminated a whole pile of ancient (we're talking 4.3BSD) VM system related #define constants. Corrected incorrect VM_MAX_KERNEL_ADDRESS.
Reviewed by: John Dyson
|
2688 |
12-Sep-1994 |
dg |
Don't deactivate pages in 0-refcount objects. Added a couple of missing paging stats. Fixed problem with free_reserved becoming depleted during certain swap_pager operations.
Submitted by: John Dyson, with a little help from me
|
2654 |
11-Sep-1994 |
dg |
Fixed a problem where having no swap on the boot device, but some on an alternate device (as specified via the kernel config file), causes the machine to panic.
|
2524 |
06-Sep-1994 |
dg |
Disabled a debugging printf.
|
2521 |
06-Sep-1994 |
dg |
Simple changes to paging algorithms...but boy do they make a difference. FreeBSD's paging performance has never been better. Wow.
Submitted by: John Dyson
|
2462 |
02-Sep-1994 |
dg |
Whoops, accidentally left out some pieces of the munmapfd patch.
|
2455 |
02-Sep-1994 |
dg |
Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update(). Made pmap_update an inline assembly function.
|
2413 |
30-Aug-1994 |
dg |
Fixed bug caused by change of rlimit variables to quad_t's. The bug was in using min() to calculate the minimum of rss_cur,rss_max - since these are now quad_t's and min() takes u_ints...the comparison later for exceeding the rss limit was always true - resulting in rather serious page thrashing. Now using new qmin() function for this purpose.
Fixed another bug where PG_BUSY pages would sometimes be paged out (bad!). This was caused by the PG_BUSY flag not being included in a comparison.
|
2386 |
29-Aug-1994 |
dg |
Patches from John Dyson to improve swap code efficiency. Religiously add back pmap_clear_modify() in vnode_pager_input until we figure out why system performance isn't what we expect.
Submitted by: John Dyson (swap_pager) & David Greenman (vnode_pager)
|
2320 |
27-Aug-1994 |
dg |
1) Changed ddb into an option rather than a pseudo-device (use options DDB in your kernel config now). 2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its own file. 3) Added \r handling in db_printf. 4) Added missing memory usage stats to statclock(). 5) Added dummy function to pseudo_set so it will be emitted if there are no other pseudo declarations.
|
2177 |
21-Aug-1994 |
paul |
Made idempotent.
|
2112 |
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
1997 |
10-Aug-1994 |
dg |
Fixed vm_page_deactivate to deal with getting called with a page that's not on any queue. This is an old patchkit days fix.
Reviewed by: John Dyson and David Greenman Submitted by: originally by Paul Mackerras
|
1974 |
09-Aug-1994 |
dg |
Removed an old, obsolete call to vmmeter(). This is called now in the schedcpu() routine in kern/kern_synch.c. This extra call to vmmeter() in vm_glue.c was what was totally messing up the load average calculations.
|
1896 |
07-Aug-1994 |
dg |
Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that are no longer needed because of this.
|
1895 |
07-Aug-1994 |
dg |
Provide support for upcoming merged VM/buffer cache, and fixed a few bugs that haven't appeared to manifest themselves (yet).
Submitted by: John Dyson
|
1890 |
06-Aug-1994 |
dg |
Fixed various prototype problems with the pmap functions and the subsequent problems that fixing them caused.
|
1887 |
06-Aug-1994 |
dg |
Incorporated post 1.1.5 work from John Dyson. This includes performance improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/pmap_kremove. These routines allow fast mapping of pages for those architectures that have "normal" MMUs. Also included is a fix to the pageout daemon to properly check a queue end condition.
Submitted by: John Dyson
|
1885 |
06-Aug-1994 |
dg |
Enabled page table preloading of cached objects.
Submitted by: John Dyson
|
1835 |
04-Aug-1994 |
dg |
Added some code that was accidentally left out early in the 1.x -> 2.0 VM system conversion. Submitted by: John Dyson
|
1827 |
04-Aug-1994 |
dg |
Integrated VM system improvements/fixes from FreeBSD-1.1.5.
|
1817 |
02-Aug-1994 |
dg |
Added $Id$
|
1810 |
01-Aug-1994 |
dg |
Removed all code related to the pagescan daemon, and changed 'act_count' adjustments to compensate for a world without the pagescan daemon.
|
1687 |
06-Jun-1994 |
dg |
Don't move the page's position in the active queue if it is busy or held. John has noticed some stability problems when doing this.
|
1549 |
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
1542 |
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|