#
267654 |
|
19-Jun-2014 |
gjb |
Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
254692 |
|
23-Aug-2013 |
avg |
MFC r253604: rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
|
#
240499 |
|
14-Sep-2012 |
zont |
MFC r239818: - Don't take an account of locked memory for current process in vslock(9).
There are two consumers of vslock(9): sysctl code and drm driver. These consumers are using locked memory as transient memory, it doesn't belong to a process's memory.
MFC r239895: - Remove accounting of locked memory from vsunlock(9) that I missed in r239818.
|
#
229251 |
|
01-Jan-2012 |
kib |
MFC r228567: Move kstack_cache_entry into the private header, and make the stack cache list header accessible outside vm_glue.c.
|
#
225736 |
|
22-Sep-2011 |
kensmith |
Copy head to stable/9 as part of 9.0-RELEASE release cycle.
Approved by: re (implicit)
|
#
223825 |
|
06-Jul-2011 |
trasz |
All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
|
#
220390 |
|
06-Apr-2011 |
jhb |
Fix several places to ignore processes that are not yet fully constructed.
MFC after: 1 week
|
#
220373 |
|
05-Apr-2011 |
trasz |
Add accounting for most of the memory-related resources.
Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
#
217192 |
|
09-Jan-2011 |
kib |
Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out.
Comments wording and reviewed by: alc
|
#
207728 |
|
06-May-2010 |
alc |
Eliminate page queues locking around most calls to vm_page_free().
|
#
207644 |
|
05-May-2010 |
alc |
Push down the acquisition of the page queues lock into vm_page_unwire().
Update the comment describing which lock should be held on entry to vm_page_wire().
Reviewed by: kib
|
#
207410 |
|
29-Apr-2010 |
kmacy |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
|
#
207365 |
|
29-Apr-2010 |
kib |
When doing kstack swapin, read as much pages in one run as possible.
Suggested and reviewed by: alc (previous version) Tested by: pho MFC after: 2 weeks
|
#
206823 |
|
18-Apr-2010 |
alc |
vm_thread_swapout() can safely dirty the page before rather than after acquiring the page queues lock.
|
#
206819 |
|
18-Apr-2010 |
jmallett |
o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the address space for an address as aligned by the new pmap_align_tlb() function, which is for constraints imposed by the TLB. [1] o) Add a kmem_alloc_nofault_space() function, which acts like kmem_alloc_nofault() but allows the caller to specify which find-space option to use. [1] o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the kernel stack address on MIPS. [1] o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on an odd boundary within the TLB, so that they are suitable for insertion as wired entries and do not have to share a TLB entry with another mapping, assuming they are appropriately-sized. o) Eliminate md_realstack now that the kstack will be appropriately-aligned on MIPS. o) Increase the number of guard pages to 2 so that we retain the proper alignment of the kstack address.
Reviewed by: [1] alc X-MFC-after: Making sure alc has not come up with a better interface.
|
#
206545 |
|
13-Apr-2010 |
alc |
Simplify vm_thread_swapin().
|
#
206483 |
|
11-Apr-2010 |
alc |
Initialize the virtual memory-related resource limits in a single place. Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.)
Make vm_thread_swapin() and vm_thread_swapout() static.
Submitted by: bde (an earlier version) Reviewed by: kib
|
#
198341 |
|
21-Oct-2009 |
marcel |
o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent *after* writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent *before* any writes happen. The difference is key when the I-cache prefetches.
|
#
196730 |
|
01-Sep-2009 |
kib |
Reintroduce the r196640, after fixing the problem with my testing.
Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks.
Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week
|
#
196648 |
|
29-Aug-2009 |
kib |
Reverse r196640 and r196644 for now.
|
#
196640 |
|
29-Aug-2009 |
kib |
Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1].
This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks.
Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week
|
#
193643 |
|
07-Jun-2009 |
alc |
Eliminate unnecessary obfuscation when testing a page's valid bits.
|
#
193593 |
|
06-Jun-2009 |
alc |
If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)
|
#
193522 |
|
05-Jun-2009 |
alc |
vm_thread_swapin() needn't validate any pages. The pages are already validated by vm_pager_get_pages().
|
#
181334 |
|
05-Aug-2008 |
jhb |
If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()).
With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock.
Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal().
Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
|
#
178272 |
|
17-Apr-2008 |
jeff |
- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc.
Sponsored by: Nokia
|
#
177368 |
|
19-Mar-2008 |
jeff |
- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
|
#
177253 |
|
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
177085 |
|
12-Mar-2008 |
jeff |
- Pass the priority argument from *sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_* to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter.
Reviewed by: jhb, peter
|
#
173361 |
|
05-Nov-2007 |
kib |
Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper).
In collaboration with: Peter Holm Reviewed by: jhb
|
#
172268 |
|
21-Sep-2007 |
jeff |
- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal.
Approved by: re
|
#
172207 |
|
17-Sep-2007 |
jeff |
- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM.
Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
|
#
170307 |
|
04-Jun-2007 |
jeff |
Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
170174 |
|
31-May-2007 |
jeff |
- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
|
#
170170 |
|
31-May-2007 |
attilio |
Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately.
Requested by: alc Approved by: jeff (mentor)
|
#
169667 |
|
18-May-2007 |
jeff |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
#
166188 |
|
23-Jan-2007 |
jeff |
- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
|
#
164936 |
|
06-Dec-2006 |
julian |
Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it.
Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu, and Dan Eischen using libthr and libpthread.
|
#
163709 |
|
26-Oct-2006 |
jb |
Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).
Reviewed by: davidxu@
|
#
163622 |
|
23-Oct-2006 |
alc |
The page queues lock is no longer required by vm_page_wakeup().
|
#
159054 |
|
29-May-2006 |
tegge |
Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice.
Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier.
Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice.
Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
Reviewed by: alc
|
#
153485 |
|
16-Dec-2005 |
alc |
Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors.
Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks
|
#
146554 |
|
23-May-2005 |
ups |
Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.
|
#
146501 |
|
22-May-2005 |
alc |
Swap in can occur safely without Giant. Release Giant on entry to scheduler().
|
#
146484 |
|
21-May-2005 |
alc |
Remove GIANT_REQUIRED from swapout_procs().
|
#
140622 |
|
22-Jan-2005 |
alc |
Guard against address wrap in kernacc(). Otherwise, a program accessing a bad address range through /dev/kmem can panic the machine.
Submitted by: Mark W. Krentel Reported by: Kris Kennaway MFC after: 1 week
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
138129 |
|
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
137910 |
|
20-Nov-2004 |
das |
Disable U area swapping and remove the routines that create, destroy, copy, and swap U areas.
Reviewed by: arch@
|
#
137168 |
|
03-Nov-2004 |
alc |
The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol.
This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.
|
#
136923 |
|
24-Oct-2004 |
alc |
Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().
|
#
135470 |
|
19-Sep-2004 |
das |
The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.
|
#
134675 |
|
03-Sep-2004 |
alc |
Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).
|
#
132898 |
|
30-Jul-2004 |
alc |
Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate it acquisition and release around vm_waitproc() in kern_wait().
|
#
132684 |
|
27-Jul-2004 |
alc |
- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.
|
#
131434 |
|
02-Jul-2004 |
jhb |
- Don't use a variable to point to the user area that we only use once. Just use p2->p_uarea directly instead. - Remove an old and mostly bogus assertion regarding p2->p_sigacts. - Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.
|
#
131163 |
|
26-Jun-2004 |
das |
Update a stale comment. The heuristic to swap processes out based on the number of pages already paged out was broken in rev 1.10 and removed in rev 1.11.
|
#
130551 |
|
15-Jun-2004 |
julian |
Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
|
#
129028 |
|
07-May-2004 |
green |
In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire" and a "system unwire." Make this a "system wire" and "system unwire."
Reviewed by: alc
|
#
127961 |
|
06-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
Approved by: core
|
#
127013 |
|
15-Mar-2004 |
truckman |
Make overflow/wraparound checking more robust and unbreak len=0 in vslock(), mlock(), and munlock().
Reviewed by: bde
|
#
127008 |
|
15-Mar-2004 |
truckman |
Style(9) changes.
Pointed out by: bde
|
#
127007 |
|
15-Mar-2004 |
truckman |
Revert to the original vslock() and vsunlock() API with the following exceptions: Retain the recently added vslock() error return.
The type of the len argument should be size_t, not u_int.
Suggested by: bde
|
#
126728 |
|
07-Mar-2004 |
alc |
Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_init2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.
|
#
126668 |
|
05-Mar-2004 |
truckman |
Undo the merger of mlock()/vslock and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests.
Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions.
Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mclock() and vslock() implementations as similar as possible.
Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily.
Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge.
Tested by: Cy Schubert <Cy.Schubert AT komquats.com>
|
#
126253 |
|
25-Feb-2004 |
truckman |
Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way.
Enable the RLIMIT_MEMLOCK checking code in kern_mlock().
Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits.
Nuke the vslock() and vsunlock() implementations, which are no longer used.
Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request.
Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request.
Modify the callers of sysctl_wire_old_buffer() to look for the error return.
Modify sysctl_old_user to obey the wired buffer length and clean up its implementation.
Reviewed by: bms
|
#
125454 |
|
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
125193 |
|
29-Jan-2004 |
bde |
Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The "scheduler" here has very little to do with scheduling. It is actually the swapper, and it really must be the last SYSINIT'ed item like its comment says, since proc0 metamorphoses into swapper by calling scheduler() last in mi_start(), and scheduler() never returns.. Rev.1.29 of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item (kproc_start() for schedcpu_thread() onto the SI_SUB_RUN_SCHEDULER_LIST. The sorting of SYSINITs with identical orders (at all levels) is apparently nondeterministic, so this resulted in schedule() sometimes being called second last and schedcpu_thread() not being called at all.
This quick fix just changes the code to almost match the comment (SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and there is no way to ensure that there is only 1 very lst SYSINIT. A more complete fix would remove the SYSINIT obfuscation.
|
#
122384 |
|
09-Nov-2003 |
alc |
- The Open Group Base Specifications Issue 6 specifies that an munmap(2) must return EINVAL if size is zero. Submitted by: tegge - In order to avoid a race condition in multithreaded applications, the check and removal operations by munmap(2) must be in the same critical section. To accomodate this, vm_map_check_protection() is modified to require its caller to obtain at least a read lock on the map.
|
#
120811 |
|
05-Oct-2003 |
bms |
Revert previous commit. Come back vslock(), all is forgiven.
Pointy hat to: bms
|
#
120806 |
|
05-Oct-2003 |
bms |
Retire vslock() and vsunlock() with extreme prejudice.
Discussed with: pete
|
#
119059 |
|
17-Aug-2003 |
alc |
Three unrelated changes to vm_proc_new(): (1) add vm object locking on the U pages object; (2) reorganize such that the U pages object is created and filled in one block; and (3) remove an unnecessary clearing of PG_ZERO.
|
#
119004 |
|
16-Aug-2003 |
marcel |
In vm_thread_swap{in|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.
|
#
118771 |
|
11-Aug-2003 |
bms |
Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture *are* necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quoshed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks
|
#
118390 |
|
03-Aug-2003 |
phk |
Change the layout policy of the swap_pager from a hardcoded width striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap allocation on page-out, this means that a newly added swap device almost instantly takes its 1/N share of the I/O load but it takes somewhat longer for it to assume it's 1/N share of the pages if there is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead by using a per device blist rather than one global blist. This reduces the memory footprint by 75% (typically a couple hundred kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices, there is no limit now. Instead of a fixed size array, store the per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to give some amount of protection to any bsdlabel at the start of the device.
A new device is appended after the existing devices in the swap space, no attempt is made to fill in holes left behind by swapoff (this can trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped() to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space, but since the swapspace is no longer fragmented by the striping different resource managers might fare better.
|
#
118234 |
|
30-Jul-2003 |
peter |
Add #include "opt_kstack_pages.h" and "opt_kstack_max_pages.h" to remain in sync with the backend machdep code. When cpu_thread_init() does not have the same idea of KSTACK_PAGES as the thing that created the kstack, all hell breaks loose.
Bad alc! no cookie! :-)
|
#
116359 |
|
14-Jun-2003 |
alc |
Use #ifdef __alpha__, not __alpha.
|
#
116355 |
|
14-Jun-2003 |
alc |
Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().
|
#
116328 |
|
14-Jun-2003 |
alc |
Move the *_new_altkstack() and *_dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
|
#
116279 |
|
13-Jun-2003 |
alc |
Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.
|
#
116226 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
116188 |
|
11-Jun-2003 |
peter |
GC unused cpu_wait() function
|
#
115522 |
|
31-May-2003 |
phk |
Remove unused variables
Found by: FlexeLint
|
#
114983 |
|
13-May-2003 |
jhb |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.
Reviewed by: arch@ Approved by: re (rwatson)
|
#
114216 |
|
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
114166 |
|
28-Apr-2003 |
alc |
- Lock the vm_object when performing swap_pager_isswapped(). - Assert that the vm_object is locked in swap_pager_isswapped().
|
#
114030 |
|
25-Apr-2003 |
jhb |
- Don't bother using the proc lock to test just P_SYSTEM as that is set in fork1() and never changes. - The proc lock is enough to cover reading p_state, so push down sched_lock into the PRS_NORMAL case of the switch on p_state.
|
#
114019 |
|
25-Apr-2003 |
alc |
- Lock the vm_object when iterating over its list of resident pages.
|
#
113918 |
|
23-Apr-2003 |
jhb |
Fix compiling in the NO_SWAPPING case.
Submitted by: bde (partially)
|
#
113867 |
|
22-Apr-2003 |
jhb |
- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a race where a thread could assume that a process was swapped in by PHOLD() when it actually wasn't fully swapped in yet. - In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing this check after bumping p_lock in the PS_INMEM == 0 case. Also, sched_lock is only needed for setting and clearning swapping PS_* flags and the swap thread inhibitor. - Don't set and clear the thread swap inhibitor in the same loops as the pmap_swapin/out_thread() since we have to do it under sched_lock. Instead, mimic the treatment of the PS_INMEM flag and use separate loops to set the inhibitors when clearing PS_INMEM and clear the inhibitors when setting PS_INMEM. - swapout() now returns with the proc lock held as it holds the lock while adjusting the swapping-related PS_* flags so that the proc lock can be used to test those flags. - Only use the proc lock to check the swapping-related PS_* flags in several places. - faultin() no longer requires sched_lock to be held by callers. - Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we have PS_SWAPPINGIN.
|
#
113603 |
|
17-Apr-2003 |
trhodes |
Add some tunable descriptions.
Submitted by: hmp Discussed with: bde
|
#
113600 |
|
17-Apr-2003 |
trhodes |
Pre-content whitespace commit.
Discussed with: bde
|
#
109630 |
|
21-Jan-2003 |
alfred |
use 'void *' instead of 'caddr_t' for useracc, kernacc, vslock and vsunlock.
|
#
109572 |
|
20-Jan-2003 |
dillon |
Close the remaining user address mapping races for physical I/O, CAM, and AIO. Still TODO: streamline useracc() checks.
Reviewed by: alc, tegge MFC after: 7 days
|
#
108251 |
|
24-Dec-2002 |
alc |
- Hold the page queues lock around vm_page_wakeup().
|
#
107913 |
|
15-Dec-2002 |
dillon |
This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment.
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks
|
#
105695 |
|
22-Oct-2002 |
jhb |
- Check that a process isn't a new process (p_state == PRS_NEW) before trying to acquire it's proc lock since the proc lock may not have been constructed yet. - Split up the one big comment at the top of the loop and put the pieces in the right order above the various checks.
Reported by: kris (1)
|
#
105126 |
|
14-Oct-2002 |
julian |
Remove old useless debugging code
|
#
104094 |
|
28-Sep-2002 |
phk |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too.
Inspired by: FlexeLint warning #512
|
#
103767 |
|
21-Sep-2002 |
jake |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
#
103216 |
|
11-Sep-2002 |
julian |
Completely redo thread states.
Reviewed by: davidxu@freebsd.org
|
#
103123 |
|
09-Sep-2002 |
tanimura |
- Do not swap out a process if it is in creation. The process may have no address space yet.
- Check whether a process is a system process prior to dereferencing its p_vmspace. Aio assumes that only the curthread switches the address space of a system process.
|
#
103002 |
|
06-Sep-2002 |
julian |
Use UMA as a complex object allocator. The process allocator now caches and hands out complete process structures *including substructures* .
i.e. it get's the process structure with the first thread (and soon KSE) already allocated and attached, all in one hit.
For the average non threaded program (non KSE that is) the allocated thread and its stack remain attached to the process, even when the process is unused and in the process cache. This saves having to allocate and attach it later, effectively bringing us (hopefully) close to the efficiency of pre-KSE systems where these were a single structure.
Reviewed by: davidxu@freebsd.org, peter@freebsd.org
|
#
102950 |
|
05-Sep-2002 |
davidxu |
s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/
Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them.
Approved by: julian (mentor)
|
#
101105 |
|
31-Jul-2002 |
alc |
o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably leads to unnecessary pmap_page_protect() calls if one of these pages is paged out after unwiring.
Note: setting PG_MAPPED asserts that the page's pv list may be non-empty. Since checking the status of the page's pv list isn't any harder than checking this flag, the flag should probably be eliminated. Alternatively, PG_MAPPED could be set by pmap_enter() exclusively rather than various places throughout the kernel.
|
#
100913 |
|
30-Jul-2002 |
tanimura |
- Optimize wakeup() and its friends; if a thread waken up is being swapped in, we do not have to ask for the scheduler thread to do that.
- Assert that a process is not swapped out in runq functions and swapout().
- Introduce thread_safetoswapout() for readability.
- In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing a better atomicity.
|
#
100885 |
|
29-Jul-2002 |
julian |
Remove a XXXKSE comment. the code is no longer a problem..
|
#
100884 |
|
29-Jul-2002 |
julian |
Create a new thread state to describe threads that would be ready to run except for the fact tha they are presently swapped out. Also add a process flag to indicate that the process has started the struggle to swap back in. This will be needed for the case where multiple threads start the swapin action top a collision. Also add code to stop a process fropm being swapped out if one of the threads in this process is actually off running on another CPU.. that might hurt...
Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
|
#
100862 |
|
29-Jul-2002 |
alc |
o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire() in pmap_new_thread(), pmap_pinit(), and vm_proc_new(). o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().
|
#
100438 |
|
21-Jul-2002 |
tanimura |
Do not pass a thread with the state TDS_RUNQ to setrunqueue(), otherwise assertion in setrunqueue() fails.
|
#
99985 |
|
14-Jul-2002 |
alc |
o Lock page queue accesses by vm_page_wire().
|
#
99920 |
|
13-Jul-2002 |
alc |
o Lock some page queue accesses, in particular, those by vm_page_unwire().
|
#
99851 |
|
12-Jul-2002 |
peter |
Avoid a vm_page_lookup() - that uses a spinlock protected hash. We can just use the object's memq for our nefarious purposes.
|
#
99563 |
|
07-Jul-2002 |
peter |
Avoid vm_page_lookup() [grabs a spinlock] and just process the upage object memq instead.
Suggested by: alc
|
#
99559 |
|
07-Jul-2002 |
peter |
Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still.
While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
|
#
99408 |
|
04-Jul-2002 |
julian |
A small cleanup.
|
#
99407 |
|
04-Jul-2002 |
julian |
Don;t call teh thread setup routines from here.. they are already called when uma calls thread_init()
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98600 |
|
21-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from vslock(). o Annotate kernacc(), useracc(), and vslock() as MPSAFE.
Motivated by: alfred
|
#
98263 |
|
15-Jun-2002 |
alc |
o Remove GIANT_REQUIRED from useracc() and vsunlock(). Neither vm_map_check_protection() nor vm_map_unwire() expect Giant to be held.
|
#
98226 |
|
14-Jun-2002 |
alc |
o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were only used by vm_map_pageable() and vm_map_user_pageable().)
Reviewed by: tegge
|
#
95610 |
|
28-Apr-2002 |
alc |
o Introduce and use vm_map_trylock() to replace several direct uses of lockmgr(). o Add missing synchronization to vmspace_swap_count(): Obtain a read lock on the vm_map before traversing it.
|
#
92727 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92666 |
|
19-Mar-2002 |
peter |
Fix a gcc-3.1+ warning. warning: deprecated use of label at end of compound statement
ie: you cannot do this anymore: switch(foo) { ....
default: }
|
#
92588 |
|
18-Mar-2002 |
green |
Back out the modification of vm_map locks from lockmgr to sx locks. The best path forward now is likely to change the lockmgr locks to simple sleep mutexes, then see if any extra contention it generates is greater than removed overhead of managing local locking state information, cost of extra calls into lockmgr, etc.
Additionally, making the vm_map lock a mutex and respecting it properly will put us much closer to not needing Giant magic in vm.
|
#
92475 |
|
17-Mar-2002 |
alc |
Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(), the motivation for saving and restoring the map->hint in useracc() is gone. (The same tests that motivated this change in revision 1.57 now show that there is no performance loss from removing it.) This was really a hack and some day we would have had to add new synchronization here on map->hint to maintain it.
|
#
92466 |
|
17-Mar-2002 |
alc |
Acquire a read lock on the map inside of vm_map_check_protection() rather than expecting the caller to do so. This (1) eliminates duplicated code in kernacc() and useracc() and (2) fixes missing synchronization in munmap().
|
#
92246 |
|
13-Mar-2002 |
green |
Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate. While doing this, move it earlier in the sysinit boot process so that the VM system can use it.
After that, the system is now able to use sx locks instead of lockmgr locks in the VM system. To accomplish this, some of the more questionable uses of the locks (such as testing whether they are owned or not, as well as allowing shared+exclusive recursion) are removed, and simpler logic throughout is used so locks should also be easier to understand.
This has been tested on my laptop for months, and has not shown any problems on SMP systems, either, so appears quite safe. One more user of lockmgr down, many more to go :)
|
#
92029 |
|
10-Mar-2002 |
eivind |
- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs.
Reviewed by: alc
|
#
91263 |
|
25-Feb-2002 |
peter |
Remove unused variable (td)
|
#
90538 |
|
11-Feb-2002 |
julian |
In a threaded world, differnt priorirites become properties of different entities. Make it so.
Reviewed by: jhb@freebsd.org (john baldwin)
|
#
90361 |
|
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
90263 |
|
05-Feb-2002 |
alfred |
Fix a race with free'ing vmspaces at process exit when vmspaces are shared.
Also introduce vm_endcopy instead of using pointer tricks when initializing new vmspaces.
The race occured because of how the reference was utilized: test vmspace reference, possibly block, decrement reference
When sharing a vmspace between multiple processes it was possible for two processes exiting at the same time to test the reference count, possibly block and neither one free because they wouldn't see the other's update.
Submitted by: green
|
#
89464 |
|
17-Jan-2002 |
bde |
Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.
Fixed some style bugs.
|
#
88900 |
|
05-Jan-2002 |
jhb |
Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed:
The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.)
I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly.
Reviewed by: peter Tested on: i386, alpha
|
#
84783 |
|
10-Oct-2001 |
ps |
Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable.
Reviewed by: peter MFC after: 2 weeks
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
83276 |
|
10-Sep-2001 |
peter |
Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place.
Reviewed by: jake, tmm, dillon
|
#
79242 |
|
04-Jul-2001 |
dillon |
whitespace / register cleanup
|
#
79224 |
|
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
78481 |
|
19-Jun-2001 |
jhb |
Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for now. The proc locking isn't actually safe yet and won't be until the proc locking is finished.
|
#
77089 |
|
23-May-2001 |
jhb |
- Lock the VM around the pmap_swapin_proc() call in faultin(). - Don't lock Giant in the scheduler() function except for when calling faultin(). - In swapout_procs(), lock the VM before the proccess to avoid a lock order violation. - In swapout_procs(), release the allproc lock before calling swapout(). We restart the process scan after swapping out a process. - In swapout_procs(), un #if 0 the code to bump the vmspace reference count and lock the process' vm structures. This bug was introduced by me and could result in the vmspace being free'd out from under a running process. - Fix an old bug where the vmspace reference count was not free'd if we failed the swap_idle_threshold2 test.
|
#
76827 |
|
18-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
#
76778 |
|
17-May-2001 |
jhb |
- Use a timeout for the tsleep in scheduler() instead of having vmmeter() wakeup proc0 by hand to enforce the timeout. - When swapping out a process, keep the process locked via the proc lock from the first checks up until we clear PS_INMEM and set PS_SWAPPING in swapout(). The swapout() function now must be called with the proc lock held and releases it before returning. - Comment out the code to attempt to lock a process' VM structures before swapping out. It is broken in that it releases the lock after obtaining it. If it does grab the lock, it needs to hand it off to swapout() instead of releasing it. This can be revisisted when the VM is locked as this is a valid test to perform. It also causes a lock order reversal for the time being, which is the immediate cause for temporarily disabling it.
|
#
76641 |
|
15-May-2001 |
jhb |
- Use PROC_LOCK_ASSERT instead of a direct mtx_assert. - Don't hold Giant in the swapper daemon while we walk the list of processes looking for a process to swap back in. - Don't bother grabbing the sched_lock while checking a process' sleep time in swapout_procs() to ensure that a process has been idle for at least swap_idle_threshold2 before swapping it out. If we lose the race we just let a process stay in memory until the next call of swapout_procs(). - Remove some unneeded spl's, sched_lock does all the locking needed in this case.
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
74927 |
|
28-Mar-2001 |
jhb |
Convert the allproc and proctree locks from lockmgr locks to sx locks.
|
#
72376 |
|
11-Feb-2001 |
jake |
Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71610 |
|
24-Jan-2001 |
jhb |
- Doh, lock faultin() with proc lock in scheduler(). - Lock p_swtime with sched_lock in scheduler() as well.
|
#
71574 |
|
24-Jan-2001 |
jhb |
Argh, I didn't get this test right when I converted it. Break this up into two separate if's instead of nested if's. Also, reorder things slightly to avoid unnecessary mutex operations.
|
#
71570 |
|
24-Jan-2001 |
jhb |
- Catch up to proc flag changes. - Proc locking in a few places. - faultin() now must be called with the proc lock held. - Split up swappable() into a couple of tests so that it can be locke in swapout_procs(). - Use queue macros.
|
#
69947 |
|
12-Dec-2000 |
jake |
- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
|
#
69509 |
|
02-Dec-2000 |
jhb |
Protect p_stat with sched_lock.
|
#
69022 |
|
22-Nov-2000 |
jake |
Protect the following with a lockmgr lock:
allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid
Reviewed by: jhb Obtained from: BSD/OS and netbsd
|
#
67536 |
|
24-Oct-2000 |
jhb |
- Catch a machine/mutex.h -> sys/mutex.h I somehow missed. - Close a small race condition. The sched_lock mutex protects p->p_stat as well as the run queues. Another CPU could change p_stat of the process while we are waiting for the lock, and we would end up scheduling a process that isn't runnable.
|
#
65557 |
|
06-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
#
59368 |
|
18-Apr-2000 |
phk |
Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks by 924 bytes.
|
#
58705 |
|
27-Mar-2000 |
charnier |
Revert spelling mistake I made in the previous commit Requested by: Alan and Bruce
|
#
58634 |
|
26-Mar-2000 |
charnier |
Spelling
|
#
57975 |
|
13-Mar-2000 |
phk |
Remove unused 3rd argument from vsunlock() which abused B_WRITE.
|
#
54188 |
|
06-Dec-1999 |
luoqi |
User ldt sharing.
|
#
52649 |
|
30-Oct-1999 |
alc |
Reverse the sense of the test in the KASSERT's from the last commit.
|
#
52644 |
|
30-Oct-1999 |
phk |
Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the "rw" argument, rather than hijacking B_{READ|WRITE}.
Fix two bugs (physio & cam) resulting by the confusion caused by this.
Submitted by: Tor.Egge@fast.no Reviewed by: alc, ken (partly)
|
#
52635 |
|
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
51337 |
|
17-Sep-1999 |
dillon |
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup.
Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed().
add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
50034 |
|
18-Aug-1999 |
peter |
Update for run queue code.
|
#
48963 |
|
21-Jul-1999 |
alc |
Fix the following problem:
When creating new processes (or performing exec), the new page directory is initialized too early. The kernel might grow before p_vmspace is initialized for the new process. Since pmap_growkernel doesn't yet know about the new page directory, it isn't updated, and subsequent use causes a failure.
The fix is (1) to clear p_vmspace early, to stop pmap_growkernel from stomping on memory, and (2) to defer part of the initialization of new page directories until p_vmspace is initialized.
PR: kern/12378 Submitted by: tegge Reviewed by: dfr
|
#
48022 |
|
19-Jun-1999 |
alc |
Remove some unused function and variable declarations.
|
#
45363 |
|
06-Apr-1999 |
peter |
Only use p->p_lock (manage by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no longer set.
|
#
44146 |
|
19-Feb-1999 |
luoqi |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
43208 |
|
26-Jan-1999 |
julian |
Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
#
42968 |
|
21-Jan-1999 |
dillon |
Removed low-memory blockages at fork. This is the wrong place to put this sort of test. We need to fix the low-memory handling in general.
|
#
42957 |
|
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
#
42379 |
|
07-Jan-1999 |
julian |
Changes to the LINUX_THREADS support to only allocate extra memory for shared signal handling when there is shared signal handling being used.
This removes the main objection to making the shared signal handling a standard ability in rfork() and friends and 'unconditionalising' this code. (i.e. the allocation of an extra 328 bytes per process).
Signal handling information remains in the U area until such a time as it's reference count would be incremented to > 1. At that point a new struct is malloc'd and maintained in KVM so that it can be shared between the processes (threads) using it.
A function to check the reference count and move the struct back to the U area when it drops back to 1 is also supplied. Signal information is therefore now swapable for all processes that are not sharing that information with other processes. THis should addres the concerns raised by Garrett and others.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
#
41936 |
|
19-Dec-1998 |
julian |
Fix two bogons created by 'patch(1)' in my last commit.
|
#
41931 |
|
19-Dec-1998 |
julian |
Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-)
Code to allow Linux Threads to run under FreeBSD.
By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.
|
#
40286 |
|
13-Oct-1998 |
dg |
Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
|
#
39770 |
|
29-Sep-1998 |
abial |
Make #define NO_SWAPPING a normal kernel config option.
Reviewed by: jkh
|
#
34030 |
|
04-Mar-1998 |
dufault |
Reviewed by: msmith, bde long ago POSIX.4 headers and sysctl variables. Nothing should change unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
|
#
33181 |
|
09-Feb-1998 |
eivind |
Staticize.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33109 |
|
05-Feb-1998 |
dyson |
1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32702 |
|
22-Jan-1998 |
dyson |
VM level code cleanups.
1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
|
#
31667 |
|
11-Dec-1997 |
dyson |
Fix the prototype for swapout_procs(); Submitted by: dima@best.net
|
#
31563 |
|
06-Dec-1997 |
dyson |
Support an optional, sysctl enabled feature of idle process swapout. This is apparently useful for large shell systems, or systems with long running idle processes. To enable the feature:
sysctl -w vm.swap_idle_enabled=1
Please note that some of the other vm sysctl variables have been renamed to be more accurate. Submitted by: Much of it from Matt Dillon <dillon@best.net>
|
#
31016 |
|
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
#
28992 |
|
01-Sep-1997 |
bde |
Removed unused #includes.
|
#
28551 |
|
21-Aug-1997 |
bde |
#include <machine/limits.h> explicitly in the few places that it is required.
|
#
24917 |
|
14-Apr-1997 |
peter |
Unused variable (upobj is now purely handled within pmap)
|
#
24848 |
|
12-Apr-1997 |
dyson |
Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)
Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group.
Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes.
Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group.
Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads address space by an exec, so we now automatically divorce the address space before modifying it.
|
#
24691 |
|
07-Apr-1997 |
peter |
The biggie: Get rid of the UPAGES from the top of the per-process address space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21733 |
|
15-Jan-1997 |
bde |
Removed redundant spl0()'s from kernel processes. They were work-arounds for a bug in fork().
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
21037 |
|
30-Dec-1996 |
dyson |
EEEK!!! useracc and kernacc didn't lock their respective maps. Additionally, eliminate the map->hint distortion associated with useracc. That may/may-not be the "right" thing to do -- but time will tell. Submitted by: Partially by Alan Cox <alc@cs.rice.edu>
|
#
20821 |
|
22-Dec-1996 |
joerg |
Make DFLDSIZ and MAXDSIZ fully-supported options.
"Don't forget to do a ``make depend''" :-)
|
#
18974 |
|
17-Oct-1996 |
dyson |
Make processes waken up eligible for immediate swap-in.
|
#
18937 |
|
15-Oct-1996 |
dyson |
Move much of the machine dependent code from vm_glue.c into pmap.c. Along with the improved organization, small proc fork performance is now about 5%-10% faster.
|
#
18307 |
|
15-Sep-1996 |
bde |
Removed iprintf(). It was copied to db_iprintf() in ddb.
|
#
16892 |
|
02-Jul-1996 |
dyson |
Properly set the PG_MAPPED and PG_WRITEABLE flags. This fixes some potential problems with vm_map_remove/vm_map_delete.
|
#
16858 |
|
30-Jun-1996 |
dyson |
Make -current consistant with -stable regarding time that a process sleeps before being swapped out. The time is increased from 4 secs to 10 secs. Originally I had decreased it from 20 to 4, but that is a bit severe. 20 is too long though.
|
#
16026 |
|
30-May-1996 |
dyson |
This commit is dual-purpose, to fix more of the pageout daemon queue corruption problems, and to apply Gary Palmer's code cleanups. David Greenman helped with these problems also. There is still a hang problem using X in small memory machines.
|
#
15809 |
|
18-May-1996 |
dyson |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
15534 |
|
02-May-1996 |
phk |
KGDB is dead. It may come back one day if somebody does it.
|
#
15153 |
|
09-Apr-1996 |
dyson |
Reinstitute the map lock for processes being swapped out. This is needed because of the vm_fault used to bring the page table page for the kernel stack (UPAGES) back in. The consequence of the previous incorrect change was a system hang.
|
#
15134 |
|
08-Apr-1996 |
dyson |
Map lock checks not needed anymore for swapping out. We don't use map operations for it anymore. Certain deadlocks should never happen anymore.
|
#
15117 |
|
07-Apr-1996 |
bde |
Removed never-used #includes of <machine/cpu.h>. Many were apparently copied from bad examples.
|
#
15018 |
|
03-Apr-1996 |
dyson |
Fixed a problem that the UPAGES of a process were being run down in a suboptimal manner. I had also noticed some panics that appeared to be at least superficially caused by this problem. Also, included are some minor mods to support more general handling of page table page faulting. More details in a future commit.
|
#
14531 |
|
11-Mar-1996 |
hsu |
For Lite2: proc LIST changes. Reviewed by: davidg & bde
|
#
14432 |
|
09-Mar-1996 |
dyson |
Delay forking a process until there are more pages available. It was possible to deadlock with the low threshold that we had used.
|
#
14316 |
|
02-Mar-1996 |
dyson |
1) Eliminate unnecessary bzero of UPAGES. 2) Eliminate unnecessary copying of pages during/after forks. 3) Add user map simplification.
|
#
14221 |
|
23-Feb-1996 |
peter |
kern_descrip.c: add fdshare()/fdcopy() kern_fork.c: add the tiny bit of code for rfork operation. kern/sysv_*: shmfork() takes one less arg, it was never used. sys/shm.h: drop "isvfork" arg from shmfork() prototype sys/param.h: declare rfork args.. (this is where OpenBSD put it..) sys/filedesc.h: protos for fdshare/fdcopy. vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where it makes sense. vm/*: drop unused isvfork arg.
Note: this rfork() implementation copies the address space mappings, it does not connect the mappings together. ie: once the two processes have split, the pages may be shared, but the address space is not. If one does a mmap() etc, it does not appear in the other. This makes it not useful for pthreads, but it is useful in it's own right for having light-weight threads in a static shared address space.
Obtained from: Original by Ron Minnich, extended by OpenBSD
|
#
14178 |
|
22-Feb-1996 |
dg |
Add a "NO_SWAPPING" option to disable swapping. This was originally done to help diagnose a problem on wcarchive (where the kernel stack was sometimes not present), but is useful in its own right since swapping actually reduces performance on some systems (such as wcarchive). Note: swapping in this context means making the U pages pageable and has nothing to do with generic VM paging, which is unaffected by this option.
Reviewed by: <dyson>
|
#
13705 |
|
29-Jan-1996 |
dg |
Added a check/panic for vm_map_find failing to find space for the page tables/u-pages when forking. This is a "can't happen" case. :-)
|
#
13628 |
|
25-Jan-1996 |
phk |
Don't use %r, we havn't got it anymore. Submitted by: bde
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
13228 |
|
04-Jan-1996 |
wollman |
Convert DDB to new-style option.
|
#
13226 |
|
04-Jan-1996 |
wollman |
Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
|
#
12820 |
|
14-Dec-1995 |
phk |
Another mega commit to staticize things.
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
12569 |
|
02-Dec-1995 |
bde |
Finished (?) cleaning up sysinit stuff.
|
#
12110 |
|
05-Nov-1995 |
dyson |
Greatly simplify the msync code. Eliminate complications in vm_pageout for msyncing. Remove a bug that manifests itself primarily on NFS (the dirty range on the buffers is not set on msync.)
|
#
11709 |
|
23-Oct-1995 |
dyson |
Get rid of machine-dependent NBPG and replace with PAGE_SIZE.
|
#
11526 |
|
16-Oct-1995 |
dyson |
Remove an unnecessary tsleep in the swapin code. This tsleep can defer swapping in processes and is just not the right thing to do.
|
#
10989 |
|
24-Sep-1995 |
dyson |
Perform more checking for proper loading of the UPAGES when a process is swapped in. Also, remove unnecessary map locking/unlocking during selection of processes to be swapped out.
This code might afford proper panics as opposed to spontaneous reboots on certain systems. This should allow us to debug these problems better.
|
#
10835 |
|
16-Sep-1995 |
dg |
Check the return value from vm_map_pageable() when mapping the process's UPAGES and associated page table page. Panic on error. This is less than optimial and will be fixed in the future, but is better than the old behavior of panicing with a "kernel page directory invalid" in pmap_enter.
|
#
10653 |
|
09-Sep-1995 |
dg |
Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
|
#
10358 |
|
28-Aug-1995 |
julian |
Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular..
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases..
certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task)
The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
|
#
9507 |
|
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
#
9468 |
|
10-Jul-1995 |
dg |
swapout_threads() -> swapout_procs().
|
#
9467 |
|
10-Jul-1995 |
dg |
Increased global RSS limit to total RAM.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
7883 |
|
16-Apr-1995 |
dg |
Moved some zero-initialized variables into .bss. Made code intended to be called only from DDB #ifdef DDB. Removed some completely unused globals.
|
#
7430 |
|
28-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) that I didn't notice when I fixed "all" such warnings before.
|
#
6601 |
|
21-Feb-1995 |
dg |
Panic if u_map allocation fails.
|
#
6571 |
|
20-Feb-1995 |
dg |
VM for the kernel stack and page tables doesn't need to be explicitly deallocated as it isn't inherited across the fork. Use vm_map_find not vm_allocate.
Submitted by: John Dyson
|
#
6356 |
|
14-Feb-1995 |
phk |
YF Fix.
|
#
6129 |
|
02-Feb-1995 |
dg |
swap_pager.c: Fixed long standing bug in freeing swap space during object collapses. Fixed 'out of space' messages from printing out too often. Modified to use new kmem_malloc() calling convention. Implemented an additional stat in the swap pager struct to count the amount of space allocated to that pager. This may be removed at some point in the future. Minimized unnecessary wakeups.
vm_fault.c: Don't try to collect fault stats on 'swapped' processes - there aren't any upages to store the stats in. Changed read-ahead policy (again!).
vm_glue.c: Be sure to gain a reference to the process's map before swapping. Be sure to lose it when done.
kern_malloc.c: Added the ability to specify if allocations are at interrupt time or are 'safe'; this affects what types of pages can be allocated.
vm_map.c: Fixed a variety of map lock problems; there's still a lurking bug that will eventually bite.
vm_object.c: Explicitly initialize the object fields rather than bzeroing the struct. Eliminated the 'rcollapse' code and folded it's functionality into the "real" collapse routine. Moved an object_unlock() so that the backing_object is protected in the qcollapse routine. Make sure nobody fools with the backing_object when we're destroying it. Added some diagnostic code which can be called from the debugger that looks through all the internal objects and makes certain that they all belong to someone.
vm_page.c: Fixed a rather serious logic bug that would result in random system crashes. Changed pagedaemon wakeup policy (again!).
vm_pageout.c: Removed unnecessary page rotations on the inactive queue. Changed the number of pages to explicitly free to just free_reserved level.
Submitted by: John Dyson
|
#
5841 |
|
24-Jan-1995 |
dg |
Added ability to detect sequential faults and DTRT. (swap_pager.c) Added hook for pmap_prefault() and use symbolic constant for new third argument to vm_page_alloc() (vm_fault.c, various) Changed the way that upages and page tables are held. (vm_glue.c) Fixed architectural flaw in allocating pages at interrupt time that was introduced with the merged cache changes. (vm_page.c, various) Adjusted some algorithms to acheive better paging performance and to accomodate the fix for the architectural flaw mentioned above. (vm_pageout.c) Fixed pbuf handling problem, changed policy on handling read-behind page. (vnode_pager.c)
Submitted by: John Dyson
|
#
5464 |
|
10-Jan-1995 |
dg |
Fixed some formatting weirdness that I overlooked in the previous commit.
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
5145 |
|
18-Dec-1994 |
dg |
Change swapping policy to be a bit more aggressive about finding a candidate for swapout. Increased default RSS limit to a minimum of 2MB.
|
#
4439 |
|
13-Nov-1994 |
dg |
Implemented swap locking via P_SWAPPING flag. It was possible for a process to be chosen for swap-in while it was being swapped-out. This was BAD.
Submitted by: John Dyson
|
#
3449 |
|
08-Oct-1994 |
phk |
Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc. Reviewed by: davidg
|
#
2692 |
|
12-Sep-1994 |
dg |
Fixed a bug I introduced when fixing the rss limit code. Changed swapout policy to be a bit more selective about what processes get swapped out.
Reviewed by: John Dyson
|
#
2112 |
|
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
#
1974 |
|
09-Aug-1994 |
dg |
Removed an old, obsolete call to vmmeter(). This is called now in the schedcpu() routine in kern/kern_synch.c. This extra call to vmmeter() in vm_glue.c was what was totally messing up the load average calculations.
|
#
1827 |
|
04-Aug-1994 |
dg |
Integrated VM system improvements/fixes from FreeBSD-1.1.5.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|