#
272461 |
|
02-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
265606 |
|
07-May-2014 |
scottl |
Merge r264984
Retire smp_active. It was racey and caused demonstrated problems with the cpufreq code. Replace its use with smp_started. There's at least one userland tool that still looks at the kern.smp.active sysctl, so preserve it but point it to smp_started as well.
Obtained from: Netflix, Inc.
|
#
264147 |
|
05-Apr-2014 |
kib |
MFC r263912: Clear the kernel grab of the FPU state on fork.
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
255844 |
|
24-Sep-2013 |
kib |
Ensure that the ERESTART return from the syscall reloads the registers, to make the restarted syscall instruction pass the correct arguments.
PR: kern/182161 Reported by: Russ Cox <rsc@swtch.com> Sponsored by: The FreeBSD Foundation MFC after: 3 days Approved by: re (marius)
|
#
255827 |
|
23-Sep-2013 |
kib |
Free both KVA and backing pages when freeing TSS memory.
Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (marius)
|
#
255289 |
|
06-Sep-2013 |
glebius |
On those machines, where sf_bufs do not represent any real object, make sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely empty functions.
Reviewed by: alc, kib, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
#
255217 |
|
04-Sep-2013 |
kib |
Tidy up some loose ends in the PCID code:
- Restore the pre-PCID TLB shootdown handlers for whole address space and single page invalidation asm code, and assign the IPI handler to them when PCID is not supported or disabled. Old handlers have linear control flow. But, still use the common return sequence.
- Stop using pcpu for INVPCID descriptors in the invlrg handler. It is enough to allocate descriptors on the stack. As result, two SWAPGS instructions are shaved off from the code for Haswell+.
- Fix the reverted condition in invlrng for checking of the PCID support [1], also in invlrng check that pmap is kernel pmap before performing other tests. For the kernel pmap, which provides global mappings, the INVLPG must be used for invalidation always.
- Save the pre-computed pmap' %CR3 register in the struct pmap. This allows to remove several checks for pm_pcid validity when %CR3 is reloaded [2].
Noted by: gibbs [1] Discussed with: alc [2] Tested by: pho, flo Sponsored by: The FreeBSD Foundation
|
#
255060 |
|
30-Aug-2013 |
kib |
Implement support for the process-context identifiers ('PCID') on Intel CPUs. The feature tags TLB entries with the Id of the address space and allows to avoid TLB invalidation on the context switch, it is available only in the long mode. In the microbenchmarks, using the PCID decreased latency of the context switches by ~30% on SandyBridge class desktop CPUs, measured with the lat_ctx program from lmbench.
If available, use INVPCID instruction when a TLB entry in non-current address space needs to be invalidated. The instruction is typically available on the Haswell.
If needed, the use of PCID can be turned off with the vm.pmap.pcid_enabled loader tunable set to 0. The state of the feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl vm.pmap.pcid_save_cnt reports the number of context switches which avoided invalidating the TLB; compare with the total number of context switches, available as sysctl vm.stats.sys.v_swtch.
Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf
|
#
254025 |
|
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
#
250624 |
|
13-May-2013 |
ed |
Improve readability of static assertions for OFFSET_* macros.
Instead of doing all sorts of weird casting of constants to pointer-pointers, simply use the standard C offsetof() macro to obtain the offset of the respective fields in the structures.
|
#
245204 |
|
09-Jan-2013 |
neel |
Add a "pause" to busy wait loops in the cpu reset path.
This should not matter much when running on bare metal but it makes the guest more friendly when running inside a virtual machine.
Discussed with: jhb Obtained from: NetApp
|
#
238623 |
|
19-Jul-2012 |
kib |
Introduce curpcb magic variable, similar to curthread, which is MD amd64. It is implemented as __pure2 inline with non-volatile asm read from pcpu, which allows a compiler to cache its results.
Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.
Note that __curthread() uses magic value 0 as an offsetof(struct pcpu, pc_curthread). It seems to be done this way due to machine/pcpu.h needs to be processed before sys/pcpu.h, because machine/pcpu.h contributes machine-depended fields to the struct pcpu definition. As result, machine/pcpu.h cannot use struct pcpu yet.
The __curpcb() also uses a magic constant instead of offsetof(struct pcpu, pc_curpcb) for the same reason. The constants are now defined as symbols and CTASSERTs are added to ensure that future KBI changes do not break the code.
Requested and reviewed by: bde MFC after: 3 weeks
|
#
231441 |
|
10-Feb-2012 |
kib |
In cpu_set_user_tls(), consistently set PCB_FULL_IRET pcb flag for both 64bit and 32bit binaries, not for 64bit only.
The set of the flag is not neccessary there, because the only current user of the cpu_set_user_tls() is create_thread(), which calls cpu_set_upcall() before and cpu_set_upcall() itself sets PCB_FULL_IRET. Change the function for consistency and preserve existing KPI for now.
MFC after: 1 week
|
#
230426 |
|
21-Jan-2012 |
kib |
Add support for the extended FPU states on amd64, both for native 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs.
In particular:
- Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions.
- Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size.
- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area.
- Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled.
- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipilate extended state in the interrupted context.
- The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state.
- Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.
- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64.
First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch.
Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month
|
#
223758 |
|
04-Jul-2011 |
attilio |
With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement.
MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
|
#
222813 |
|
07-Jun-2011 |
attilio |
etire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
#
217896 |
|
26-Jan-2011 |
dchagin |
Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures.
MFC after: 1 month
|
#
216634 |
|
21-Dec-2010 |
jkim |
Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood.
Reviewed by: kib Submitted by: kib[1] MFC after: 2 months
|
#
216255 |
|
07-Dec-2010 |
kib |
Update some comments related to use of amd64 full context switch. In exec_linux_setregs(), use locally cached pointer to pcb to set pcb_full_iret. In set_regs(), note that full return is needed when code that sets segment registers is enabled.
MFC after: 1 week
|
#
216253 |
|
07-Dec-2010 |
kib |
Retire write-only PCB_FULLCTX pcb flag on amd64.
Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week
|
#
211197 |
|
11-Aug-2010 |
jhb |
Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.
|
#
209198 |
|
15-Jun-2010 |
kib |
Use critical sections instead of disabling local interrupts to ensure the consistency between PCPU fpcurthread and the state of the FPU.
Explicitely assert that the calling conventions for fpudrop() are adhered too. In cpu_thread_exit(), add missed critical section entrance.
Reviewed by: bde Tested by: pho MFC after: 1 month
|
#
208833 |
|
05-Jun-2010 |
kib |
Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area.
Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered.
KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month
|
#
205014 |
|
11-Mar-2010 |
nwhitehorn |
Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms.
Reviewed by: kib, jhb
|
#
204309 |
|
25-Feb-2010 |
attilio |
Introduce the new kernel sub-tree x86 which should contain all the code shared and generalized between our current amd64, i386 and pc98.
This is just an initial step that should lead to a more complete effort. For the moment, a very simple porting of cpufreq modules, BIOS calls and the whole MD specific ISA bus part is added to the sub-tree but ideally a lot of code might be added and more shared support should grow.
Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib, jhb, imp Discussed on: arch MFC: 3 weeks
|
#
200444 |
|
12-Dec-2009 |
kib |
For ia32 syscall(), call cpu_set_syscall_retval(). Update comment inside cpu_set_syscall_retval() accordingly.
MFC after: 1 week
|
#
199135 |
|
10-Nov-2009 |
kib |
Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval().
Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, also sunv reviewed MIPS tested by: gonzo MFC after: 1 month
|
#
195486 |
|
09-Jul-2009 |
kib |
Restore the segment registers and segment base MSRs for amd64 syscall return path only when neither thread was context switched while executing syscall code nor syscall explicitely modified LDT or MSRs.
Save segment registers in trap handlers before interrupts are enabled, to not allow context switches to happen before registers are saved. Use separated byte in pcb for indication of fast/full return, since pcb_flags are not synchronized with context switches.
The change puts back syscall microbenchmark numbers that were slowed down after commit of the support for LDT on amd64.
Reviewed by: jeff Tested (and tested, and tested ...) by: pho Approved by: re (kensmith)
|
#
190620 |
|
01-Apr-2009 |
kib |
Save and restore segment registers on amd64 when entering and leaving the kernel on amd64. Fill and read segment registers for mcontext and signals. Handle traps caused by restoration of the invalidated selectors.
Implement user-mode creation and manipulation of the process-specific LDT descriptors for amd64, see sysarch(2).
Implement support for TSS i/o port access permission bitmap for amd64.
Context-switch LDT and TSS. Do not save and restore segment registers on the context switch, that is handled by kernel enter/leave trampolines now. Remove segment restore code from the signal trampolines for freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.
Implement amd64-specific compat shims for sysarch.
Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.
TODO: Currently, gdb is not adapted to show segment registers from struct reg. Also, no machine-depended ptrace command is added to set segment registers for debugged process.
In collaboration with: pho Discussed with: peter Reviewed by: jhb Linuxolator tested by: dchagin
|
#
190239 |
|
22-Mar-2009 |
alc |
In general, the kernel virtual address of the pml4 page table page that is stored in the pmap is from the direct map region. The two exceptions have been the kernel pmap and the swapper's pmap. These pmaps have used a kernel virtual address established by pmap_bootstrap() for their shared pml4 page table page. However, there is no reason not to use the direct map for these pmaps as well.
|
#
190237 |
|
22-Mar-2009 |
alc |
Eliminate the recomputation of pcb_cr3 from cpu_set_upcall(). The bcopy()ed value from the old thread is the correct value because the new thread and the old thread will share a page table.
|
#
189282 |
|
02-Mar-2009 |
kib |
Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process executing on 64bit kernel. This eliminates the direct comparisions of p_sysent with &ia32_freebsd_sysvec, that were left intact after r185169.
|
#
183615 |
|
05-Oct-2008 |
davidxu |
If the current thread has the trap bit set (i.e. a debugger had single stepped the process to the system call), we need to clear the trap flag from the new frame. Otherwise, the new thread will receive a (likely unexpected) SIGTRAP when it executes the first instruction after returning to userland.
|
#
182936 |
|
11-Sep-2008 |
jhb |
Update the comments above the 0xcf9 register reset attempt to match the code. We only attempt a single reset using this method (a "hard" reset), and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to force a reset.
MFC after: 1 week
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
173615 |
|
14-Nov-2007 |
marcel |
o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc().
PR: ia64/118024
|
#
170305 |
|
04-Jun-2007 |
jeff |
- Change comments and asserts to reflect the removal of the global scheduler lock.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
169030 |
|
24-Apr-2007 |
jhb |
Fix the triple fault used as a last resort during a reboot to actually fault. The previous method zero'd out the page tables, invalidated the TLB, and then entered a spin loop. The idea was that the instruction after the TLB invalidate would result in a page fault and the page fault and subsequent double fault wouldn't be able to determine the physical page for their fault handlers' first instruction. This stopped working when PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3 reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still able to map the virtual address for the spin loop and happily performed its infinite loop.
The triple fault now uses a much more deterministic sledge-hammer approach to generate a triple fault. First, the IDT descriptor is set to point to an empty IDT, so any interrupts (including a double fault) will instantly fault. Second, we trigger a int 3 breakpoint to force an interrupt and kick off a triple fault.
MFC after: 3 days
|
#
169029 |
|
24-Apr-2007 |
jhb |
MFi386: Attempt to reset the machine using the Reset Control register and Fast A20 and Init register if the keyboard reset doesn't work before resorting to a triple fault.
|
#
162378 |
|
17-Sep-2006 |
davidxu |
Make cpu_set_upcall_kse() and cpu_set_user_tls() work for 32bit process.
|
#
160617 |
|
24-Jul-2006 |
davidxu |
Remove a duplicated line.
|
#
151633 |
|
24-Oct-2005 |
jhb |
When restarting the BSP during cpu_reset() use a membar to ensure that the updated cpustop_restartfunc is seen when the BSP resumes execution. This matches the membar already present in restart_cpus().
|
#
150647 |
|
27-Sep-2005 |
peter |
Kill pcb_rflags. It served no purpose.
Reported by: bde
|
#
147889 |
|
10-Jul-2005 |
davidxu |
Validate if the value written into {FS,GS}.base is a canonical address, writting non-canonical address can cause kernel a panic, by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring only canonical values get written to the registers.
Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com > Approved by: re (scottl)
|
#
147566 |
|
23-Jun-2005 |
peter |
MFi386: 1.258: Minor cleanups
Approved by: re (blanket i386<->amd64 sync)
|
#
145433 |
|
23-Apr-2005 |
davidxu |
Change cpu_set_kse_upcall to more generic style, so we can reuse it in other codes. Add cpu_set_user_tls, use it to tweak user register and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall, but since cpu_set_kse_upcall is also used by M:N threads which may not need this feature, so I wrote a separated cpu_set_user_tls.
|
#
144637 |
|
04-Apr-2005 |
jhb |
Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
|
#
140554 |
|
21-Jan-2005 |
peter |
MFi386: handle PSL_T properly across fork. Typo fix.
|
#
140257 |
|
14-Jan-2005 |
jhb |
Remove redundant code to drop per-thread debug register state from cpu_exit() as this is already performed in cpu_thread_exit() and the debug state is per-thread rather than per-process.
|
#
139345 |
|
27-Dec-2004 |
njl |
MFi386: Restore cpu_reset proxy code to enable reset from ddb on an AP.
|
#
139344 |
|
27-Dec-2004 |
njl |
Reduce diffs to i386.
|
#
138237 |
|
30-Nov-2004 |
peter |
Remove unused cnt variable for the SMP case. Trim some excessive blank lines while here.
|
#
138207 |
|
29-Nov-2004 |
peter |
Take advantage of the shutdown processing being wired to the BSP and eliminate the evil cpu_reset_proxy code now that it will never be activated. i386 should pick this up as well.
|
#
138129 |
|
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
133902 |
|
16-Aug-2004 |
peter |
Sync with i386 - set rbp reg to 0 for upcalls as a frame marker, not that it is guaranteed to be used in userland though.
|
#
133529 |
|
11-Aug-2004 |
davidxu |
Mark end of frames.
|
#
129750 |
|
26-May-2004 |
tmm |
Retire cpu_sched_exit(); it is not used any more.
|
#
128384 |
|
18-Apr-2004 |
alc |
Simplify the sf_buf implementation. In short, make it a trivial veneer over the direct virtual-to-physical mapping.
|
#
128100 |
|
11-Apr-2004 |
alc |
- is_physical_memory()'s parameter, which is a physical address, should be a vm_paddr_t not a vm_offset_t.
|
#
127788 |
|
03-Apr-2004 |
alc |
In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it should not. Add a new parameter so that the caller can specify which is the case.
Reported by: dillon
|
#
127582 |
|
29-Mar-2004 |
peter |
Finish tidying up a couple of leftovers from the KSTACK_PAGES stuff. Some files still #included the opt_ file. powerpc hadn't been updated yet.
|
#
127392 |
|
25-Mar-2004 |
peter |
MFi386: correctly calculate the top-of-stack when a kthread is created with a larger kernel stack.
|
#
127086 |
|
16-Mar-2004 |
alc |
Refactor the existing machine-dependent sf_buf_free() into a machine- dependent function by the same name and a machine-independent function, sf_buf_mext(). Aside from the virtue of making more of the code machine- independent, this change also makes the interface more logical. Before, sf_buf_free() did more than simply undo an sf_buf_alloc(); it also unwired and if necessary freed the page. That is now the purpose of sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used as a general-purpose emphemeral map cache.
|
#
125180 |
|
28-Jan-2004 |
peter |
deal with dbregs for fork etc update for fpu.c simplification Merge #include sort from i386
|
#
123929 |
|
28-Dec-2003 |
silby |
Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait
|
#
123920 |
|
27-Dec-2003 |
silby |
Move the declaration of sfbufspeak and sfbufsused to mbuf.h, and use imax instead of max, as sfbufspeak and sfbufsused are signed.
Submitted by: bde
|
#
123884 |
|
27-Dec-2003 |
silby |
Track current and peak sfbuf usage, export the values via sysctl.
|
#
122940 |
|
21-Nov-2003 |
peter |
Cosmetic and/or trivial sync up with i386.
Approved by: re (rwatson)
|
#
122849 |
|
17-Nov-2003 |
peter |
Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.
|
#
122821 |
|
16-Nov-2003 |
alc |
- Remove unnecessary synchronization from sf_buf_init(). (There is only one active CPU when sf_buf_init() is performed.)
|
#
122780 |
|
16-Nov-2003 |
alc |
- Modify alpha's sf_buf implementation to use the direct virtual-to- physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architecures except as an opaque pointer that references a vm page.
|
#
122292 |
|
08-Nov-2003 |
peter |
The great s/npx/fpu/gi
|
#
122278 |
|
08-Nov-2003 |
peter |
Rename npx* to fpu*. I haven't done the flags/function names yet.
|
#
121751 |
|
30-Oct-2003 |
peter |
MFi386: thread specific fpu state optimizations
|
#
119563 |
|
29-Aug-2003 |
alc |
Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid emphemeral mappings in the kernel's address space. On an SMP, the emphemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck.
Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.
|
#
119004 |
|
16-Aug-2003 |
marcel |
In vm_thread_swap{in|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.
|
#
118156 |
|
29-Jul-2003 |
davidxu |
Use PSL_KERNEL as upcall thread's initial rflags, don't use scratch user rflags.
|
#
118031 |
|
25-Jul-2003 |
obrien |
Use __FBSDID().
Brought to you by: a boring talk at Ottawa Linux Symposium
|
#
117985 |
|
24-Jul-2003 |
davidxu |
Align upcall stack top to odd times of 8. GCC accounts return address in callee function for stack alignment.
|
#
117962 |
|
24-Jul-2003 |
davidxu |
Implement cpu_set_upcall and cpu_set_upcall_kse.
Reviewed by: peter
|
#
116188 |
|
11-Jun-2003 |
peter |
GC unused cpu_wait() function
|
#
115861 |
|
04-Jun-2003 |
marcel |
Change the second (and last) argument of cpu_set_upcall(). Previously we were passing in a void* representing the PCB of the parent thread. Now we pass a pointer to the parent thread itself. The prime reason for this change is to allow cpu_set_upcall() to copy (parts of) the trapframe instead of having it done in MI code in each caller of cpu_set_upcall(). Copying the trapframe cannot always be done with a simply bcopy() or may not always be optimal that way. On ia64 specifically the trapframe contains information that is specific to an entry into the kernel and can only be used by the corresponding exit from the kernel. A trapframe copied verbatim from another frame is in most cases useless without some additional normalization.
Note that this change removes the assignment to td->td_frame in some implementations of cpu_set_upcall(). The assignment is redundant. A previous call to cpu_thread_setup() already did the exact same assignment. An added benefit of removing the redundant assignment is that we can now change td_pcb without nasty side-effects.
This change officially marks the ability on ia64 for 1:1 threading.
Not tested on: amd64, powerpc Compile & boot tested on: alpha, sparc64 Functionally tested on: i386, ia64
|
#
115251 |
|
23-May-2003 |
peter |
Major pmap rework to take advantage of the larger address space on amd64 systems. Of note: - Implement a direct mapped region using 2MB pages. This eliminates the need for temporary mappings when getting ptes. This supports up to 512GB of physical memory for now. This should be enough for a while. - Implement a 4-tier page table system. Most of the infrastructure is there for 128TB of userland virtual address space, but only 512GB is presently enabled due to a mystery bug somewhere. The design of this was heavily inspired by the alpha pmap.c. - The kernel is moved into the negative address space(!). - The kernel has 2GB of KVM available. - Provide a uma memory allocator to use the direct map region to take advantage of the 2MB TLBs. - Fixed some assumptions in the bus_space macros about the ability to fit virtual addresses in an 'int'.
Notable missing things: - pmap_growkernel() should be able to grow to 512GB of KVM by expanding downwards below kernbase. The kernel must be at the top 2GB of the negative address space because of gcc code generation strategies. - need to fix the >512GB user vm code.
Approved by: re (blanket)
|
#
114987 |
|
14-May-2003 |
peter |
Add BASIC i386 binary support for the amd64 kernel. This is largely stolen from the ia64/ia32 code (indeed there was a repocopy), but I've redone the MD parts and added and fixed a few essential syscalls. It is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic) and p4. The ia64 code has not implemented signal delivery, so I had to do that.
Before you say it, yes, this does need to go in a common place. But we're in a freeze at the moment and I didn't want to risk breaking ia64. I will sort this out after the freeze so that the common code is in a common place.
On the AMD64 side, this required adding segment selector context switch support and some other support infrastructure. The %fs/%gs etc code is hairy because loading %gs will clobber the kernel's current MSR_GSBASE setting. The segment selectors are not used by the kernel, so they're only changed at context switch time or when changing modes. This still needs to be optimized.
Approved by: re (amd64/* blanket)
|
#
114349 |
|
30-Apr-2003 |
peter |
Commit MD parts of a loosely functional AMD64 port. This is based on a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to attempt to get a stable base to start from. There is a lot missing still. Worth noting: - The kernel runs at 1GB in order to cheat with the pmap code. pmap uses a variation of the PAE code in order to avoid having to worry about 4 levels of page tables yet. - It boots in 64 bit "long mode" with a tiny trampoline embedded in the i386 loader. This simplifies locore.s greatly. - There are still quite a few fragments of i386-specific code that have not been translated yet, and some that I cheated and wrote dumb C versions of (bcopy etc). - It has both int 0x80 for syscalls (but using registers for argument passing, as is native on the amd64 ABI), and the 'syscall' instruction for syscalls. int 0x80 preserves all registers, 'syscall' does not. - I have tried to minimize looking at the NetBSD code, except in a couple of places (eg: to find which register they use to replace the trashed %rcx register in the syscall instruction). As a result, there is not a lot of similarity. I did look at NetBSD a few times while debugging to get some ideas about what I might have done wrong in my first attempt.
|
#
113796 |
|
21-Apr-2003 |
davidxu |
Reset pcb_gs and %gs before possibly invalidating it.
|
#
113364 |
|
11-Apr-2003 |
davidxu |
Copy %gs from current CPU not from a stale PCB backup.
|
#
112841 |
|
30-Mar-2003 |
jake |
- Add support for PAE and more than 4 gigs of ram on x86, dependent on the kernel opition 'options PAE'. This will only work with device drivers which either use busdma, or are able to handle 64 bit physical addresses.
Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine with 6 gigs of ram.
Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems
|
#
112569 |
|
24-Mar-2003 |
jake |
- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)
|
#
111363 |
|
23-Feb-2003 |
jake |
- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the page directory. - Use these instead of the magic constants 1 or PAGE_SIZE where appropriate. There are still numerous assumptions that the page directory is exactly 1 page.
Sponsored by: DARPA, Network Associates Laboratories
|
#
111028 |
|
17-Feb-2003 |
jeff |
- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code.
Submitted by: davidxu
|
#
110190 |
|
01-Feb-2003 |
julian |
Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but them's the rules..
I am using my "David's mentor" hat to revert this as he's offline for a while.
|
#
109877 |
|
26-Jan-2003 |
davidxu |
Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone.
A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary.
Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed.
KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another.
When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides.
The code hasn't been tested under SMP by author due to lack of hardware.
Reviewed by: julian
|
#
109342 |
|
15-Jan-2003 |
dillon |
Merge all the various copies of vm_fault_quick() into a single portable copy.
|
#
109340 |
|
15-Jan-2003 |
dillon |
Merge all the various copies of vmapbuf() and vunmapbuf() into a single portable copy. Note that pmap_extract() must be used instead of pmap_kextract().
This is precursor work to a reorganization of vmapbuf() to close remaining user/kernel races (which can lead to a panic).
|
#
107719 |
|
10-Dec-2002 |
julian |
Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage during the context switch. Rearrange thread cleanups to avoid problems with Giant. Clean threads when freed or when recycled.
Approved by: re (jhb)
|
#
107212 |
|
24-Nov-2002 |
alc |
Add page queues locking to vunmapbuf(); reduce differences with respect to the sparc64 implementation. (Note: With modest effort on the alpha and ia64 this function could migrate to the MI part of the kernel.)
Approved by: re (blanket)
|
#
107180 |
|
22-Nov-2002 |
mux |
Under certain circumstances, we were calling kmem_free() from i386 cpu_thread_exit(). This resulted in a panic with WITNESS since we need to hold Giant to call kmem_free(), and we weren't helding it anymore in cpu_thread_exit(). We now do this from a new MD function, cpu_thread_dtor(), called by thread_dtor().
Approved by: re@ Suggested by: jhb
|
#
104695 |
|
09-Oct-2002 |
julian |
Round out the facilty for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where teh owner wants it back. This removes a lot of possible race conditions and deadlocks. It is consceivable that the borrower should inherit the priority of the owner too. that's another discussion and would be simple to do.
Also, as part of this, the "preallocatd spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand".
DDB now shows a lot mor info for threaded proceses though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project)
Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.
|
#
103407 |
|
16-Sep-2002 |
mini |
Add kernel support needed for the KSE-aware libpthread: - Maintain fpu state across signals. - Use ucontext_t's to store KSE thread state. - Synthesize state for the UTS upon each upcall, rather than saving and copying a trapframe. - Save and restore FPU state properly in ucontext_t's.
Reviewed by: deischen, julian Approved by: -arch
|
#
103049 |
|
06-Sep-2002 |
peter |
Zap the implementations of the i386-aout specific cpu_coredump function. Most of the non-i386 platforms had rather broken implementations anyway.
|
#
101941 |
|
15-Aug-2002 |
rwatson |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98765 |
|
24-Jun-2002 |
jake |
Add an MD callout like cpu_exit, but which is called after sched_lock is obtained, when all other scheduling activity is suspended. This is needed on sparc64 to deactivate the vmspace of the exiting process on all cpus. Otherwise if another unrelated process gets the exact same vmspace structure allocated to it (same address), its address space will not be activated properly. This seems to fix some spontaneous signal 11 problems with smp on sparc64.
|
#
93264 |
|
27-Mar-2002 |
dillon |
Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt disablement assumptions in kern_fork.c by adding another API call, cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain enabled inside critical sections. This also fixes an IPI interlock bug, and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized, and will move cpu_critical*() into its own header file(s) + other things. This commit may break non-i386 architectures in trivial ways. This should be temporary.
Reviewed by: core Approved by: core
|
#
92890 |
|
21-Mar-2002 |
alc |
o Use the MI vm_map_growstack() instead of grow_stack() in trap_pfault() and trapwrite(). o On i386/pc98, remove the (now) unused grow_stack().
|
#
92860 |
|
21-Mar-2002 |
imp |
Fix abuses of cpu_critical_{enter,exit} by converting to intr_{disable,restore} as well as providing an implemenation of intr_{disable,restore}.
Reviewed by: jake, rwatson, jhb
|
#
92770 |
|
20-Mar-2002 |
alfred |
Remove __P.
|
#
91328 |
|
26-Feb-2002 |
dillon |
revert last commit temporarily due to whining on the lists.
|
#
91315 |
|
26-Feb-2002 |
dillon |
STAGE-1 of 3 commit - allow (but do not require) interrupts to remain enabled in critical sections and streamline critical_enter() and critical_exit().
This commit allows an architecture to leave interrupts enabled inside critical sections if it so wishes. Architectures that do not wish to do this are not effected by this change.
This commit implements the feature for the I386 architecture and provides a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For now you can turn the sysctl on and off at any time in order to test the architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will be cleaned up in the STAGE-2 commit when the critical_*() functions are moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment and decrement curthread->td_critnest. They no longer disable hard interrupts. When critical_exit() decrements the counter to 0 it effectively calls a routine to deal with whatever interrupts were deferred during the time the code was operating in a critical section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for the new code. Old code will still use the old MD assumptions in regards to hard interrupt disablement. In STAGE-2 this will be turned into a subroutine call into MD code rather then hardcoded in MI code.
The new code places the burden of entering the critical section in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section. The interrupt vector code has been adjusted to deal with the fact. If it detects that we are in a critical section it currently defers the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This is i386-specific. Thus icu_lock can only be obtained by mainline i386 code while interrupts are hard disabled. This change has been made.
* Because interrupts may or may not be hard disabled during a context switch, cpu_switch() can no longer simply assume that PSL_I will be in a consistent state. Therefore, it now saves and restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred. The intention is to eventually allow them to operate either while we are in a critical section or, if we are able to restrict the use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code has been cleaned up to match the APIC code in regards to format and macro availability. Additionally, the code has been adjusted to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and masks ipending, spending, and fpending. Being per-cpu variables it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be incorporated as a true software interrupt without having to further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is currently #ifdef to be compatible with both the old and new methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section performance, specifically to greatly reduce bus overhead to allow the critical section code to be used to protect per-cpu caches. These caches, such as Jeff's slab allocator work, can potentially operate very quickly making the effective savings of the new critical section code's performance very significant.
The second purpose of this commit is to allow architectures to enable certain interrupts while in a critical section. Specifically, the intention is to eventually allow certain FAST interrupts to operate rather then defer.
The third purpose of this commit is to begin to clean up the critical_enter()/critical_exit()/cpu_critical_enter()/ cpu_critical_exit() API which currently has serious cross pollution in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that allows kernel-preempting software interrupts to be implemented cleanly. This is currently used for two forward interrupts in I386. Other architectures will have the choice of using this infrastructure or building the functionality directly into critical_enter()/ critical_exit().
Finally, this commit is designed to greatly improve the flexibility of various architectures to manage critical section handling, software interrupts, preemption, and other highly integrated architecture-specific details.
|
#
90562 |
|
12-Feb-2002 |
alc |
Remove an unused (but initialized) variable from vmapbuf().
|
#
90361 |
|
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
88146 |
|
18-Dec-2001 |
julian |
In a couple of places, we recalculated addresses we already had in local pointer variables.
|
#
88088 |
|
17-Dec-2001 |
jhb |
Modify the critical section API as follows: - The MD functions critical_enter/exit are renamed to start with a cpu_ prefix. - MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly. - Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments. - mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit().
Tested on: i386, alpha
|
#
87702 |
|
11-Dec-2001 |
jhb |
Overhaul the per-CPU support a bit:
- The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list.
Tested on: alpha, i386 Reviewed by: peter, jake
|
#
85449 |
|
24-Oct-2001 |
jhb |
Split the per-process Local Descriptor Table out of the PCB and into struct mdproc.
Submitted by: Andrew R. Reiter <arr@watson.org> Silence on: -current
|
#
84935 |
|
14-Oct-2001 |
tegge |
Change vmapbuf() to use pmap_qenter() and vunmapbuf() to use pmap_qremove().
This significantly reduces the number of TLB shootdowns caused by vmapbuf/vunmapbuf when performing many large reads from raw disk devices.
Reviewed by: dillon
|
#
84381 |
|
02-Oct-2001 |
mjacob |
Fix problem where a user buffer outside of the area being tested will be corrupted.
PR: 29194 Obtained from: Tor.Egge@fast.no MFC after: 2 weeks
|
#
83660 |
|
19-Sep-2001 |
peter |
Reserve an extra 16 bytes in case we have to grow the trapframe into a vm86trapframe for switching to vm86 [unlikely] while exiting. I lost this when doing the pcb move that went in with the KSE commit.
Reviewed by: jake
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
83276 |
|
10-Sep-2001 |
peter |
Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place.
Reviewed by: jake, tmm, dillon
|
#
83223 |
|
08-Sep-2001 |
peter |
Missing part of dillon's coredump commit. cpu_coredump() was still passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the unlocked core vnode and softupdates nastiness if an a.out binary cored.
|
#
82938 |
|
04-Sep-2001 |
peter |
Nuke #if 0'ed "setredzone()" stub. We never used it, and probably never will. I've implemented an optional redzone as part of the KSE upage breakup.
|
#
82309 |
|
25-Aug-2001 |
peter |
Optionize UPAGES for the i386. As part of this I split some of the low level implementation stuff out of machine/globaldata.h to avoid exposing UPAGES to lots more places. The end result is that we can double the kernel stack size with 'options UPAGES=4' etc.
This is mainly being done for the benefit of a MFC to RELENG_4 at some point. -current doesn't really need this so much since each interrupt runs on its own kstack.
|
#
79609 |
|
12-Jul-2001 |
peter |
Activate SSE/SIMD. This is the extra context switching support that we are required to do if we let user processes use the extra 128 bit registers etc.
This is the base part of the diff I got from: http://www.issei.org/issei/FreeBSD/sse.html I believe this is by: Mr. SUZUKI Issei <issei@issei.org> SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp> Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see http://kobe1995.net/~kaz/FreeBSD/SSE.en.html
I have fixed a couple of style(9) deviations. I have some followup commits to fix a couple of non-style things.
|
#
79265 |
|
04-Jul-2001 |
dillon |
Move vm_page_zero_idle() from machine-dependant sections to a machine-independant source file, vm/vm_zeroidle.c. It was exactly the same for all platforms and updating them all was getting annoying.
|
#
79263 |
|
04-Jul-2001 |
dillon |
Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system.
vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and aquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)
|
#
79224 |
|
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
79123 |
|
03-Jul-2001 |
jhb |
Allow Giant to be recursed when a process terminates.
|
#
78962 |
|
29-Jun-2001 |
jhb |
Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md.
Reviewed by: jake (in principle)
|
#
77486 |
|
30-May-2001 |
jhb |
We can't grab the sched_lock in set_user_ldt() because when it is called from cpu_switch(), curproc has been changed, but the sched_lock owner will not be updated until we return to mi_switch(), thus we deadlock against ourselves. As a workaround, push the acquire and release of sched_lock out to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in set_user_ldt for the same reason.
Sleuting by: tmm Tested by: tmm, dougb
|
#
76947 |
|
21-May-2001 |
jhb |
Remove a few more spl's I missed earlier.
Reported by: Michael Harnois <mdharnois@home.com> Pointy hat: me
|
#
76939 |
|
21-May-2001 |
jhb |
Axe unneeded spl()'s.
|
#
76827 |
|
18-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
#
76546 |
|
13-May-2001 |
bde |
Use a critical region to protect pushing of the parent's npx state to the pcb for fork(). It was possible for the state to be saved twice when an interrupt handler saved it concurrently. This corrupted (reset) the state because fnsave has the (in)convenient side effect of doing an implicit fninit. Mundane null pointer bugs were not possible, because we save to an "arbitrary" process's pcb and not to the "right" place (npxproc).
Push the parent's %gs to the pcb for fork(). Changes to %gs before fork() were not preserved in the child unless an accidental context switch did the pushing. Updated the list of pcb contents which is supposed to inhibit bugs like this. pcb_dr*, pcb_gs and pcb_ext were missing. Copying is correct for pcb_dr*, and pcb_ext is already handled specially (although XXX'ly).
Reducing the savectx() call to an npxsave() call in rev.1.80 was a mistake. The above bugs are duplicated in many places, including in savectx() itself.
The arbitraryness of the parent process pointer for the fork() subroutines, the pcb pointer for savectx(), and the save87 pointer for npxsave(), is illusory. These functions don't work "right" unless the pointers are precisely curproc, curpcb, and the address of npxproc's save87 area, respectively, although the special context in which they are called allows savectx(&dumppcb) to sort of work and npxsave(&dummy) to work. cpu_fork() just doesn't work unless the parent process pointer is curproc, or the caller has pushed %gs to the pcb, or %gs happens to already be in the pcb.
|
#
76434 |
|
10-May-2001 |
jhb |
- Use sched_lock and critical regions to ensure that LDT updates are thread safe from preemption and concurrent access to the LDT. - Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.
Reviewed by: silence on -hackers
|
#
76078 |
|
27-Apr-2001 |
jhb |
Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the *_process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the *_process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the *_process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the *_process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter Looked over by: eivind
|
#
73922 |
|
07-Mar-2001 |
jhb |
Use the proc lock to protect p_pptr when waking up our parent in cpu_exit() and remove the mpfixme() message that is now fixed.
|
#
72930 |
|
22-Feb-2001 |
peter |
Activate USER_LDT by default. The new thread libraries are going to depend on this. The linux ABI emulator tries to use it for some linux binaries too. VM86 had a bigger cost than this and it was made default a while ago.
Reviewed by: jhb, imp
|
#
72746 |
|
20-Feb-2001 |
jhb |
- Don't call clear_resched() in userret(), instead, clear the resched flag in mi_switch() just before calling cpu_switch() so that the first switch after a resched request will satisfy the request. - While I'm at it, move a few things into mi_switch() and out of cpu_switch(), specifically set the p_oncpu and p_lastcpu members of proc in mi_switch(), and handle the sched_lock state change across a context switch in mi_switch(). - Since cpu_switch() no longer handles the sched_lock state change, we have to setup an initial state for sched_lock in fork_exit() before we release it.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71785 |
|
29-Jan-2001 |
peter |
Send "#if NISA > 0" to the bit-bucket and replace it with an option. These were compile-time "is the isa code present?" tests and not 'how many isa busses' tests.
|
#
71528 |
|
24-Jan-2001 |
jhb |
Setup the return values for a child process in the trapframe when we setup the rest of the trapframe instead of doing it in fork_return().
|
#
71257 |
|
19-Jan-2001 |
peter |
Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h
|
#
70861 |
|
10-Jan-2001 |
jake |
Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.
|
#
70317 |
|
23-Dec-2000 |
jake |
Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock.
linprocfs not locked pending response from informal maintainer.
Reviewed by: jhb, -smp@
|
#
69781 |
|
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
#
69779 |
|
08-Dec-2000 |
jake |
Revert the previous change I made to cpu_switch. It doesn't help as much as I thought it would and according to bde was a pessimization.
|
#
69536 |
|
02-Dec-2000 |
jake |
Change cpu_switch to explicitly popl the callers program counter and pushl that of the new process, rather than doing a movl (%esp) and assuming that the stack has been setup right. This make the initial stack setup slightly more sane, and will make it easier to stick an interrupted process onto the run queue without its knowing.
|
#
67551 |
|
25-Oct-2000 |
jhb |
- Overhaul the software interrupt code to use interrupt threads for each type of software interrupt. Roughly, what used to be a bit in spending now maps to a swi thread. Each thread can have multiple handlers, just like a hardware interrupt thread. - Instead of using a bitmask of pending interrupts, we schedule the specific software interrupt thread to run, so spending, NSWI, and the shandlers array are no longer needed. We can now have an arbitrary number of software interrupt threads. When you register a software interrupt thread via sinthand_add(), you get back a struct intrhand that you pass to sched_swi() when you wish to schedule your swi thread to run. - Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit more intuitive. Also, prefix all the members of struct intrhand with 'ih_'. - Make swi_net() a MI function since there is now no point in it being MD.
Submitted by: cp
|
#
67477 |
|
23-Oct-2000 |
jhb |
Don't dink with interrupts in vm_page_zero_idle(). This code assumed it was being called with interrupts disabled, when it was actually being called with them enabled.
Pointed out by: tegge
|
#
67360 |
|
20-Oct-2000 |
jhb |
- machine/mutex.h -> sys/mutex.h - Use cpu_throw() instead of cpu_switch() during cpu_exit() since we don't need to save our previous state.
|
#
67164 |
|
15-Oct-2000 |
phk |
Remove unneeded #include <machine/clock.h>
|
#
65557 |
|
06-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
#
64529 |
|
11-Aug-2000 |
peter |
Clean up some low level bootstrap code:
- stop using the evil 'struct trapframe' argument for mi_startup() (formerly main()). There are much better ways of doing it. - do not use prepare_usermode() - setregs() in execve() will do it all for us as long as the p_md.md_regs pointer is set. (which is now done in machdep.c rather than init_main.c. The Alpha port did it this way all along and is much cleaner). - collect all the magic %cr0 etc register settings into one place and have the AP's call that instead of using magic numbers (!!) that keep changing over and over again. - Make it safe to call kthread_create() earlier, including during the device probe sequence. It doesn't need the callback mechanism that NetBSD's version uses. - kthreads created this way are root-less as they exist before the root filesystem is mounted. init(1) is set up so that it aquires the root pointers prior to running. If other kthreads want filesystem acccess we can make this code more generic. - set all threads start times once we have decided what time it is. - init uses a trampoline rather than the evil prepare_usermode() hack. - kern_descrip.c has a couple of tweaks to deal with forking when there is no rootdir or cwd etc. - adjust the early SYSINIT() sequence so that a few prereqisites are in place. eg: make sure the run queue is initialized before doing forks.
With this, the USB code can easily create a kthread to do the device tree discovery. (I have tested it, it works nicely).
There are still some open issues before this is truely useful. - tsleep() does not like working before the clock is running. It sort-of tries to spin wait, but it can do more useful things now. - stopping a kthread in kld code at unload time is "interesting" but we have a solution for that.
The Alpha code needs no changes for this. It already uses pretty much the same strategies, but a little cleaner.
|
#
61469 |
|
10-Jun-2000 |
peter |
Add option BROKEN_KEYBOARD_RESET to an opt_*.h file
|
#
60041 |
|
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
#
58717 |
|
28-Mar-2000 |
dillon |
Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it).
A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated.
Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion.
This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur.
This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later.
Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
|
#
58345 |
|
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
|
#
57362 |
|
20-Feb-2000 |
bsd |
Don't forget to reset the hardware debug registers when a process that was using them exits.
Don't allow a user process to cause the kernel to take a TRCTRAP on a user space address.
Reviewed by: jlemon, sef Approved by: jkh
|
#
54188 |
|
06-Dec-1999 |
luoqi |
User ldt sharing.
|
#
52647 |
|
30-Oct-1999 |
alc |
The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others.
Reviewed by: dillon
|
#
52635 |
|
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
52121 |
|
11-Oct-1999 |
peter |
Zap unneeded #includes
Submitted by: phk
|
#
51474 |
|
20-Sep-1999 |
dillon |
Fix bug in pipe code relating to writes of mmap'd but illegal address spaces which cross a segment boundry in the page table. pmap_kextract() is not designed for access to the user space portion of the page table and cannot handle the null-page-directory-entry case.
The fix is to have vm_fault_quick() return a success or failure which is then used to avoid calling pmap_kextract().
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
49326 |
|
31-Jul-1999 |
alc |
Change the type of vpgqueues::lcnt from "int *" to "int". The indirection served no purpose.
|
#
48974 |
|
22-Jul-1999 |
alc |
Reduce the number of "magic constants" used for page coloring by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same thing at different places in the kernel. Drop PQ_PRIME3.
|
#
48391 |
|
01-Jul-1999 |
peter |
Slight reorganization of kernel thread/process creation. Instead of using SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one.
One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.
|
#
47678 |
|
01-Jun-1999 |
jlemon |
Unifdef VM86.
Reviewed by: silence on on -current
|
#
45821 |
|
19-Apr-1999 |
peter |
unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
|
#
44146 |
|
19-Feb-1999 |
luoqi |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
44078 |
|
16-Feb-1999 |
dfr |
* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
|
#
43758 |
|
08-Feb-1999 |
dillon |
Adjust idle zero-page fill hysteresis based on tests. Use 2/3 and 4/5 zero-fill levels.
Adjust comment for ozfod in vmmeter.h - this counter represents non-optimal ( on the fly ) zero fills, not prefills.
|
#
43752 |
|
07-Feb-1999 |
dillon |
Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with PQ_FREE. There is little operational difference other then the kernel being a few kilobytes smaller and the code being more readable.
* vm_page_select_free() has been *greatly* simplified. * The PQ_ZERO page queue and supporting structures have been removed * vm_page_zero_idle() revamped (see below)
PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guarenteed to remain tracked throughout a page's life ( if it isn't already ).
When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations.
Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.
|
#
43387 |
|
29-Jan-1999 |
dillon |
More -Wall / -Wcast-qual cleanup. Also, EXEC_SET can't use C_DECLARE_MODULE due to the linker_file_sysinit() function making modifications to the data.
|
#
42360 |
|
06-Jan-1999 |
julian |
Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
|
#
41868 |
|
16-Dec-1998 |
bde |
Removed bogus casts of USRSTACK and/or the other operand in binary expressions involving USRSTACK.
|
#
40794 |
|
31-Oct-1998 |
peter |
Add John Dyson's SYSCTL descriptions, and an export of more stats to a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present in source, they do not get compiled into the binaries taking up memory.
|
#
40286 |
|
13-Oct-1998 |
dg |
Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
|
#
39703 |
|
28-Sep-1998 |
tegge |
Initialize pcb_mpnest to 1 in the child process in cpu_fork(). This should fix the 50% idle problem that the ELF /sbin/init triggered. The problem appeared when the last context switch before a fork() call was due to the kernel faulting in user pages via normal page faults (e.g. copyin). Reviewed by: Peter Wemm <peter@netplex.com.au>
|
#
39648 |
|
25-Sep-1998 |
peter |
Goodbye BOUNCE_BUFFERS, for a hack it has served us well.
The last consumer of this code (the old SCSI system) has left us and the CAM code does it's own bouncing. The isa dma system has been doing it's own bouncing for a while too.
Reviewed by: core
|
#
38422 |
|
18-Aug-1998 |
msmith |
Presently there is only one `currentldt' variable for all cpus in a SMP system. Unexpected things could happen if each cpu has a different ldt setting and one cpu tries to use value of currentldt set by another cpu.
The fix is to move currentldt to the per-cpu area. It includes patches I filed in PR i386/6219 which are also user ldt related.
PR: i386/7591, i386/6219 Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
|
#
36168 |
|
18-May-1998 |
tegge |
Disallow reading the current kernel stack. Only the user structure and the current registers should be accessible. Reviewed by: David Greenman <dg@root.com>
|
#
36135 |
|
17-May-1998 |
tegge |
Add forwarding of roundrobin to other cpus. This gives a more regular update of cpu usage as shown by top when one process is cpu bound (no system calls) while the system is otherwise idle (except for top).
Don't attempt to switch to the BSP in boot(). If the system was idle when an interrupt caused a panic, this won't work. Instead, switch to the BSP in cpu_reset.
Remove some spurious forward_statclock/forward_hardclock warnings.
|
#
36095 |
|
16-May-1998 |
kato |
Some of newer PC-98 may cause "Windows Protection Fault" when booting Windows 95 after rebooting FreeBSD without power off. In PC-98 system, reboot mode is set via I/O port 0x37 in cpu_reset(), and accessing of this port is the reason of the problem. To avnoid the fault, current status of reboot mode should be checked before accessing the I/O port.
|
#
34840 |
|
23-Mar-1998 |
jlemon |
Add the ability to make real-mode BIOS calls from the kernel. Currently, everything is contained inside #ifdef VM86, so this option must be present in the config file to use this functionality.
Thanks to Tor Egge, these changes should work on SMP machines. However, it may not be throughly SMP-safe.
Currently, the only BIOS calls made are memory-sizing routines at bootup, these replace reading the RTC values.
|
#
34643 |
|
17-Mar-1998 |
kato |
Make EPSON_BOUNCEDMA a new-style option.
|
#
34569 |
|
14-Mar-1998 |
tegge |
Don't use the standard macros for disabling/enabling interrupt. On SMP systems, this left the mpintr_lock simplelock locked, causing further calls to disable_intr to deadlock or panic.
|
#
34507 |
|
12-Mar-1998 |
bde |
Fixed breakage of the !SMP case in vm_page_zero_idle() in the previous commit. Opportunities to clean pages were often missed, and leaving of the idle state was sometimes delayed until the next interrupt (after any that occurred while cleaning).
Fixed an unstaticization, a syntax error and a style bug in the previous commit.
|
#
33817 |
|
25-Feb-1998 |
dyson |
Fix page prezeroing for SMP, and fix some potential paging-in-progress hangs. The paging-in-progress diagnosis was a result of Tor Egge's excellent detective work. Submitted by: Partially from Tor Egge.
|
#
33307 |
|
13-Feb-1998 |
bde |
Ifdefed some npx code. npx should be optional again.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32884 |
|
30-Jan-1998 |
dyson |
Make the bounce buffer code a little more robust when space isn't available. If there isn't bounce space available, the bounce code is disabled. This will allow most large systems to run properly when the bounce space is mistakenly allocated above 16MB.
|
#
32702 |
|
22-Jan-1998 |
dyson |
VM level code cleanups.
1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM.
This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.)
This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
|
#
32617 |
|
19-Jan-1998 |
tegge |
The removal of a page from the free queue in vm_page_zero_idle was imcomplete. Also set m->queue, in order to prevent vm_page_select_free from selecting the page being zeroed.
|
#
32516 |
|
15-Jan-1998 |
gibbs |
Implementation of Bus DMA for FreeBSD-x86. This is sufficient to do page level bounce buffering, but there are still some issues left to address.
|
#
32010 |
|
27-Dec-1997 |
peter |
#include "opt_user_ldt.h" so that the #ifdef USER_LDT checks can work, as commented about at length in the PR audit trail.
PR: 2412
|
#
31322 |
|
20-Nov-1997 |
bde |
Removed a duplicate (sloppy common-style) definition.
Fixed some style bugs.
|
#
31249 |
|
18-Nov-1997 |
bde |
Don't #include <machine/smp.h> even in the SMP case. Fixed the one place that depended on it. The "bazillion warnings" mentioned in the log for rev.1.45 apparently aren't a problem any more. It is hard to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks) things in the SMP case.
|
#
30265 |
|
10-Oct-1997 |
peter |
Convert the VM86 option from a global option to an option only depended on by the files that use it. Changing the VM86 option now only causes a recompile of a dozen files or so rather than the entire kernel.
|
#
29330 |
|
13-Sep-1997 |
joerg |
Revert the logic behind my last change, and use a function called `is_physical_memory()' now for the decision whether to dump some region of memory or not.
Suggested by: davidg
|
#
29280 |
|
10-Sep-1997 |
joerg |
Do not ever try to coredump adapter memory regions.
PR: 4486 Submitted by: tegge@idi.ntnu.no (Tor Egge)
Implement a function is_adapter_memory() in order to determine what should nto be dumped at all. Currently, only populated with the ``ISA memory hole''. Adapter regions of other busses should be added.
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
28808 |
|
26-Aug-1997 |
peter |
Clean up the SMP AP bootstrap and eliminate the wretched idle procs.
- We now have enough per-cpu idle context, the real idle loop has been revived (cpu's halt now with nothing to do). - Some preliminary support for running some operations outside the global lock (eg: zeroing "free but not yet zeroed pages") is present but appears to cause problems. Off by default. - the smp_active sysctl now behaves differently. It's merely a 'true/false' option. Setting smp_active to zero causes the AP's to halt in the idle loop and stop scheduling processes. - bootstrap is a lot safer. Instead of sharing a statically compiled in stack a number of times (which has caused lots of problems) and then abandoning it, we use the idle context to boot the AP's directly. This should help >2 cpu support since the bootlock stuff was in doubt. - print physical apic id in traps.. helps identify private pages getting out of sync. (You don't want to know how much hair I tore out with this!)
More cleanup to follow, this is more of a checkpoint than a 'finished' thing.
|
#
27993 |
|
08-Aug-1997 |
dyson |
VM86 kernel support. Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>, Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>, and probably alot of others. Submitted by: Jnathan Lemon <jlemon@americantv.com>
|
#
27535 |
|
20-Jul-1997 |
bde |
Removed unused #includes.
|
#
26954 |
|
26-Jun-1997 |
tegge |
Back out a bad commit.
|
#
26943 |
|
25-Jun-1997 |
tegge |
Block some interrupts during the call to pmap_zero_page in vm_page_zero_idle. This fixes some occurences of the problem reported in PR kern/3216: "panic: pmap_zero_page: CMAP busy"
|
#
26812 |
|
22-Jun-1997 |
peter |
Preliminary support for per-cpu data pages.
This eliminates a lot of #ifdef SMP type code. Things like _curproc reside in a data page that is unique on each cpu, eliminating the expensive macros like: #define curproc (SMPcurproc[cpunumber()])
There are some unresolved bootstrap and address space sharing issues at present, but Steve is waiting on this for other work. There is still some strictly temporary code present that isn't exactly pretty.
This is part of a larger change that has run into some bumps, this part is standalone so it should be safe. The temporary code goes away when the full idle cpu support is finished.
Reviewed by: fsmp, dyson
|
#
25557 |
|
07-May-1997 |
peter |
clean up forked child creation. This is simplified also by having md_regs being struct trapframe *. Do a npxsave() if needed and copy the pcb rather than use the increasingly defunct savectx(). Copy %edi and %ebp explicitly.
Submitted by: bde
XXX npxproc could be declared in npx.h so the externs with smp fruit are not needed.
|
#
24980 |
|
16-Apr-1997 |
kato |
Use reset port before clearing page table in cpu_reset if PC98 is defined. Clearing page table could hang some new PC-98.
|
#
24691 |
|
07-Apr-1997 |
peter |
The biggie: Get rid of the UPAGES from the top of the per-process address space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
|
#
24361 |
|
29-Mar-1997 |
bde |
Don't keep cpu interrupts enabled during the lookup in vm_page_zero_idle(). Lookup isn't done every time the system goes idle now, but it can still take > 1800 instructions in the worst case, so if cpu interrupts are kept disabled then it might lose 20 characters of sio input at 115200 bps.
Fixed style in vm_page_zero_idle().
|
#
24099 |
|
22-Mar-1997 |
dyson |
Decrease the latency/overhead in the prezero code when there is an adequate number of prezeroed pages.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
21181 |
|
01-Jan-1997 |
se |
Add code to copy the LDT, if required.
This code was sent to me by Bruce Evans, and seems to fix some possible kernel panic in case of an execution error. It did not cause any problems on my system, but I did never observe the problem this patch is supposed to fix, anyway.
This patch is a NOP, unless the kernel is built with "options USER_LDT", and doesn't affect the GENERIC kernel for this reason.
I want to have it in 2.2: it fixes a bug ...
Submitted by: bde
|
#
19269 |
|
30-Oct-1996 |
asami |
More merge and update.
(1) deleted #if 0
pc98/pc98/mse.c
(2) hold per-unit I/O ports in ed_softc
pc98/pc98/if_ed.c pc98/pc98/if_ed98.h
(3) merge more files by segregating changes into headers.
new file (moved from pc98/pc98):
i386/isa/aic_98.h
deleted:
well, it's already in the commit message so I won't repeat the long list here ;)
Submitted by: The FreeBSD(98) Development Team
|
#
18937 |
|
15-Oct-1996 |
dyson |
Move much of the machine dependent code from vm_glue.c into pmap.c. Along with the improved organization, small proc fork performance is now about 5%-10% faster.
|
#
18548 |
|
28-Sep-1996 |
dyson |
Essentially rename pmap_update to be invltlb. It is a very machine dependent operation, and not really a correct name. invltlb and invlpg are more descriptive, and in the case of invlpg, a real opcode.
Additionally, fix the tlb management code for 386 machines.
|
#
18169 |
|
08-Sep-1996 |
dyson |
Addition of page coloring support. Various levels of coloring are afforded. The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)
|
#
17108 |
|
12-Jul-1996 |
bde |
Don't use NULL in non-pointer contexts.
|
#
16532 |
|
20-Jun-1996 |
dg |
Properly account for non-page aligned buffers.
|
#
16530 |
|
19-Jun-1996 |
dg |
Minor KNF formatting change to vmapbuf() and vunmapbuf().
|
#
16499 |
|
19-Jun-1996 |
dyson |
Clean up vmapbuf and vunmapbuf significantly. The previous code was very rough.
|
#
15809 |
|
18-May-1996 |
dyson |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
15543 |
|
02-May-1996 |
phk |
removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG
Major macro cleanup.
|
#
15538 |
|
02-May-1996 |
phk |
First pass at cleaning up macros relating to pages, clusters and all that.
|
#
15379 |
|
25-Apr-1996 |
phk |
Fix cpu_fork for real.
Suggested by: bde
|
#
15301 |
|
18-Apr-1996 |
phk |
Fix a bogon. cpu_fork & savectx ecpected cpu_switch to restore %eax, they shouldn't.
|
#
15117 |
|
07-Apr-1996 |
bde |
Removed never-used #includes of <machine/cpu.h>. Many were apparently copied from bad examples.
|
#
14348 |
|
02-Mar-1996 |
jkh |
USER_LDT changes for the Willows TwinXPDK toolkit. Only tested with WINE since that's the only other USER_LDT using code that I know of. Submitted by: Gary Jennejohn <Gary.Jennejohn@munich.netsurf.de> Obtained from: {Origin of diffs may be someone else - I only rec'd them from Gary}
|
#
13915 |
|
05-Feb-1996 |
dg |
Unspam my changes in rev 1.54 that John spammed in rev 1.55.
|
#
13909 |
|
04-Feb-1996 |
dyson |
Changed vm_fault_quick in vm_machdep.c to be global. Needed for new pipe code.
|
#
13908 |
|
04-Feb-1996 |
dg |
Rewrote cpu_fork so that it doesn't use pmap_activate, and removed pmap_activate since it's not used anymore. Changed cpu_fork so that it uses one line of inline assembly rather than calling mvesp() to get the current stack pointer. Removed mvesp() since it is no longer being used.
|
#
13740 |
|
30-Jan-1996 |
dg |
savectx() strikes again: the saved stack pointer wasn't properly adjusted to remove the return address. It's only the frame pointer and luck that allowed the code to work at all.
|
#
13580 |
|
23-Jan-1996 |
dg |
Simplified savectx() a little and fixed a bug that caused it to return garbage in the child process rather than "1" like it is supposed to.
Reviewed by: bde
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
13265 |
|
05-Jan-1996 |
wollman |
Convert BOUNCE_BUFFERS and BOUNCEPAGES to new option scheme.
|
#
12819 |
|
14-Dec-1995 |
phk |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
#
12767 |
|
11-Dec-1995 |
dyson |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
12722 |
|
10-Dec-1995 |
phk |
Staticize and cleanup. remove a TON of #includes from machdep.
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
12417 |
|
20-Nov-1995 |
phk |
Remove unused vars.
|
#
12357 |
|
18-Nov-1995 |
bde |
Fixed the type of vm_fault_quick() - don't convert types back and forth through bogus immediate types.
Added prototypes.
|
#
11116 |
|
01-Oct-1995 |
dg |
Insert zeroed pages at the head of the zero queue rather than at the tail. A measurable performance improvement results from the potential for the page to be partially cached when it is eventually used.
|
#
10546 |
|
03-Sep-1995 |
dyson |
Machine dependent routines to support pre-zeroed free pages. This significantly improves demand zero performance.
|
#
9759 |
|
29-Jul-1995 |
bde |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
|
#
9507 |
|
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
8590 |
|
18-May-1995 |
dg |
Added "BROKEN_KEYBOARD_RESET" option to disable using the keyboard reset in cpu_reset(). Some MBs don't deal with this properly.
Submitted by: Rod Grimes
|
#
8211 |
|
01-May-1995 |
dyson |
Fixed a problem that can cause left-over pv_entries and as as side-effect, removed some legacy code that was necessary when we called vm_fault inside of vm_fault_quick instead of using the kernel/user space byte move routines.
|
#
8074 |
|
26-Apr-1995 |
rgrimes |
Add outb to keyboard controller to do a cpu_reset, this fixes 2 known cases of motherboards that failed to reboot.
|
#
7170 |
|
19-Mar-1995 |
dg |
Removed redundant newlines that were in some panic strings.
|
#
7090 |
|
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
6817 |
|
01-Mar-1995 |
dg |
Use su/fubyte instead of directly touching the user's address space.
|
#
6579 |
|
20-Feb-1995 |
dg |
Use of vm_allocate() and vm_deallocate() has been deprecated.
|
#
5771 |
|
21-Jan-1995 |
bde |
Don't use mi_switch() to terminate cpu_exit(). Calling it just happened to work (mi_switch() counted the last timeslice again but this didn't affect the exiting process' rusage because the rusage has already been finalized).
Remove stale comment.
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
3436 |
|
08-Oct-1994 |
phk |
db_disasm.c: Unused var zapped. pmap.c: tons of unused vars zapped, various other warnings silenced. trap.c: unused vars zapped. vm_machdep.c: A wrong argument, which by chance did the right thing, was corrected.
|
#
2455 |
|
02-Sep-1994 |
dg |
Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update(). Made pmap_update an inline assembly function.
|
#
2422 |
|
31-Aug-1994 |
dg |
Rather than exclude bounce buffers support with NOBOUNCE, include it with BOUNCE_BUFFERS. This is more intuitive, and is better for future multiplatform support. Added BOUNCE_BUFFERS option to the GENERIC and LINT kernel config files.
|
#
1896 |
|
07-Aug-1994 |
dg |
Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that are no longer needed because of this.
|
#
1894 |
|
07-Aug-1994 |
dg |
Don't kremove process VM pages (oops!). This was the cause of the instability that was introduced last night.
Submitted by: John Dyson
|
#
1890 |
|
06-Aug-1994 |
dg |
Fixed various prototype problems with the pmap functions and the subsequent problems that fixing them caused.
|
#
1889 |
|
06-Aug-1994 |
dg |
Incorporated 1.1.5 improvements to the bounce buffer code (i.e. make it actually work), and additionally improved it's performance via new pmap routines and "pbuf" allocation policy.
Submitted by: John Dyson
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1415 |
|
25-Apr-1994 |
dg |
From John Dyson:
Fixed physio in the 386 case - write faults weren't properly implemented.
|
#
1379 |
|
20-Apr-1994 |
dg |
Bug fixes and performance improvements from John Dyson and myself:
1) check va before clearing the page clean flag. Not doing so was causing the vnode pager error 5 messages when paging from NFS. (pmap.c) 2) put back interrupt protection in idle_loop. Bruce didn't think it was necessary, John insists that it is (and I agree). (swtch.s) 3) various improvements to the clustering code (vm_machdep.c). It's now enabled/used by default. 4) bad disk blocks are now handled properly when doing clustered IOs. (wd.c, vm_machdep.c) 5) bogus bad block handling fixed in wd.c. 6) algorithm improvements to the pageout/pagescan daemons. It's amazing how well 4MB machines work now.
|
#
1362 |
|
14-Apr-1994 |
dg |
Changes from John Dyson and myself:
1) Removed all instances of disable_intr()/enable_intr() and changed them back to splimp/splx. The previous method was done to improve the performance, but Bruces recent changes to inline spl* have made this unnecessary. 2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too. 3) Added a new mechanism for collecting page statistics - now done by a new system process "pagescan". Previously this was done by the pageout daemon, but this proved to be impractical. 4) Improved the page usage statistics gathering mechanism - performance is much improved in small memory machines. 5) Modified mbuf.h to enable the support for an external free routine when using mbuf clusters. Added appropriate glue in various places to allow this to work. 6) Adapted a suggested change to the NFS code from Yuval Yurom to take advantage of #5. 7) Added fault/swap statistics support.
|
#
1335 |
|
05-Apr-1994 |
dg |
from John Dyson:
1) fixed some bugs related to the bounce buffer code 2) vnode pager now supports clustered pageouts 3) experimental code for clustering all I/O via a new "cldisksort" 4) added >16MB check to Bustek driver 5) made some experimental algorithmic changes to the pageout daemon 6) fixed bugs in truncating mapped files (esp when mapped via NFS) 7) reorganized vnode pager I/O code
|
#
1314 |
|
30-Mar-1994 |
dg |
Eliminated the "physstrat" wart and merged it into kern_physio.c. This patch also fixes a bug which causes a kernel VM leak.
|
#
1312 |
|
30-Mar-1994 |
dg |
New routine "pmap_kenter", designed to take advantage of the special case of the kernel pmap.
|
#
1307 |
|
24-Mar-1994 |
dg |
From John Dyson: performance improvements to the new bounce buffer code.
|
#
1298 |
|
23-Mar-1994 |
dg |
Bounce buffers. From John Dyson with help from me.
|
#
1288 |
|
21-Mar-1994 |
dg |
Changed dynamic stack grow code to grow by "SGROWSIZ" amount. Initially allocate SGROWSIZ amount of stack. Also set vm_ssize to the initial stack VM size. Increased DFLSSIZ stack rlimit default to 8MB.
|
#
1246 |
|
07-Mar-1994 |
dg |
1) "Pre-faulting" in of pages into process address space Eliminates vm_fault overhead on process startup and mmap referenced data for in-memory pages.
(process startup time using in-memory segments *much* faster)
2) Even more efficient pmap code. Code partially cleaned up. More comments yet to follow.
(generally more efficient pte management)
3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein clustering.)
(much faster paging performance on non-write behind disk subsystems, slightly faster performance on other systems.)
4) Slightly changed vm_pageout code for more efficiency and better statistics. Also, resist swapout a little more.
(less likely to pageout a recently used page)
5) Slight improvement to the page table page trap efficiency.
(generally faster system VM fault performance)
6) Defer creation of unnamed anonymous regions pager until needed.
(speeds up shared memory bss creation)
7) Remove possible deadlock from swap_pager initialization.
8) Enhanced procfs to provide "vminfo" about vm objects and user pmaps.
9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net & socket performance and to prepare for things to come.
John Dyson dyson@implode.root.com David Greenman davidg@root.com
|
#
1127 |
|
08-Feb-1994 |
dg |
Fixed bugs in stack grow code, and moved it back into a seperate function like it was originally. Also added back call to "grow" in sendsig now that this routine actually works.
|
#
991 |
|
21-Jan-1994 |
dg |
Remove some old, unused, major UGLY code.
|
#
974 |
|
14-Jan-1994 |
dg |
"New" VM system from John Dyson & myself. For a run-down of the major changes, see the log of any effected file in the sys/vm directory (swap_pager.c for instance).
|
#
879 |
|
18-Dec-1993 |
wollman |
Make everything compile with -Wtraditional. Make it easier to distribute a binary link-kit. Make all non-optional options (pagers, procfs) standard, and update LINT to reflect new symtab requirements.
NB: -Wtraditional will henceforth be forgotten. This editing pass was primarily intended to detect any constructions where the old code might have been relying on traditional C semantics or syntax. These were all fixed, and the result of fixing some of them means that -Wall is now a realistic possibility within a few weeks.
|
#
798 |
|
24-Nov-1993 |
wollman |
Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and add same (sans -Werror) to Makefile for future compilations.
|
#
608 |
|
15-Oct-1993 |
rgrimes |
genassym.c: Remove NKMEMCLUSTERS, it is no longer define or used.
locores.s: Fix comment on PTDpde and APTDpde to be pde instead of pte Add new equation for calculating location of Sysmap Remove Bill's old #ifdef garbage for counting up memory, that stuff will never be made to work and was just cluttering up the file.
Add code that places the PTD, page table pages, and kernel stack below the 640k ISA hole if there is room for it, otherwise put this stuff all at 1MB. This fixes the 28K bogusity in the boot blocks, that can now go away!
Fix the caclulation of where first is to be dependent on NKPDE so that we can skip over the above mentioned areas. The 28K thing is now 44K in size due to the increase in kernel virtual memory space, but since we no longer have to worry about that this is no big deal.
Use if NNPX > 0 instead of ifdef NPX for floating point code.
machdep.c Change the calculation of for the buffer cache to be 20% of all memory above 2MB and add back the upper limit of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL of the kernel memory space on large memory machines, note that this will not even come into effect unless you have more than 32MB. The current buffer cache limit is 6.7MB due to this caclulation.
It seems that we where erroniously allocating bufpages pages for buffer_map. buffer_map is UNUSED in this implementation of the buffer cache, but since the map is referenced in several if statements a quick fix was to simply allocate 1 vm page (but no real memory) to it.
pmap.h Remove rcsid, don't want them in the kernel files!
Removed some cruft inside an #ifdef DEBUGx that caused compiler errors if you where compiling this for debug.
Use the #defines for PD_SHIFT and PG_SHIFT in place of constants.
trap.c: Remove patch kit header and rcsid, fix $Id$. Now include "npx.h" and use NNPX for controlling the floating point code.
Remove a now completly invalid check for a maximum virtual address, the virtual address now ends at 0xFFFFFFFF so there is no more MAX!! (Thanks David, I completly missed that one!)
vm_machdep.c Remove patch kit header and rcsid, fix $Id$. Now include "npx.h" and use NNPX for controlling the floating point code.
Replace several 0xFE00000 constants with KERNBASE
|
#
434 |
|
10-Sep-1993 |
nate |
Removed volatile functions which were causing grief in the system, since volatile functions are undefined, and there is no reason to have them in our kernel.
|
#
433 |
|
10-Sep-1993 |
rgrimes |
This is just to shut the compiler up =================================================================== RCS file: /a/cvs/386BSD/src/sys/i386/i386/vm_machdep.c,v retrieving revision 1.3 diff -c -r1.3 vm_machdep.c *** 1.3 1993/07/27 10:52:21 --- vm_machdep.c 1993/09/10 20:12:53 *************** *** 179,184 **** --- 179,186 ---- #endif splclock(); swtch(); + /*NOTREACHED*/ + for(;;); }
cpu_wait(p) struct proc *p; {
|
#
200 |
|
27-Jul-1993 |
dg |
* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading, profiling, and various protection checks that cause security holes and system crashes. * Changed min/max/bcmp/ffs/strlen to be static inline functions - included from cpufunc.h in via systm.h. This change improves performance in many parts of the kernel - up to 5% in the networking layer alone. Note that this requires systm.h to be included in any file that uses these functions otherwise it won't be able to find them during the load. * Fixed incorrect call to splx() in if_is.c * Fixed bogus variable assignment to splx() in if_ed.c
|
#
140 |
|
18-Jul-1993 |
paul |
Added volatile void to cpu_exit() in the hope that it would stop warning about returning from gcc.
It hasn't but the declaration is still correct.
|
#
5 |
|
12-Jun-1993 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r4, which included commits to RCS files with non-trunk default branches.
|
#
4 |
|
12-Jun-1993 |
rgrimes |
Initial import, 0.1 + pk 0.2.4-B1
|