#
272461 |
|
02-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
270296 |
|
21-Aug-2014 |
emaste |
MFC r263815, r263872:
Move ia64 efi.h to sys in preparation for amd64 UEFI support
Prototypes specific to ia64 have been left in this file for now, under __ia64__, rather than moving them to a new header under sys/ia64. I anticipate that (some of) the corresponding functions will be shared by the amd64, arm64, i386, and ia64 architectures, and we can adjust this as EFI support on architectures other than ia64 continues to develop.
Fix missed efi.h header change in r263815
Sponsored by: The FreeBSD Foundation
|
#
268200 |
|
02-Jul-2014 |
marcel |
MFC r263323: Fix and improve exception tracing.
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
240244 |
|
08-Sep-2012 |
attilio |
userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it.
Reviewed by: kib MFC after: 1 week
|
#
225474 |
|
11-Sep-2011 |
kib |
Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64.
Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks
|
#
219741 |
|
18-Mar-2011 |
marcel |
Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.
|
#
211515 |
|
19-Aug-2010 |
jhb |
Remove unused KTRACE includes.
|
#
208514 |
|
24-May-2010 |
kib |
Change ia64's struct syscall_args definition so that args is a pointer to the arguments array instead of the array itself. The ia64 syscall arguments are readily available in the frame, so point args at it and avoid an unnecessary bcopy. Still reserve the array in syscall_args for ia32 emulation.
Suggested and reviewed by: marcel MFC after: 1 month
|
#
208453 |
|
23-May-2010 |
kib |
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) that allows ABIs to override the way a return value is set. sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers.
Syscallenter() fetches the arguments, calls the syscall implementation from the ABI's sysent table, and sets up the return frame. The end-of-syscall bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
|
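The dispatch structure described in r208453 can be sketched as a minimal userland model. All names here mirror the commit's terminology but the bodies are hypothetical stand-ins, not the kernel code: the ABI's sysvec supplies the argument-fetch and return-value hooks, and a single MI syscallenter() drives the common path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model of the r208453 design: the sysvec carries
 * per-ABI hooks, syscallenter() contains the shared dispatch logic. */
struct syscall_args {
    unsigned code;      /* syscall number */
    long     args[8];   /* fetched arguments */
    long     retval;    /* result for usermode */
    int      error;
};

struct sysentvec {
    /* fetch arguments from "usermode" into sa; errno-style return */
    int  (*sv_fetch_syscall_args)(struct syscall_args *sa, const long *frame);
    /* encode the return value the way this ABI expects */
    void (*sv_set_syscall_retval)(struct syscall_args *sa);
};

static int model_fetch(struct syscall_args *sa, const long *frame) {
    sa->code = (unsigned)frame[0];
    for (int i = 0; i < 8; i++)
        sa->args[i] = frame[i + 1];
    return 0;
}

static void model_setret(struct syscall_args *sa) {
    if (sa->error != 0)
        sa->retval = -sa->error;    /* negative-errno encoding, for the model */
}

/* A toy "syscall" standing in for a sysent table entry. */
static long model_add(struct syscall_args *sa) {
    sa->retval = sa->args[0] + sa->args[1];
    return 0;
}

/* The one MI entry point: fetch, dispatch, set the return value. */
long syscallenter(struct sysentvec *sv, const long *frame) {
    struct syscall_args sa = {0};
    sa.error = sv->sv_fetch_syscall_args(&sa, frame);
    if (sa.error == 0 && sa.code == 1)  /* sysent lookup, reduced to one entry */
        sa.error = (int)model_add(&sa);
    sv->sv_set_syscall_retval(&sa);
    return sa.retval;
}
```

The point of the split is visible even in this toy: the per-architecture trap handler only has to fill in the frame; everything after that is shared.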
#
205713 |
|
26-Mar-2010 |
marcel |
Rename disable_intr() to ia64_disable_intr() and rename enable_intr() to ia64_enable_intr(). This reduces confusion with intr_disable() and intr_restore().
Have configure_final() call ia64_finalize_intr() instead of enable_intr() in preparation of adding support for binding interrupts to all CPUs.
|
#
203572 |
|
06-Feb-2010 |
marcel |
Fix single-stepping when the kernel was entered through the EPC syscall path. When the taken branch leaves the kernel and enters the process, we still need to execute the instruction at that address. Don't raise SIGTRAP when we branch into the process, but enable single-stepping instead.
|
#
199868 |
|
27-Nov-2009 |
alc |
Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault().
Discussed with: kib
|
#
199566 |
|
20-Nov-2009 |
marcel |
Add a seatbelt to the Nested TLB Fault handler to give us a chance to panic when we have an unexpected TLB fault while interrupt collection is disabled. Use a token rather than the actual address of the restart point to avoid the need for the movl instruction. The token is arbitrary. For the drummers: it's based on a single paradiddle.
|
#
199135 |
|
10-Nov-2009 |
kib |
Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval().
Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, sun4v also reviewed MIPS tested by: gonzo MFC after: 1 month
|
#
198733 |
|
31-Oct-2009 |
marcel |
Reimplement the lazy FP context switching: o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o Use msleep() to wait for the other CPU to save the high FP registers.
This change fixes the high FP inconsistency panics.
A single global lock typically serializes too much, which may be noticeable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use them.
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries, while static binaries will be broken.
|
#
173600 |
|
14-Nov-2007 |
julian |
Generally we are interested in what thread did something, as opposed to what process. Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.
|
#
170291 |
|
04-Jun-2007 |
attilio |
Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC to PCPU_INC - Add the PCPU_ADD interface, which simply adds a given value to the pcpu member.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days.
Reviewed by: alc, bde Approved by: jeff (mentor)
|
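The interface from r170291 amounts to open-coded read-modify-write on a per-CPU field. A single-file sketch with hypothetical field names shows both macros and why, as the commit warns, they are not preemption or interrupt safe on most architectures (plain ++ and +=, no atomics):

```c
#include <assert.h>

/* Toy model of PCPU_INC/PCPU_ADD: a per-CPU structure array indexed by a
 * stand-in for curcpu. Field names are invented for the example. */
#define MAXCPU 4

struct pcpu {
    unsigned long pc_cnt_syscalls;
    unsigned long pc_cnt_traps;
};

static struct pcpu pcpu_data[MAXCPU];
static int curcpu = 0;              /* stand-in for the real curcpu */

#define PCPU_PTR(field)    (&pcpu_data[curcpu].pc_##field)
#define PCPU_INC(field)    ((*PCPU_PTR(field))++)        /* bump by one */
#define PCPU_ADD(field, v) ((*PCPU_PTR(field)) += (v))   /* add a value */

/* Totals are obtained by summing over all CPUs, as with real pcpu stats. */
unsigned long pcpu_sum_syscalls(void) {
    unsigned long sum = 0;
    for (int i = 0; i < MAXCPU; i++)
        sum += pcpu_data[i].pc_cnt_syscalls;
    return sum;
}
```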
#
169814 |
|
21-May-2007 |
marcel |
When speculation fails (as determined by the chk instruction) the processor is to jump to recovery code. This branching behaviour may not be implemented by the processor, in which case a Speculative Operation fault is raised. The OS is responsible for emulating the branch. Implement this, because GCC 4.2 uses advanced loads regularly.
|
#
167352 |
|
09-Mar-2007 |
mohans |
Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
|
#
163709 |
|
26-Oct-2006 |
jb |
Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).
Reviewed by: davidxu@
|
#
162361 |
|
16-Sep-2006 |
rwatson |
Add audit hooks for ppc, ia64 system call paths.
Reviewed by: marcel (ia64) Obtained from: TrustedBSD Project MFC after: 3 days
|
#
160801 |
|
28-Jul-2006 |
jhb |
Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is now back to just being an argument count.
|
#
160798 |
|
28-Jul-2006 |
jhb |
Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
|
#
160773 |
|
27-Jul-2006 |
jhb |
Unify the checking for lock misbehavior in the various syscall() implementations and adjust some of the checks while I'm here: - Add a new check to make sure we don't return from a syscall in a critical section. - Add a new explicit check before userret() to make sure we don't return with any locks held. The advantage here is that we can include the syscall number and name in syscall() whereas that info is not available in userret(). - Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by the more general checks just added.
MFC after: 2 weeks
|
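The checks unified in r160773 boil down to: before returning to user mode, assert that the thread holds no locks and is not inside a critical section, while the syscall name is still at hand for the diagnostic. A userland sketch with hypothetical field and function names:

```c
#include <assert.h>
#include <stdio.h>

/* Model thread state: counts maintained elsewhere in a real kernel. */
struct model_thread {
    int td_locks;     /* number of non-spin locks currently held */
    int td_critnest;  /* critical section nesting level */
};

/* Returns 1 if returning to "user mode" is clean; otherwise writes a
 * diagnostic naming the syscall, which userret() alone could not do. */
int syscall_return_ok(const struct model_thread *td, const char *syscallname,
                      char *msgbuf, size_t buflen) {
    if (td->td_critnest != 0) {
        snprintf(msgbuf, buflen, "%s returning in a critical section",
                 syscallname);
        return 0;
    }
    if (td->td_locks != 0) {
        snprintf(msgbuf, buflen, "%s returning with %d locks held",
                 syscallname, td->td_locks);
        return 0;
    }
    return 1;
}
```

Doing the check in syscall() rather than userret() is exactly what lets the message include the syscall number and name.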
#
160770 |
|
27-Jul-2006 |
jhb |
Add KTR_SYSC tracing to the syscall() implementations that didn't have it yet.
MFC after: 1 week
|
#
160040 |
|
29-Jun-2006 |
marcel |
Partial support for branch long emulation. This only emulates the branch long jump and not the branch long call. Support for that is forthcoming.
|
#
158651 |
|
16-May-2006 |
phk |
Since DELAY() was moved, most <machine/clock.h> #includes have been unnecessary.
|
#
155455 |
|
08-Feb-2006 |
phk |
Simplify system time accounting for profiling.
Rename struct thread's td_sticks to td_pticks; we will need the old name shortly for a more appropriately named use. Reduce it from uint64_t to u_int.
Clear td_pticks whenever we enter the kernel instead of recording its value as a reference for userret(). Use the absolute value of td->td_pticks in userret() and eliminate the third argument.
|
#
151316 |
|
14-Oct-2005 |
davidxu |
1. Change the prototypes of trapsignal and sendsig to use ksiginfo_t *; most changes in MD code are trivial. Before this change, trapsignal and sendsig used discrete parameters; now they use member fields of the ksiginfo_t structure. For sendsig, this change allows us to pass the POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed.
6. In this sigqueue implementation, the pending signal set is kept as before; an extra siginfo list holds additional siginfo_t data for signals. Kernel code still uses psignal(), which behaves as before and won't fail even under memory pressure. The only exception is signal deletion: we should call sigqueue_delete to remove a signal from the sigqueue rather than SIGDELSET. Currently no kernel code delivers a signal with additional data, so the kernel should be as stable as before; a ksiginfo can carry more information, for example allowing a signal to be delivered but its siginfo data thrown away if memory is short. SIGKILL and SIGSTOP have a fast path in sigqueue_add because they cannot be caught or masked. The sigqueue() syscall allows user code to queue a signal to a target process; if resources are unavailable, EAGAIN is returned as the specification says. Just before a thread exits, its signal queue memory is freed by sigqueue_flush. Currently all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
|
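The sigqueue layout described in point 6 can be sketched as a toy model: the pending set stays a bitmask, a separate list carries siginfo entries, and SIGKILL/SIGSTOP skip the allocation entirely. All names here are invented for the example; this is not the kernel's actual code.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_SIGKILL 9
#define MODEL_SIGSTOP 17

struct model_ksiginfo {
    int signo;
    int value;                     /* realtime signal value */
    struct model_ksiginfo *next;
};

struct model_sigqueue {
    unsigned long long pending;    /* one bit per signal, as before */
    struct model_ksiginfo *head;   /* extra siginfo list */
};

/* Queue a signal. The pending bit is always set; the siginfo entry is
 * best-effort, mirroring "deliver but throw away siginfo under pressure". */
int sigqueue_add(struct model_sigqueue *sq, int signo, int value) {
    sq->pending |= 1ULL << signo;
    if (signo == MODEL_SIGKILL || signo == MODEL_SIGSTOP)
        return 0;                  /* fast path: cannot be caught or masked */
    struct model_ksiginfo *ksi = malloc(sizeof(*ksi));
    if (ksi == NULL)
        return -1;                 /* signal still pending, siginfo dropped */
    ksi->signo = signo;
    ksi->value = value;
    ksi->next = sq->head;
    sq->head = ksi;
    return 0;
}

/* Free the queued siginfo memory, as sigqueue_flush does at thread exit. */
void sigqueue_flush(struct model_sigqueue *sq) {
    while (sq->head != NULL) {
        struct model_ksiginfo *ksi = sq->head;
        sq->head = ksi->next;
        free(ksi);
    }
    sq->pending = 0;
}
```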
#
149915 |
|
09-Sep-2005 |
marcel |
Change the High FP lock from a sleep lock to a spin lock. We can take the lock from interrupt context, which causes an implicit lock order reversal. We've been using the lock carefully enough that making it a spin lock should not be harmful.
|
#
148807 |
|
06-Aug-2005 |
marcel |
Improve SMP support: o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU uses to look up translations it can't find in the TLB. As such, the VHPT serves as a level 1 cache (the TLB being a level 0 cache) and best results are obtained when it's not shared between CPUs. The collision chain (i.e. the hash bucket) is shared between CPUs, as all buckets together constitute our collection of PTEs. To achieve this, the collision chain does not point to the first PTE in the list anymore, but to a hash bucket head structure. The head structure contains the pointer to the first PTE in the list, as well as a mutex to lock the bucket. Thus, each bucket is locked independently of the others. With at least 1024 buckets in the VHPT, this provides sufficiently fine-grained locking to make the solution scalable to large SMP machines. o Add synchronisation to the lazy FP context switching. We do this with a separate per-thread lock. On SMP machines the lazy high FP context switching without synchronisation caused inconsistent state, which resulted in a panic. Since the use of the high FP registers is not common, it's possible that races remain. The ia64 package build has proven to be a good stress test, so this will get plenty of exercise in the near future. o Don't use the local ID of the processor we want to send the IPI to as the argument to ipi_send(); use the struct pcpu pointer instead. The reason for this is that IPI delivery is unreliable. It has been observed that sending an IPI to a CPU causes it to receive a stray external interrupt. As such, we need a way to make the delivery reliable. The intended solution is to queue requests in the target CPU's per-CPU structure and use a single IPI to inform the CPU that there's a new entry in the queue. If that IPI gets lost, the CPU can check its queue at any convenient time (such as on each clock interrupt). This also allows us to send requests to a CPU without interrupting it, if that would be beneficial.
With these changes SMP is almost working. There are still some random process crashes and the machine can hang due to losing the IPI that deals with the high FP context switch.
The overhead of introducing the hash bucket head structure results in a performance degradation of about 1% for UP (extra pointer indirection). This is surprisingly small and is offset by gaining reasonably good, scalable SMP support.
|
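The bucket-head indirection from r148807 is easy to picture in miniature: the bucket points at a head structure holding the chain pointer plus a per-bucket lock, so buckets lock independently. This is a userland sketch with invented names and pthread mutexes standing in for kernel mutexes, not the actual pmap code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Model PTE: just a tag and a collision-chain link. */
struct model_pte {
    unsigned long     pte_va;    /* virtual address tag */
    struct model_pte *pte_next;  /* collision chain */
};

/* The head structure: chain pointer plus a per-bucket mutex, so each
 * bucket is locked independently of the others. */
struct vhpt_bucket {
    struct model_pte *chain;
    pthread_mutex_t   mutex;
};

#define NBUCKETS 1024
static struct vhpt_bucket vhpt[NBUCKETS];

void vhpt_init(void) {
    for (int i = 0; i < NBUCKETS; i++)
        pthread_mutex_init(&vhpt[i].mutex, NULL);
}

static struct vhpt_bucket *bucket_of(unsigned long va) {
    return &vhpt[(va >> 13) % NBUCKETS];   /* 8K pages in this model */
}

void vhpt_insert(struct model_pte *pte) {
    struct vhpt_bucket *b = bucket_of(pte->pte_va);
    pthread_mutex_lock(&b->mutex);
    pte->pte_next = b->chain;
    b->chain = pte;
    pthread_mutex_unlock(&b->mutex);
}

struct model_pte *vhpt_lookup(unsigned long va) {
    struct vhpt_bucket *b = bucket_of(va);
    struct model_pte *p;
    pthread_mutex_lock(&b->mutex);
    for (p = b->chain; p != NULL; p = p->pte_next)
        if (p->pte_va == va)
            break;
    pthread_mutex_unlock(&b->mutex);
    return p;
}
```

The extra pointer hop from bucket to head is the ~1% UP cost the commit measures; the per-bucket mutex is what buys the SMP scalability.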
#
147640 |
|
27-Jun-2005 |
marcel |
Handle B-unit break instructions. The break.b is unique in that the immediate is not saved by the architecture. Any of the break.{mifx} instructions have their immediate saved in cr.iim on interruption. Consequently, when we handle the break interrupt, we end up with a break value of 0 when it was a break.b. The immediate is important because it distinguishes between different uses of the break, which are defined by the runtime specification. The bottom line is that when the GNU debugger replaces a B-unit instruction with a break instruction in the inferior, we would not send the process a SIGTRAP when we encounter it, because the value is not one we recognize as a debugger breakpoint.
This change adds logic to decode the bundle in which the break instruction lives whenever the break value is 0. The assumption being that it's a break.b and we fetch the immediate directly out of the instruction. If the break instruction was not a break.b, but any of break.{mifx} with an immediate of 0, we would be doing unnecessary work. But since a break 0 is invalid, this is not a problem and it will still result in a SIGILL being sent to the process.
Approved by: re (scottl)
|
#
147639 |
|
27-Jun-2005 |
marcel |
Replace the existing copyright notice with my own. Over the years I've changed this file so much that it's equivalent to a rewrite, and I'm not talking about any of the cosmetic changes of course.
Approved by: re (scottl)
|
#
147638 |
|
27-Jun-2005 |
marcel |
Cosmetic: s/u_int64_t/uint64_t/g
Approved by: re (scottl)
|
#
144971 |
|
12-Apr-2005 |
jhb |
Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic operations in some places and simple non-per CPU math in others.
|
#
139790 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
138129 |
|
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
135783 |
|
25-Sep-2004 |
marcel |
Move the IA-32 trap handling from trap() to ia32_trap(). Move the ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32 is not defined, we'll raise SIGEMT instead.
|
#
135405 |
|
17-Sep-2004 |
marcel |
Provide our own FPSWA definitions, instead of depending on the Intel EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers conflict with the Intel ACPI headers (duplicate type definitions), so are being phased out in the kernel.
|
#
134571 |
|
31-Aug-2004 |
julian |
Remove an unneeded argument. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems, so leave it as an argument. Having both proc and thread as arguments just gives an opportunity for them to get out of sync.
MFC after: 3 days
|
#
134568 |
|
31-Aug-2004 |
julian |
Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code.
MFC after: 2 days
|
#
133291 |
|
07-Aug-2004 |
marcel |
Implement single stepping when we leave the kernel through the EPC syscall path. The basic problem is that we cannot set the single stepping flag directly, because we don't leave the kernel via an interrupt return. So, we need another way to set the single stepping flag. The way we do this is by enabling the lower-privilege transfer trap, which gets raised when we drop the privilege level. However, since we're still running in kernel space (sec), we're not yet done. We clear the lower-privilege transfer trap, enable the taken-branch trap and continue exiting the kernel until we branch into user space. Given the current code, there's a total of two traps this way before we can raise SIGTRAP.
|
#
131945 |
|
10-Jul-2004 |
marcel |
Update for the KDB framework: o ksym_start and ksym_end changed type to vm_offset_t. o Make debugging support conditional upon KDB instead of DDB. o Call kdb_enter() instead of breakpoint(). o Remove implementation of Debugger(). o Call kdb_trap() according to the new world order.
unwinder: o s/db_active/kdb_active/g o Various s/ddb/kdb/g o Add support for unwinding from the PCB as well as the trapframe. Abuse a spare field in the special register set to flag whether the PCB was actually constructed from a trapframe so that we can make the necessary adjustments.
md_var.h: o Add RSE convenience macros. o Add ia64_bsp_adjust() to add or subtract from BSP while taking NaT collections into account.
|
#
131838 |
|
08-Jul-2004 |
marcel |
MFamd64 (1.275): Reduce the scope of the Giant lock being held for non-mpsafe syscalls. There was way too much code being covered.
|
#
131821 |
|
08-Jul-2004 |
marcel |
Better handle the break instruction trap. The runtime specification has outlined which break numbers are software interrupts, debugger breakpoints and ABI specific breaks. We mostly treated all break numbers we didn't care about as debugger breakpoints.
|
#
129023 |
|
07-May-2004 |
marcel |
Revert previous commit. We should not get any FP traps from within the kernel. We can guarantee this by resetting the FP status register. This masks all FP traps. The reason we did get FP traps was that we didn't reset the FP status register in all cases.
Make sure to reset the FP status register in syscall(). This is one of the places where it was forgotten.
While on the subject, reset the FP status register only when we trapped from user space.
|
#
128857 |
|
03-May-2004 |
marcel |
Floating-point faults and exceptions can happen in the kernel too. Do not panic when it happens; handle them.
Run into by: das
|
#
124739 |
|
20-Jan-2004 |
marcel |
Fix handling of FP traps: o For traps, the cr.iip register points to the next instruction to execute on interrupt return (modulo slot). Since we need to get the bundle of the instruction that caused the FP fault/trap, make sure we fetch the previous bundle if the next instruction is in fact the first in a bundle. o When we call the FPSWA handler, we need to tell it whether it's a trap or a fault (first argument). This was hardcoded to mean a fault.
Also, for FP faults, when a fault is converted to a trap, adjust the cr.iip and cr.ipsr registers to point to the next instruction. This makes sure that the SIGFPE handler gets a consistent state.
|
#
124737 |
|
20-Jan-2004 |
marcel |
s/framep/tf/g -- this normalizes on the use of tf to point to the trapframe and improves grep-ability.
|
#
123346 |
|
09-Dec-2003 |
marcel |
Don't panic for misalignment traps when the onfault handler is set. Not all transfers between kernel and user space are byte oriented and thus alignment safe. Especially fuword*() and suword*() are sensitive to alignment but in general more optimal than block copies. By catching the misalignment trap we avoid pessimizing the common case of properly aligned memory accesses which we would do if we were to use byte copies or adding tests for proper alignment.
Note that the expectation that the kernel produces aligned pointers is unchanged. This change therefore relates to possible unaligned pointers generated in userland.
|
#
122518 |
|
11-Nov-2003 |
marcel |
Further work out the handling of the high FP registers. The most important change is in cpu_switch(), where we disable the high FP registers for the thread we switch out if the CPU currently has its high FP registers. This prevents the high FP registers from remaining enabled for the thread even when the CPU has unloaded them or the thread has migrated to another processor. Likewise, when we switch in a thread that has its high FP registers on the CPU, we enable them. This avoids an otherwise harmless, but unnecessary, trap to have them enabled.
The code that handles the disabled high FP trap (in trap()) has been turned into a critical section for the most part to avoid being preempted. If there's a race, we bail out and have the processor trap again if necessary.
Avoid using the generic ia64_highfp_save() function when the context is predictable. The function adds unnecessary overhead. Don't use ia64_highfp_load() for the same reason. The function is now unused and can be removed.
These changes make the lazy context switching of the high FP registers in an UP kernel functional.
|
#
121635 |
|
28-Oct-2003 |
marcel |
When switching the RSE to use the kernel stack as backing store, keep the RNAT bit index constant. The net effect of this is that there's no discontinuity WRT NaT collections which greatly simplifies certain operations. The cost of this is that there can be up to 504 bytes of unused stack between the true base of the kernel stack and the start of the RSE backing store. The cost of adjusting the backing store pointer to keep the RNAT bit index constant, for each kernel entry, is negligible.
The primary reasons for this change are: 1. Asynchronous contexts in KSE processes have the disadvantage of having to copy the dirty registers from the kernel stack onto the user stack. The implementation we had so far copied the registers one at a time without calculating NaT collection values. A process that used speculation would not work. Now that the RNAT bit index is constant, we can block-copy the registers from the kernel stack to the user stack without having to worry about NaT collections. They will be in the right place on the user stack. 2. The ndirty field in the trapframe is now also usable in userland. This was previously not the case because ndirty also includes the space occupied by NaT collections. The value could be off by 8, depending on the discontinuity. Now that the RNAT bit index is constant, we have exactly the same number of NaT collection points on the kernel stack as we would have had on the user stack if we didn't switch backing stores. 3. Debuggers and other applications that use ptrace(2) can now copy the dirty registers from the kernel stack (using ptrace(2)) and copy them wherever they want them (onto the user stack of the inferior, as might be the case for gdb) without having to worry about NaT collections, in the same way the kernel doesn't have to worry about them.
There's a second order effect caused by the randomization of the base of the backing store, for it depends on the number of dirty registers the processor happened to have at the time of entry into the kernel. The second order effect is that the RSE will have a better cache utilization as compared to having the backing store always aligned at page boundaries. This has not been measured and may be in practice only minimally beneficial, if at all measurable.
|
#
121412 |
|
23-Oct-2003 |
marcel |
Remove prototype of unaligned_fixup() and fix a nearby style(9) bug.
|
#
120937 |
|
09-Oct-2003 |
robert |
Implement preliminary support for the PT_SYSCALL command to ptrace(2).
|
#
120250 |
|
19-Sep-2003 |
marcel |
Revamp trap(): make it more explicit which kinds of traps/faults we can get (or not) and what we do with them. This fixes the behaviour for NaT consumption and speculation faults in that we now don't panic for user faults.
Remove the dopanic label and move the code to a function. This makes it easier in the simulator to set a breakpoint.
While here, remove the special handling of the old break-based syscall path and move it to where we handle the break vector. Also, reserve a new break immediate for KSE. We currently use the old break-based syscall to deal with restoring async contexts. However, it has the side-effect of also setting the signal mask and calling ast() on the way out. The new break immediate simply restores the context and returns without calling ast().
|
#
119159 |
|
20-Aug-2003 |
marcel |
Undo the mistake made in revision 1.77 of trap.c and which was the ultimate trigger for the follow-up fixes in revisions 1.78, 1.80, 1.81 and 1.82 of trap.c. I was simply too pre-occupied with the gateway page and how it blurs kernel space with user space and vice versa that I couldn't see that it was all a load of bollocks.
It's not the IP address that matters, it's the privilege level that counts. We never run in user space with lifted permissions and we sure can not run in kernel space without it. Sure, the gateway page is the exception, but not if you look at the privilege level. It's user space if you run with user permissions and kernel space otherwise.
So, we're back to looking at the privilege level like it should be. There's no other way.
Pointy hat: marcel
|
#
118853 |
|
13-Aug-2003 |
marcel |
Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the gateway page, which means that improper memory accesses to the gateway page while in user mode would panic the kernel. Use VM_MAX_ADDRESS instead. It ends before the gateway page. The difference between VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.
|
#
118811 |
|
12-Aug-2003 |
marcel |
Cleanup prototypes in cpu.h, including fswintrberr and any references to it. Sort the remaining prototypes in cpu.h.
No functional change.
|
#
117999 |
|
25-Jul-2003 |
marcel |
Remove __aligned(16) from the definition of struct _ia64_fpreg. It's a non-standard construct. Instead, redefine struct _ia64_fpreg as a union and put a long double in it. On ia64 and for LP64, this is defined by the ABI to have 16-byte alignment. For ILP32 a long double has 4-byte alignment, but we don't support ILP32.
Note that the in-memory image of a long double does not match the in-memory image of spilled FP registers. This means that one cannot use the fpr_flt field to interpret the bits. For this reason we continue to use an aggregate type.
|
#
117984 |
|
24-Jul-2003 |
marcel |
Disable the single-step trap on a debug related trap, including of course the single-step trap itself.
|
#
117499 |
|
13-Jul-2003 |
marcel |
Enable the high FP registers when we call the FPSWA handler and disable them again afterwards. This fixes a disabled FP fault while in the FPSWA handler. While here, merge the FP fault and FP trap handling code to reduce code duplication. Where the code differed, it was not clear that it should.
Trigger case: ports/math/atlas
|
#
116361 |
|
14-Jun-2003 |
davidxu |
Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
|
#
115936 |
|
07-Jun-2003 |
marcel |
If we get a fault in the gateway page, which would happen if we try to deliver a signal and the RSE backing store has been exhausted or the backing store pointer has been clobbered, we need to make sure we call userret() and do_ast() when we exit from trap(). Not adjusting the local variable 'user' in this case will prevent the faulty process from being terminated and we end up in an infinite fault repetition.
Faulty process provided by: bento
|
#
115916 |
|
06-Jun-2003 |
marcel |
Use TRAPF_USERMODE() to replace an equivalent check in trap(). While here, amend the related comment.
|
#
115573 |
|
31-May-2003 |
marcel |
Now that we have the signal trampolines in the gateway page and the gateway page is considered kernel space, we can panic when we should only SIGSEGV. Hence, add the additional constraint that for page faults we also require running with kernel privileges. The gateway page is the only kernel code running with user privileges, so this is a correct way to exclude the gateway page from kernel land.
We do not currently exclude the gateway page for other faults, as that is not always the right thing to do. Further tuning will happen on a case-by-case basis.
|
#
115376 |
|
29-May-2003 |
marcel |
Fix what I think is a cut-n-paste bug: use OID_AUTO for the print_usertrap sysctl instead of CPU_UNALIGNED_PRINT. The latter is used already.
Approved by: re@ (blanket)
|
#
115298 |
|
24-May-2003 |
marcel |
Now that we define user mode as any IP address that isn't in the kernel's VA regions, we cannot limit the use of break-based syscalls to user mode only. The signal trampolines are in the gateway page, which is mapped into the process address space in region 5 and thus is kernel space.
We don't special case the gateway page here. Allow break-based syscalls from anywhere in the kernel VA space.
Approved by: re@ (blanket)
|
#
115294 |
|
24-May-2003 |
marcel |
Consistently use the same metric to differentiate between kernel mode and user mode. We need to take into account that the EPC syscall path introduces a grey area in which one can argue either way, including a third option: neither.
We now use the region in which the IP address lies. Regions 5, 6 and 7 are kernel VA regions, and if the IP lies in any of those regions we assume we're in kernel mode. Hence, we can be in kernel mode even if we're not on the kernel stack and/or have user privileges. There're gremlins living in the twilight zone :-)
For the EPC syscall path this particularly means that the process leaves user mode the moment it calls into the gateway page. This makes the most sense because from a process' point of view the call represents a request to the kernel for some service and that service has been performed if the call returns. With the metric we picked, this also means that we're back in user mode IFF the call returns.
Approved by: re@ (blanket)
|
#
115084 |
|
16-May-2003 |
marcel |
Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc instruction (see sys/ia64/ia64/syscall.s). o Revisit the places where we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h).
Secondary objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx)
Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. Context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as yet still unimplemented.
Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use.
Besides more efficient context saving and restoring, which of course yields a noticeable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood.
This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be noticed as an improvement if it's noticed at all.
Approved by: re@ (jhb)
|
#
114305 |
|
30-Apr-2003 |
jhb |
Range check the syscall number before looking it up in the syscallnames[] array.
Submitted by: pho
|
#
113833 |
|
22-Apr-2003 |
davidxu |
Remove single threading detection code; this code really should be replaced by thread_user_enter(), but currently we don't want to enable this in trap.
|
#
113686 |
|
18-Apr-2003 |
jhb |
Use the proc lock to protect p_singlethread and a P_WEXIT test. This fixes a couple of potential KSE panics on non-i386 arch's that weren't holding the proc lock when calling thread_exit().
|
#
112883 |
|
31-Mar-2003 |
jeff |
- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread.
Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.
|
#
111883 |
|
04-Mar-2003 |
jhb |
Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
|
#
111587 |
|
27-Feb-2003 |
davidxu |
Needn't kse.h
|
#
111585 |
|
27-Feb-2003 |
julian |
Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
|
#
111024 |
|
17-Feb-2003 |
jeff |
- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into the proc. These counters are only examined through calcru.
Submitted by: davidxu Tested on: x86, alpha, UP/SMP
|
#
110190 |
|
01-Feb-2003 |
julian |
Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but them's the rules..
I am using my "David's mentor" hat to revert this as he's offline for a while.
|
#
109877 |
|
26-Jan-2003 |
davidxu |
Move UPCALL related data structures out of kse, introduce a new data structure called kse_upcall to manage UPCALLs. All KSE binding and loaning code is gone.
A thread that owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and take those contexts back to userland. Any thread without an upcall structure has to export its context and exit at the user boundary.
Any thread running in user mode owns an upcall structure. When it enters the kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. If the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit; when all upcalls in a ksegrp are removed, the group is automatically shut down. An upcall owner thread also exits when the process is in the exiting state. When an owner thread exits, the upcall it owns is also removed.
KSE is a pure scheduler entity. It represents a virtual cpu. When a thread is running, it always has a KSE associated with it. The scheduler is free to assign a KSE to a thread according to thread priority; if thread priority is changed, a KSE can be moved from one thread to another.
When a ksegrp is created, there are always N KSEs created in the group, where N is the number of physical cpus in the current system. This makes it possible that even if a userland UTS is only single-CPU safe, threads in kernel can still execute on different cpus in parallel. Userland calls kse_create to add more upcall structures to a ksegrp to increase concurrency in userland itself; the kernel is not restricted by the number of upcalls userland provides.
The code hasn't been tested under SMP by author due to lack of hardware.
Reviewed by: julian
|
#
105900 |
|
24-Oct-2002 |
julian |
Extract out KSE specific code from machine specific code so that there is only one copy of it. Fix that one copy so that KSEs with no mailbox in a KSE program are not a cause of page faults (this can legitimately happen).
Submitted by: (parts) davidxu
|
#
104425 |
|
03-Oct-2002 |
peter |
Update for post-kseIII
|
#
102560 |
|
29-Aug-2002 |
jake |
Fixed printf format errors.
|
#
99900 |
|
13-Jul-2002 |
mini |
Add additional cred_free_thread() calls that I had missed the first time.
Pointed out by: jhb
|
#
99095 |
|
29-Jun-2002 |
julian |
Fix reverse ordering of locks. Add a comment about locks on some platforms.
Submitted by: jhb@freebsd.org
|
#
99081 |
|
29-Jun-2002 |
julian |
Add KSE stubs to MD parts of ia64 code. Dfr will fill these out when we decide to enable KSEs on ia64 (probably not immediately)
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98727 |
|
24-Jun-2002 |
mini |
Remove unused diagnostic function cred_free_thread().
Approved by: alfred
|
#
98474 |
|
20-Jun-2002 |
peter |
ia32 %edx return comes from td_retval[1], not td_retval[0]
Obtained from: dfr
|
#
98473 |
|
20-Jun-2002 |
peter |
Use suword32/64 and fuword32/64 like elsewhere instead of inventing suhword/fuhword.
|
#
98001 |
|
07-Jun-2002 |
jhb |
- Fixup / remove obsolete comments. - ktrace no longer requires Giant so do ktrace syscall events before and after acquiring and releasing Giant, respectively. - For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the goto bad hack and instead use the model on ia64 and alpha where we skip the actual syscall invocation if error != 0. This fixes a bug where, if the copyin() of the arguments failed for a syscall that was not marked MP safe, we would try to release Giant when we had not acquired it.
|
#
96961 |
|
19-May-2002 |
marcel |
Fix a kernel page fault when accessing user memory. We were combining too many conditions and as such ended up with the kernel map instead of the corresponding process map. While here, remove code to allow access to the stackgap and restyle slightly to improve readability.
This specifically fixes the procfs failure we're having when reading the process map (cat /proc/curproc/map)
|
#
96912 |
|
19-May-2002 |
marcel |
o Remove namespace pollution from param.h: - Don't include ia64_cpu.h and cpu.h - Guard definitions by _NO_NAMESPACE_POLLUTION - Move definition of KERNBASE to vmparam.h
o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h
o While here, remove some left-over Alpha references.
|
#
95019 |
|
19-Apr-2002 |
alc |
o Remove vm_map_growstack() from ia64's trap_pfault(). o Remove the acquisition and release of Giant from ia64's trap_pfault(). (vm_fault() still acquires it.)
|
#
94819 |
|
16-Apr-2002 |
alc |
Remove code that updates vm->vm_ssize. This duplicates work already performed by vm_map_growstack().
|
#
94495 |
|
12-Apr-2002 |
dfr |
Print extra information in printtrap() if the interrupted state was for an IA-32 process. Don't sign extend arguments in ia32_syscall - it's not normally going to be useful (e.g. pointers need to be zero extended).
|
#
94376 |
|
10-Apr-2002 |
dfr |
Add exception and syscall support for executing IA-32 binaries.
|
#
93761 |
|
04-Apr-2002 |
alc |
o Kill the MD grow_stack(). Call the MI vm_map_growstack() in its place. o Eliminate the use of useracc() and grow_stack() from sendsig().
Reviewed by: peter
|
#
92824 |
|
20-Mar-2002 |
jhb |
Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined. Instead of caching the ucred reference, just go ahead and eat the decrement and increment of the refcount. Now that Giant is pushed down into crfree(), we no longer have to get Giant in the common case. In the case when we are actually free'ing the ucred, we would normally free it on the next kernel entry, so the cost there is not new, just in a different place. This also removes td_cache_ucred from struct thread. This is still only done #ifdef DIAGNOSTIC.
Tested on: i386, alpha
|
#
91504 |
|
28-Feb-2002 |
arr |
- Move a comment from being on the same line as a #ifdef to the line following it. This should have gone in the previous commit, but I misviewed Bruce's patch.
Requested by: bde
|
#
91475 |
|
28-Feb-2002 |
arr |
- Fix panic() message and a couple style nits that snuck in from the recent diagnostics commit (rev. 1.84).
|
#
91090 |
|
22-Feb-2002 |
julian |
Add some DIAGNOSTIC code. While in userland, keep the thread's ucred reference in a shadow field so that the usual place to store it is NULL. If DIAGNOSTIC is not set, the thread ucred is kept valid until the next kernel entry, at which time it is checked against the process cred and possibly corrected. Produces a BIG speedup in kernels with INVARIANTS set. (A previous commit corrected it for the non INVARIANTS case already)
Reviewed by: dillon@freebsd.org
|
#
90892 |
|
19-Feb-2002 |
julian |
Duplicate the changes to i386 to keep creds over the user boundary.
|
#
86593 |
|
19-Nov-2001 |
peter |
s/code/ucode/ (last minute typo)
|
#
86592 |
|
19-Nov-2001 |
peter |
Initial cut at calling the EFI-provided FPSWA (Floating Point Software Assist) driver to handle the "messy" floating point cases which cause traps to the kernel for handling.
|
#
86212 |
|
09-Nov-2001 |
dfr |
Raise SIGILL for General Exceptions - it's closer to the correct meaning.
|
#
85525 |
|
26-Oct-2001 |
jhb |
Add a per-thread ucred reference for syscalls and synchronous traps from userland. The per thread ucred reference is immutable and thus needs no locks to be read. However, until all the proc locking associated with writes to p_ucred are completed, it is still not safe to use the per-thread reference.
Tested on: x86 (SMP), alpha, sparc64
|
#
85438 |
|
24-Oct-2001 |
dfr |
If we get an unhandled page fault in kernel mode, either panic (if pcb_onfault is not set) or arrange to restart at the location in pcb_onfault.
This ought to help the stability of a system under moderate load. It certainly stops DDB from hanging the kernel when it tries to access a non-present page.
|
#
85284 |
|
21-Oct-2001 |
dfr |
Set ar.fpsr to something sane before trying to handle a trap - the user might have trashed it.
|
#
85199 |
|
19-Oct-2001 |
dfr |
Make a start at an unaligned trap handler. Only integer loads and stores are handled so far.
|
#
85195 |
|
19-Oct-2001 |
dfr |
Translate various userland traps into SIGBUS (instead of just panicking).
|
#
85088 |
|
18-Oct-2001 |
marcel |
Fix typos in previous commit:
o s/sys_narg/sy_narg/ o s/SYS_MPSAFE/SYF_MPSAFE/
|
#
85082 |
|
17-Oct-2001 |
jhb |
- Small cleanups to the Giant handling in trap(). - Only release Giant in trap() if we locked it, otherwise we could release Giant in a kernel trap if we didn't get it for a page fault and the previous frame had grabbed the lock. - Only get Giant for !MP safe syscalls.
|
#
84839 |
|
12-Oct-2001 |
dfr |
If the faulting instruction is a cmpxchg, then isr.w and isr.r will both be set. We need to check isr.w before isr.r so that we can correctly handle a cmpxchg to a copy-on-write page.
This fixes the hang-after-fork problem for dynamically linked programs.
|
#
84677 |
|
08-Oct-2001 |
dfr |
Make printtrap() more informative.
|
#
83733 |
|
20-Sep-2001 |
dfr |
Don't clear the single-step bit after a trap - leave it up to the debugger. The code was broken anyway - it cleared every bit *except* the single-step bit (oops).
|
#
83506 |
|
15-Sep-2001 |
dfr |
Fill out some gaps in ia64 DDB support. This involves generalising DDB's breakpoint handling slightly to cope with the fact that ia64 instructions are not located on byte boundaries.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED Make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
81493 |
|
10-Aug-2001 |
jhb |
- Close races with signals and other AST's being triggered while we are in the process of exiting the kernel. The ast() function now loops as long as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with preemption disabled so that any further AST's that arrive via an interrupt will be delayed until the low-level MD code returns to user mode. - Use u_int's to store the tick counts for profiling purposes so that we do not need sched_lock just to read p_sticks. This also closes a problem where the call to addupc_task() could screw up the arithmetic due to non-atomic reads of p_sticks. - Axe need_proftick(), aston(), astoff(), astpending(), need_resched(), clear_resched(), and resched_wanted() in favor of direct bit operations on p_sflag. - Fix up locking with sched_lock some. In addupc_intr(), use sched_lock to ensure pr_addr and pr_ticks are updated atomically with setting PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing PS_OWEUPC. We also do not grab the lock just to test a flag. - Simplify the handling of Giant in ast() slightly.
Reviewed by: bde (mostly)
|
#
81255 |
|
07-Aug-2001 |
jhb |
Grab Giant around page faults. ia64 boots again in the simulator now.
|
#
78983 |
|
29-Jun-2001 |
jhb |
Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.
|
#
78962 |
|
29-Jun-2001 |
jhb |
Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md.
Reviewed by: jake (in principle)
|
#
78762 |
|
25-Jun-2001 |
jhb |
Fix cut-n-paste brain-o.
Pointy-hat to: me
|
#
78636 |
|
22-Jun-2001 |
jhb |
- Grab the proc lock around CURSIG and postsig(). Don't release the proc lock until after grabbing the sched_lock to avoid CURSIG racing with psignal. - Don't grab Giant for addupc_task() as it isn't needed.
Reported by: tegge (signal race), bde (addupc_task a while back)
|
#
77796 |
|
05-Jun-2001 |
jhb |
Don't hold sched_lock across addupc_task().
Reported by: David Taylor <davidt@yadt.co.uk> Submitted by: bde
|
#
77448 |
|
29-May-2001 |
jhb |
- Catch up to the VM mutex changes. - Sort includes in a few places.
|
#
76494 |
|
11-May-2001 |
jhb |
Simplify the vm fault trap handling code a bit by using if-else instead of duplicating code in the then case and then using a goto to jump around the else case.
|
#
76078 |
|
27-Apr-2001 |
jhb |
Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the *_process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the *_process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delivered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the *_process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forward_hardclock/statclock which then call the *_process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter Looked over by: eivind
|
#
73931 |
|
07-Mar-2001 |
jhb |
- Release Giant a bit earlier on syscall exit. - Don't try to grab Giant before postsig() in userret() as it is no longer needed. - Don't grab Giant before psignal() in ast() but get the proc lock instead.
|
#
72990 |
|
24-Feb-2001 |
jhb |
Don't include machine/mutex.h and relocate sys/mutex.h's include to be closer to alphabetical order and identical to that of the alpha.
|
#
72903 |
|
22-Feb-2001 |
jhb |
Synch up with the other architectures: - Remove unneeded spl()'s around mi_switch() in userret(). - Don't hold sched_lock across addupc_task(). - Remove the MD function child_return() now that the MI function fork_return() is used instead. - Use TRAPF_USERMODE() instead of dinking with the trapframe directly to check for ast's in kernel mode. - Check astpending(curproc) and resched_wanted() in ast() and return if neither is true. - Use astoff() rather than setting the non-existent per-cpu variable astpending to 0 to clear an ast.
|
#
72884 |
|
22-Feb-2001 |
jhb |
Catch up to new MI astpending and need_resched handling.
|
#
72376 |
|
11-Feb-2001 |
jake |
Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propagated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propagation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propagate_priority(). It should now have the desired effect without needing to also propagate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propagate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propagate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at appropriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields.
It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarly, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71595 |
|
24-Jan-2001 |
dfr |
Fix typo.
|
#
71553 |
|
24-Jan-2001 |
jhb |
- Proc locking. - Update userret() to take a struct trapframe * as a second argument. - Axe have_giant and use mtx_owned(&Giant) where appropriate.
|
#
70507 |
|
30-Dec-2000 |
dfr |
Fix typo.
|
#
69881 |
|
11-Dec-2000 |
jake |
- Add code to detect if a system call returns with locks other than Giant held and panic if so (conditional on witness). - Change witness_list to return the number of locks held so this is easier. - Add kern/syscalls.c to the kernel build if witness is defined so that the panic message can contain the name of the offending system call. - Add assertions that Giant and sched_lock are not held when returning from a system call, which were missing for alpha and ia64.
|
#
68862 |
|
17-Nov-2000 |
jake |
- Split the run queue and sleep queue linkage, so that a process may block on a mutex while on the sleep queue without corrupting it. - Move dropping of Giant to after the acquire of sched_lock.
Tested by: John Hay <jhay@icomtek.csir.co.za> jhb
|
#
68808 |
|
16-Nov-2000 |
jhb |
Don't release and acquire Giant in mi_switch(). Instead, release and acquire Giant as needed in functions that call mi_switch(). The releases need to be done outside of the sched_lock to avoid potential deadlocks from trying to acquire Giant while interrupts are disabled.
Submitted by: witness
|
#
67636 |
|
26-Oct-2000 |
dfr |
Minor build fixes.
|
#
67199 |
|
16-Oct-2000 |
dfr |
* Correct some of my misunderstandings about how best to switch to the kernel backing store. * Implement syscalls via break instructions. * Fix backing store copying in cpu_fork() so that the child gets the right register values.
This thing is actually starting to work now. This set of changes takes me up to the second execve (the one which runs the first shell). Next stop single-user mode :-).
|
#
67020 |
|
12-Oct-2000 |
dfr |
* Fix exception handling so that it actually works. We can now handle exceptions from both kernel and user mode. * Fix context switching so that we can switch back to a proc which we switched away from (we were saving the state in the wrong place). * Implement lazy switching of the high-fp state. This needs to be looked at again for SMP to cope with the case of a process migrating from one processor to another while it has the high-fp state. * Make setregs() work properly. I still think this should be called cpu_exec() or something. * Various other minor fixes.
With this lot, we can execve() /sbin/init and we get all the way up to its first syscall. At that point, we stop because syscall handling is not done yet.
|
#
66633 |
|
04-Oct-2000 |
dfr |
Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).
|
#
66458 |
|
29-Sep-2000 |
dfr |
This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but its nice to see the FreeBSD copyright message appear in the ia64 simulator.
|