History log of /freebsd-10.1-release/sys/ia64/ia64/trap.c
Revision Date Author Comments
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 270296 21-Aug-2014 emaste

MFC r263815, r263872:

Move ia64 efi.h to sys in preparation for amd64 UEFI support

Prototypes specific to ia64 have been left in this file for now, under
__ia64__, rather than moving them to a new header under sys/ia64.
I anticipate that (some of) the corresponding functions will be shared
by the amd64, arm64, i386, and ia64 architectures, and we can adjust
this as EFI support on architectures other than ia64 continues to develop.

Fix missed efi.h header change in r263815

Sponsored by: The FreeBSD Foundation


# 268200 02-Jul-2014 marcel

MFC r263323: Fix and improve exception tracing.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


# 225474 11-Sep-2011 kib

Inline syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


# 219741 18-Mar-2011 marcel

Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about
the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though
it's not used anywhere in the source tree.


# 211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


# 208514 24-May-2010 kib

Change ia64's struct syscall_args definition so that args is a pointer to
the arguments array instead of the array itself. ia64 syscall arguments are
readily available in the frame, so point args at it rather than doing an
unnecessary bcopy. Still reserve the array in syscall_args for ia32 emulation.

Suggested and reviewed by: marcel
MFC after: 1 month
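
For illustration, a minimal sketch of the resulting shape (the field
names are assumptions based on the description above, not the verbatim
header):

    /* Hypothetical post-change layout: args is a pointer that normally
     * points straight into the trapframe's argument area, avoiding a
     * bcopy. The embedded array remains only as backing store for ia32
     * emulation, where arguments must still be copied and widened. */
    struct sysent;                   /* per-ABI syscall table entry */

    struct syscall_args {
        unsigned int code;           /* syscall number */
        struct sysent *callp;        /* resolved sysent entry */
        unsigned long args64[8];     /* backing array for ia32 emulation */
        unsigned long *args;         /* points into the frame, or at args64 */
    };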


# 208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-dependent
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of cpu_set_syscall_retval(9)
that allows ABIs to override the way a return value is set.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use the sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

syscallenter() fetches the arguments, calls the syscall implementation
from the ABI sysent table, and sets up the return frame. The end-of-
syscall bookkeeping is done by syscallret().

Take advantage of the single place for MI syscall handling code and
implement the ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at the syscall entry or return point respectively. The
EXEC flag augments SCX and notifies the debugger that the process address
space was changed by one of the exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
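
A hedged sketch of the common flow that syscallenter(9) factors out (the
sv_* member names come from the text above; the surrounding types and the
exact call sequence are simplified assumptions):

    /* Simplified sketch: error handling, the ptrace stops behind
     * PL_FLAG_SCE/PL_FLAG_SCX, and ktrace hooks are all elided. */
    static int
    syscallenter(struct thread *td, struct syscall_args *sa)
    {
        struct sysentvec *sv = td->td_proc->p_sysent;
        int error;

        /* 1. ABI-specific fetch of the arguments into *sa. */
        error = sv->sv_fetch_syscall_args(td, sa);

        /* 2. Dispatch through the ABI's sysent table. */
        if (error == 0)
            error = sa->callp->sy_call(td, sa->args);

        /* 3. ABI-specific encoding of the result into the frame. */
        sv->sv_set_syscall_retval(td, error);
        return (error);
    }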


# 205713 26-Mar-2010 marcel

Rename disable_intr() to ia64_disable_intr() and rename enable_intr()
to ia64_enable_intr(). This reduces confusion with intr_disable() and
intr_restore().

Have configure_final() call ia64_finalize_intr() instead of enable_intr()
in preparation for adding support for binding interrupts to all CPUs.


# 203572 06-Feb-2010 marcel

Fix single-stepping when the kernel was entered through the EPC syscall
path. When the taken branch leaves the kernel and enters the process,
we still need to execute the instruction at that address. Don't raise
SIGTRAP when we branch into the process, but enable single-stepping
instead.


# 199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


# 199566 20-Nov-2009 marcel

Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.


# 199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into the MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, sun4v also reviewed
MIPS tested by: gonzo
MFC after: 1 month
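
A sketch of what an MD implementation typically looks like (the frame
member names and the ia64 error-flag convention shown here are
assumptions for illustration):

    void
    cpu_set_syscall_retval(struct thread *td, int error)
    {
        struct trapframe *tf = td->td_frame;

        switch (error) {
        case 0:                 /* success: return values, clear error flag */
            tf->tf_scratch.gr8 = td->td_retval[0];
            tf->tf_scratch.gr9 = td->td_retval[1];
            tf->tf_scratch.gr10 = 0;
            break;
        case ERESTART:          /* rewind the PC so the syscall re-runs */
            tf->tf_special.iip -= 16;     /* one bundle back (assumed) */
            break;
        case EJUSTRETURN:       /* the frame was already set up */
            break;
        default:                /* failure: errno plus error flag */
            tf->tf_scratch.gr8 = error;
            tf->tf_scratch.gr10 = 1;
            break;
        }
    }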


# 198733 31-Oct-2009 marcel

Reimplement the lazy FP context switching:
o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o Use msleep() to wait for the other CPU to save the high
FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticeable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.
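
A minimal sketch of the save-or-wait logic under the single global lock
(all names here - hifp_mtx, pcb_fpcpu, the IPI step - are illustrative
assumptions, not the actual implementation):

    static struct mtx hifp_mtx;      /* the single global high-FP lock */

    /* Before thread td touches the high FP registers on this CPU, make
     * sure no other CPU still holds its context. */
    static void
    hifp_claim(struct thread *td, struct pcpu *me)
    {
        struct pcb *pcb = td->td_pcb;

        mtx_lock(&hifp_mtx);
        while (pcb->pcb_fpcpu != NULL && pcb->pcb_fpcpu != me) {
            /* Ask the owning CPU (via IPI, elided) to save and then
             * disable the high FP registers; sleep until it has. */
            msleep(pcb, &hifp_mtx, 0, "hifp", 0);
        }
        pcb->pcb_fpcpu = me;         /* this CPU owns the context now */
        mtx_unlock(&hifp_mtx);
    }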


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 173600 14-Nov-2007 julian

Generally we are interested in which thread did something, as
opposed to which process. Since threads by default have the name of the
process unless overwritten with more useful information, just print the
thread name instead.


# 170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC to PCPU_INC
- Add the PCPU_ADD interface, which just adds a given value to the
pcpu member.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussion/work in the coming days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


# 169814 21-May-2007 marcel

When speculation fails (as determined by the chk instruction) the
processor is to jump to recovery code. This branching behaviour
may not be implemented by the processor, in which case a Speculative
Operation fault is raised. The OS is responsible for emulating the
branch. Implement this, because GCC 4.2 uses advanced loads regularly.


# 167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for close-to-open (cto) consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


# 163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 162361 16-Sep-2006 rwatson

Add audit hooks for ppc, ia64 system call paths.

Reviewed by: marcel (ia64)
Obtained from: TrustedBSD Project
MFC after: 3 days


# 160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both the SYF_MPSAFE and SYF_ARGMASK
flags; sy_narg is now back to just being an argument count.


# 160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


# 160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks
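
A sketch of the shape of those checks at the end of syscall() (the
assertion text and the syscallname placeholder are assumptions; the
point is that the syscall number and name are still in scope here,
unlike in userret()):

    /* Don't return to userland from inside a critical section... */
    KASSERT(td->td_critnest == 0,
        ("syscall %s returning in a critical section", syscallname));
    /* ...and don't return holding any locks; WITNESS can name them. */
    WITNESS_WARN(WARN_PANIC, NULL,
        "syscall %s returning with locks held", syscallname);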


# 160770 27-Jul-2006 jhb

Add KTR_SYSC tracing to the syscall() implementations that didn't have it
yet.

MFC after: 1 week


# 160040 29-Jun-2006 marcel

Partial support for branch long emulation. This only emulates the
branch long jump and not the branch long call. Support for that is
forthcoming.


# 158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


# 155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks; we will need the
other name for a more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as a reference for userret(). Use the absolute value of
td->td_pticks in userret() and eliminate the third argument.


# 151316 14-Oct-2005 davidxu

1. Change the prototypes of trapsignal and sendsig to use ksiginfo_t *. Most
changes in MD code are trivial: before this change, trapsignal and
sendsig used discrete parameters; now they use member fields of the
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal values to user code.

2. Remove cpu_thread_siginfo; it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to the proc structure to hold shared signals which are
blocked by all threads in the proc.

4. Add td_sigqueue to the thread structure to hold all signals delivered to
a thread.

5. i386 and amd64 now return POSIX standard si_code values; other arches will
be fixed.

6. In this sigqueue implementation the pending signal set is kept as before;
an extra siginfo list holds additional siginfo_t data for signals.
Kernel code using psignal() still behaves as before: it won't fail
even under memory pressure. The only exception is that when deleting a
signal, we should call sigqueue_delete to remove the signal from the
sigqueue rather than use SIGDELSET. Currently no kernel code delivers a
signal with additional data, so the kernel should be as stable as
before; a ksiginfo can carry more information, allowing, for example, a
signal to be delivered but its siginfo data thrown away if memory is
short. SIGKILL and SIGSTOP have a fast path in sigqueue_add because
they cannot be caught or masked.
The sigqueue() syscall allows user code to queue a signal to a target
process; if resources are unavailable, EAGAIN is returned as the
specification says.
Just before a thread exits, its signal queue memory is freed by
sigqueue_flush.
Currently, all signals are allowed to be queued, not only realtime
signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
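
A small usage sketch of the reworked trapsignal() interface (the fault
values shown are illustrative):

    /* Before: trapsignal(td, SIGSEGV, code) took discrete parameters.
     * Now the caller fills a ksiginfo_t, which can carry the full POSIX
     * siginfo payload (si_code, si_addr, realtime value) to user code. */
    static void
    send_segv(struct thread *td, void *faultaddr)
    {
        ksiginfo_t ksi;

        ksiginfo_init_trap(&ksi);       /* zero it, mark as trap-generated */
        ksi.ksi_signo = SIGSEGV;
        ksi.ksi_code = SEGV_MAPERR;     /* POSIX standard si_code */
        ksi.ksi_addr = faultaddr;       /* faulting address */
        trapsignal(td, &ksi);
    }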


# 149915 09-Sep-2005 marcel

Change the High FP lock from a sleep lock to a spin lock. We can
take the lock from interrupt context, which causes an implicit
lock order reversal. We've been using the lock carefully enough
that making it a spin lock should not be harmful.


# 148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides sufficiently fine-grained locking to make the
solution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a separate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). Use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check its queue at any convenient time (such as on each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably good, scalable SMP support.
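
A sketch of the bucket head structure described above (the field names
are assumptions):

    struct ia64_lpte;                /* a PTE in the collision chain */

    /* The hash bucket no longer points at the first PTE directly; it
     * points at a head structure that pairs the chain with a mutex, so
     * each bucket can be locked independently of the others. */
    struct vhpt_bucket {
        struct ia64_lpte *chain;     /* first PTE in this bucket's chain */
        struct mtx mutex;            /* per-bucket lock */
    };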


# 147640 27-Jun-2005 marcel

Handle B-unit break instructions. The break.b is unique in that the
immediate is not saved by the architecture. Any of the break.{mifx}
instructions have their immediate saved in cr.iim on interruption.
Consequently, when we handle the break interrupt, we end up with a
break value of 0 when it was a break.b. The immediate is important
because it distinguishes between different uses of the break and
which are defined by the runtime specification.
The bottom line is that when the GNU debugger replaces a B-unit
instruction with a break instruction in the inferior, we would not
send the process a SIGTRAP when we encounter it, because the value
is not one we recognize as a debugger breakpoint.

This change adds logic to decode the bundle in which the break
instruction lives whenever the break value is 0. The assumption
being that it's a break.b and we fetch the immediate directly out
of the instruction. If the break instruction was not a break.b,
but any of break.{mifx} with an immediate of 0, we would be doing
unnecessary work. But since a break 0 is invalid, this is not a
problem and it will still result in a SIGILL being sent to the
process.

Approved by: re (scottl)


# 147639 27-Jun-2005 marcel

Replace the existing copyright notice with my own. Over the years I've
changed this file so much that it's equivalent to a rewrite, and I'm not
talking about any of the cosmetic changes of course.

Approved by: re (scottl)


# 147638 27-Jun-2005 marcel

Cosmetic: s/u_int64_t/uint64_t/g

Approved by: re (scottl)


# 144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


# 139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 135783 25-Sep-2004 marcel

Move the IA-32 trap handling from trap() to ia32_trap(). Move the
ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32
is not defined, we'll raise SIGEMT instead.


# 135405 17-Sep-2004 marcel

Provide our own FPSWA definitions, instead of depending on the Intel
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.


# 134571 31-Aug-2004 julian

Remove an unneeded argument.
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that
curthread could be expensive to derive on some systems, so leave it as an
argument. Having both proc and thread as arguments just gives an
opportunity for them to get out of sync.

MFC after: 3 days


# 134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


# 133291 07-Aug-2004 marcel

Implement single stepping when we leave the kernel through the EPC syscall
path. The basic problem is that we cannot set the single stepping flag
directly, because we don't leave the kernel via an interrupt return. So,
we need another way to set the single stepping flag.
The way we do this is by enabling the lower-privilege transfer trap, which
gets raised when we drop the privilege level. However, since we're still
running in kernel space (sec), we're not yet done. We clear the lower-
privilege transfer trap, enable the taken-branch trap and continue exiting
the kernel until we branch into user space.
Given the current code, there's a total of two traps this way before
we can raise SIGTRAP.


# 131945 10-Jul-2004 marcel

Update for the KDB framework:
o ksym_start and ksym_end changed type to vm_offset_t.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_enter() instead of breakpoint().
o Remove implementation of Debugger().
o Call kdb_trap() according to the new world order.

unwinder:
o s/db_active/kdb_active/g
o Various s/ddb/kdb/g
o Add support for unwinding from the PCB as well as the trapframe.
Abuse a spare field in the special register set to flag whether
the PCB was actually constructed from a trapframe so that we can
make the necessary adjustments.

md_var.h:
o Add RSE convenience macros.
o Add ia64_bsp_adjust() to add or subtract from BSP while taking
NaT collections into account.


# 131838 08-Jul-2004 marcel

MFamd64 (1.275):
Reduce the scope of the Giant lock being held for non-mpsafe syscalls.
There was way too much code being covered.


# 131821 08-Jul-2004 marcel

Better handle the break instruction trap. The runtime specification
has outlined which break numbers are software interrupts, debugger
breakpoints and ABI specific breaks. We mostly treated all break
numbers we didn't care about as debugger breakpoints.


# 129023 07-May-2004 marcel

Revert previous commit. We should not get any FP traps from within
the kernel. We can guarantee this by resetting the FP status register.
This masks all FP traps. The reason we did get FP traps was that we
didn't reset the FP status register in all cases.

Make sure to reset the FP status register in syscall(). This is one of
the places where it was forgotten.

While on the subject, reset the FP status register only when we trapped
from user space.


# 128857 03-May-2004 marcel

Floating-point faults and exceptions can happen in the kernel too.
Do not panic when it happens; handle them.

Run into by: das


# 124739 20-Jan-2004 marcel

Fix handling of FP traps:
o For traps, the cr.iip register points to the next instruction to
execute on interrupt return (modulo slot). Since we need to get
the bundle of the instruction that caused the FP fault/trap, make
sure we fetch the previous bundle if the next instruction is in
fact the first in a bundle.
o When we call the FPSWA handler, we need to tell it whether it's
a trap or a fault (first argument). This was hardcoded to mean a
fault.

Also, for FP faults, when a fault is converted to a trap, adjust the
cr.iip and cr.ipsr registers to point to the next instruction. This
makes sure that the SIGFPE handler gets a consistent state.


# 124737 20-Jan-2004 marcel

s/framep/tf/g -- this normalizes on the use of tf to point to the
trapframe and improves grep-ability.


# 123346 09-Dec-2003 marcel

Don't panic for misalignment traps when the onfault handler is set.
Not all transfers between kernel and user space are byte oriented
and thus alignment safe. Especially fuword*() and suword*() are
sensitive to alignment but in general more optimal than block copies.
By catching the misalignment trap we avoid pessimizing the common
case of properly aligned memory accesses which we would do if we
were to use byte copies or adding tests for proper alignment.

Note that the expectation that the kernel produces aligned pointers
is unchanged. This change therefore relates to possible unaligned
pointers generated in userland.
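
A hedged sketch of the resulting misalignment-trap logic (member names
such as pcb_onfault and tf_special.iip follow common FreeBSD/ia64
conventions but are assumptions here):

    /* In the misalignment-trap path: a kernel-mode fault with an
     * onfault handler registered (as fuword()/suword() do) resumes at
     * the recovery address instead of panicking; a user-mode fault
     * still gets the usual signal. */
    if (!user && td->td_pcb->pcb_onfault != 0) {
        tf->tf_special.iip = td->td_pcb->pcb_onfault;
        tf->tf_special.psr &= ~IA64_PSR_RI;  /* restart at slot 0 */
        td->td_pcb->pcb_onfault = 0;
        goto out;                            /* fault handled */
    }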


# 122518 11-Nov-2003 marcel

Further work out the handling of the high FP registers. The most
important change is in cpu_switch(), where we disable the high FP
registers for the thread that we switch out if the CPU currently
has its high FP registers. This keeps the high FP registers from
remaining enabled for the thread even when the CPU has unloaded them
or the thread has migrated to another processor.
Likewise, when we switch in a thread that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.

The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.

Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.

These changes make the lazy context switching of the high FP
registers in an UP kernel functional.


# 121635 28-Oct-2003 marcel

When switching the RSE to use the kernel stack as backing store, keep
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.

The primary reasons for this change are:
1. Asynchronous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
constant, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them wherever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.

There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.


# 121412 23-Oct-2003 marcel

Remove prototype of unaligned_fixup() and fix a nearby style(9)
bug.


# 120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


# 120250 19-Sep-2003 marcel

Revamp trap(): make it more explicit which kinds of traps/faults we
can get (or not) and what we do with them. This fixes the behaviour
for NaT consumption and speculation faults in that we now don't panic
for user faults.

Remove the dopanic label and move the code to a function. This makes
it easier in the simulator to set a breakpoint.

While here, remove the special handling of the old break-based syscall
path and move it to where we handle the break vector. While here,
reserve a new break immediate for KSE. We currently use the old break-
based syscall to deal with restoring async contexts. However, it has
the side-effect of also setting the signal mask and calling ast() on
the way out. The new break immediate simply restores the context and
returns without calling ast().


# 119159 20-Aug-2003 marcel

Undo the mistake made in revision 1.77 of trap.c and which was the
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.

It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.

So, we're back to looking at the privilege level like it should be.
There's no other way.

Pointy hat: marcel


# 118853 13-Aug-2003 marcel

Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is
in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the
gateway page, which means that improper memory accesses to the gateway
page while in user mode would panic the kernel. Use VM_MAX_ADDRESS
instead. It ends before the gateway page. The difference between
VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.


# 118811 12-Aug-2003 marcel

Cleanup prototypes in cpu.h, including fswintrberr and any references
to it. Sort the remaining prototypes in cpu.h.

No functional change.


# 117999 25-Jul-2003 marcel

Remove __aligned(16) from the definition of struct _ia64_fpreg. It's
a non-standard construct. Instead, redefine struct _ia64_fpreg as a
union and put a long double in it. On ia64 and for LP64, this is
defined by the ABI to have 16-byte alignment. For ILP32 a long double
has 4-byte alignment, but we don't support ILP32.

Note that the in-memory image of a long double does not match the in-
memory image of spilled FP registers. This means that one cannot use
the fpr_flt field to interpret the bits. For this reason we continue
to use an aggregate type.
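
The resulting definition is, in outline, roughly this (a sketch
consistent with the description above):

    /* The long double member exists only to give the union the
     * ABI-mandated 16-byte alignment on LP64 ia64, replacing the
     * non-standard __aligned(16). As noted above, the in-memory image
     * of a spilled FP register differs from that of a long double, so
     * fpr_flt must not be used to interpret the bits. */
    union _ia64_fpreg {
        unsigned char fpr_bits[16];  /* raw spilled-register image */
        long double fpr_flt;         /* forces 16-byte alignment */
    };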


# 117984 24-Jul-2003 marcel

Disable the single-step trap on a debug related trap, including of
course the single-step trap itself.


# 117499 13-Jul-2003 marcel

Enable the high FP registers when we call the FPSWA handler and disable
them again afterwards. This fixes a disabled FP fault while in the FPSWA
handler.
While here, merge the FP fault and FP trap handling code to reduce code
duplication. Where the code differed, it was not clear that it should.

Trigger case: ports/math/atlas


# 116361 14-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 115936 07-Jun-2003 marcel

If we get a fault in the gateway page, which would happen if we try
to deliver a signal and the RSE backing store has been exhausted or
the backing store pointer has been clobbered, we need to make sure
we call userret() and do_ast() when we exit from trap(). Not adjusting
the local variable 'user' in this case will prevent the faulty process
from being terminated and we end up in an infinite fault repetition.

Faulty process provided by: bento


# 115916 06-Jun-2003 marcel

Use TRAPF_USERMODE() to replace an equivalent check in trap(). While
here, amend the related comment.


# 115573 31-May-2003 marcel

Now that we have the signal trampolines in the gateway page and the
gateway page is considered kernel space, we can panic when we should
only SIGSEGV. Hence, add the additional constraint that for page
faults we also require running with kernel privileges. The gateway
page is the only kernel code running with user privileges, so this
is a correct way to exclude the gateway page from kernel land.

We do not currently exclude the gateway page for other faults as it
is not always the right way to do it. Further tuning will happen on
a case by case bases.


# 115376 29-May-2003 marcel

Fix what I think is a cut-n-paste bug: use OID_AUTO for the
print_usertrap sysctl instead of CPU_UNALIGNED_PRINT. The
latter is used already.

Approved by: re@ (blanket)


# 115298 24-May-2003 marcel

Now that we define user mode as any IP address that isn't in the
kernel's VA regions, we cannot limit the use of break-based
syscalls to user mode only. The signal trampolines are in the
gateway page, which is mapped into the process address space in
region 5 and thus is kernel space.

We don't special case the gateway page here. Allow break-based
syscalls from anywhere in the kernel VA space.

Approved by: re@ (blanket)


# 115294 24-May-2003 marcel

Consistently use the same metric to differentiate between kernel mode
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.

We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies in any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)

For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.

Approved by: re@ (blanket)
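
The metric reduces to a shift: an ia64 virtual address carries its
region number in the top three bits, so (a sketch; the macro spelling
is an assumption):

    #include <stdint.h>

    /* Regions 5, 6 and 7 are kernel VA regions; an IP in any of them
     * is treated as kernel mode, regardless of which stack we're on or
     * what the privilege level says. */
    #define IA64_VA_REGION(va)  ((uint64_t)(va) >> 61)

    static int
    iip_is_usermode(uint64_t iip)
    {
        return (IA64_VA_REGION(iip) < 5);    /* regions 0-4: user space */
    }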


# 115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc instruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places where we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secondary objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. Context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticeable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect, the details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be noticed as an improvement, if it's noticed at all.

Approved by: re@ (jhb)


# 114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho
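
A sketch of the fix (the bound shown is an assumption; the point is to
validate the user-controlled number before indexing):

    /* The syscall number comes from userland; never use it to index
     * syscallnames[] until it has been range checked. */
    const char *name;

    name = (code >= 0 && code < SYS_MAXSYSCALL) ? syscallnames[code]
                                                : "unknown";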


# 113833 22-Apr-2003 davidxu

Remove the single-threading detection code. This code really should be
replaced by thread_user_enter(), but currently we don't want to enable
that in trap.


# 113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


# 112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


# 111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


# 111587 27-Feb-2003 davidxu

kse.h is not needed.


# 111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 105900 24-Oct-2002 julian

Extract the KSE-specific code from machine-specific code
so that there is only one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitimately happen).

Submitted by: (parts) davidxu


# 104425 03-Oct-2002 peter

Update for post-kseIII


# 102560 29-Aug-2002 jake

Fixed printf format errors.


# 99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


# 99095 29-Jun-2002 julian

Fix reverse ordering of locks. Add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


# 99081 29-Jun-2002 julian

Add KSE stubs to MD parts of ia64 code.
dfr will fill these out when we decide to enable KSEs on ia64
(probably not immediately).


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
To come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 98727 24-Jun-2002 mini

Remove unused diagnostic function cred_free_thread().

Approved by: alfred


# 98474 20-Jun-2002 peter

ia32 %edx return comes from td_retval[1], not td_retval[0]

Obtained from: dfr


# 98473 20-Jun-2002 peter

Use suword32/64 and fuword32/64 like elsewhere instead of inventing
suhword/fuhword.


# 98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha where we
skip the actual syscall invocation if error != 0. This fixes a bug
where, if the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


# 96961 19-May-2002 marcel

Fix a kernel page fault when accessing user memory. We were
combining too many conditions and as such ended up with the
kernel map instead of the corresponding process map. While
here, remove code to allow access to the stackgap and restyle
slightly to improve readability.

This specifically fixes the procfs failure we're having
when reading the process map (cat /proc/curproc/map).


# 96912 19-May-2002 marcel

o Remove namespace pollution from param.h:
- Don't include ia64_cpu.h and cpu.h
- Guard definitions by _NO_NAMESPACE_POLLUTION
- Move definition of KERNBASE to vmparam.h

o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h

o While here, remove some left-over Alpha references.


# 95019 19-Apr-2002 alc

o Remove vm_map_growstack() from ia64's trap_pfault().
o Remove the acquisition and release of Giant from ia64's trap_pfault().
(vm_fault() still acquires it.)


# 94819 16-Apr-2002 alc

Remove code that updates vm->vm_ssize. This duplicates work already performed
by vm_map_growstack().


# 94495 12-Apr-2002 dfr

Print extra information in printtrap() if the interrupted state was for
an IA-32 process. Don't sign extend arguments in ia32_syscall - it's not
normally going to be useful (e.g. pointers need to be zero extended).


# 94376 10-Apr-2002 dfr

Add exception and syscall support for executing IA-32 binaries.


# 93761 04-Apr-2002 alc

o Kill the MD grow_stack(). Call the MI vm_map_growstack()
in its place.
o Eliminate the use of useracc() and grow_stack() from sendsig().

Reviewed by: peter


# 92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decrement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually freeing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removes td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


# 91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but I
misviewed Bruce's patch.

Requested by: bde


# 91475 28-Feb-2002 arr

- Fix panic() message and a couple style nits that snuck in from the
recent diagnostics commit (rev. 1.84).


# 91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


# 90892 19-Feb-2002 julian

Duplicate the changes to i386 to keep creds over the user boundary.


# 86593 19-Nov-2001 peter

s/code/ucode/ (last minute typo)


# 86592 19-Nov-2001 peter

Initial cut at calling the EFI-provided FPSWA (Floating Point Software
Assist) driver to handle the "messy" floating point cases which
cause traps to the kernel for handling.


# 86212 09-Nov-2001 dfr

Raise SIGILL for General Exceptions - it's closer to the correct meaning.


# 85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


# 85438 24-Oct-2001 dfr

If we get an unhandled page fault in kernel mode, either panic (if
pcb_onfault is not set) or arrange to restart at the location in
pcb_onfault.

This ought to help the stability of a system under moderate load. It
certainly stops DDB from hanging the kernel when it tries to access a
non-present page.


# 85284 21-Oct-2001 dfr

Set ar.fpsr to something sane before trying to handle a trap - the user
might have trashed it.


# 85199 19-Oct-2001 dfr

Make a start at an unaligned trap handler. Only integer loads and stores
are handled so far.


# 85195 19-Oct-2001 dfr

Translate various userland traps into SIGBUS (instead of just panicking).


# 85088 18-Oct-2001 marcel

Fix typos in previous commit:

o s/sys_narg/sy_narg/
o s/SYS_MPSAFE/SYF_MPSAFE/


# 85082 17-Oct-2001 jhb

- Small cleanups to the Giant handling in trap().
- Only release Giant in trap() if we locked it, otherwise we could release
Giant in a kernel trap if we didn't get it for a page fault and the
previous frame had grabbed the lock.
- Only get Giant for !MP safe syscalls.


# 84839 12-Oct-2001 dfr

If the faulting instruction is a cmpxchg, then isr.w and isr.r will both
be set. We need to check isr.w before isr.r so that we can correctly
handle a cmpxchg to a copy-on-write page.

This fixes the hang-after-fork problem for dynamically linked programs.
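
A sketch of the ordering (the ISR bit macros are assumptions):

    /* A faulting cmpxchg sets both ISR.w and ISR.r. Testing the write
     * bit first classifies the access as a write, so vm_fault() will
     * copy the copy-on-write page rather than mapping it read-only and
     * faulting again forever. */
    if (isr & IA64_ISR_W)
        ftype = VM_PROT_WRITE;
    else if (isr & IA64_ISR_R)
        ftype = VM_PROT_READ;
    else
        ftype = VM_PROT_EXECUTE;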


# 84677 08-Oct-2001 dfr

Make printtrap() more informative.


# 83733 20-Sep-2001 dfr

Don't clear the single-step bit after a trap - leave it up to the
debugger. The code was broken anyway - it cleared every bit *except* the
single-step bit (oops).


# 83506 15-Sep-2001 dfr

Fill out some gaps in ia64 DDB support. This involves generalising DDB's
breakpoint handling slightly to cope with the fact that ia64 instructions
are not located on byte boundaries.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


# 81255 07-Aug-2001 jhb

Grab Giant around page faults. ia64 boots again in the simulator now.


# 78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


# 78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


# 78762 25-Jun-2001 jhb

Fix cut-n-paste brain-o.

Pointy-hat to: me


# 78636 22-Jun-2001 jhb

- Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by: tegge (signal race), bde (addupc_task a while back)


# 77796 05-Jun-2001 jhb

Don't hold sched_lock across addupc_task().

Reported by: David Taylor <davidt@yadt.co.uk>
Submitted by: bde


# 77448 29-May-2001 jhb

- Catch up to the VM mutex changes.
- Sort includes in a few places.


# 76494 11-May-2001 jhb

Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.


# 76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delivered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the APs to execute forward_hardclock/statclock, which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 73931 07-Mar-2001 jhb

- Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.


# 72990 24-Feb-2001 jhb

Don't include machine/mutex.h and relocate sys/mutex.h's include to be
closer to alphabetical order and identical to that of the alpha.


# 72903 22-Feb-2001 jhb

Synch up with the other architectures:
- Remove unneeded spl()'s around mi_switch() in userret().
- Don't hold sched_lock across addupc_task().
- Remove the MD function child_return() now that the MI function
fork_return() is used instead.
- Use TRAPF_USERMODE() instead of dinking with the trapframe directly to
check for ast's in kernel mode.
- Check astpending(curproc) and resched_wanted() in ast() and return if
neither is true.
- Use astoff() rather than setting the non-existent per-cpu variable
astpending to 0 to clear an ast.


# 72884 22-Feb-2001 jhb

Catch up to new MI astpending and need_resched handling.


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propagate_priority(). It should now have the desired
effect without needing to also propagate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propagate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

Similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
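
In practice the conversion at call sites looks like this (illustrative
before/after):

    /* Old interface: one entry point, lock type passed as an argument. */
    mtx_enter(&Giant, MTX_DEF);
    mtx_exit(&Giant, MTX_DEF);
    mtx_enter(&sched_lock, MTX_SPIN);
    mtx_exit(&sched_lock, MTX_SPIN);

    /* New interface: the function name encodes the lock type. */
    mtx_lock(&Giant);
    mtx_unlock(&Giant);
    mtx_lock_spin(&sched_lock);
    mtx_unlock_spin(&sched_lock);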


# 71595 24-Jan-2001 dfr

Fix typo.


# 71553 24-Jan-2001 jhb

- Proc locking.
- Update userret() to take a struct trapframe * as a second argument.
- Axe have_giant and use mtx_owned(&Giant) where appropriate.


# 70507 30-Dec-2000 dfr

Fix typo.


# 69881 11-Dec-2000 jake

- Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.


# 68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


# 68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


# 67636 26-Oct-2000 dfr

Minor build fixes.


# 67199 16-Oct-2000 dfr

* Correct some of my misunderstandings about how best to switch to the
kernel backing store.
* Implement syscalls via break instructions.
* Fix backing store copying in cpu_fork() so that the child gets the right
register values.

This thing is actually starting to work now. This set of changes takes me
up to the second execve (the one which runs the first shell). Next stop
single-user mode :-).


# 67020 12-Oct-2000 dfr

* Fix exception handling so that it actually works. We can now handle
exceptions from both kernel and user mode.
* Fix context switching so that we can switch back to a proc which we
switched away from (we were saving the state in the wrong place).
* Implement lazy switching of the high-fp state. This needs to be looked
at again for SMP to cope with the case of a process migrating from one
processor to another while it has the high-fp state.
* Make setregs() work properly. I still think this should be called
cpu_exec() or something.
* Various other minor fixes.

With this lot, we can execve() /sbin/init and we get all the way up to its
first syscall. At that point, we stop because syscall handling is not done
yet.


# 66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


# 66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.