History log of /freebsd-9.3-release/sys/ia64/ia64/exception.S
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 223700 30-Jun-2011 marcel

Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged and we initialize tried
to handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.

But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translatation
cache updates with the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.

Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. The trap frame is already
aligned to 1KB boundaries to make sure we don't cross the page
boundary, this is safe to do.

We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.

Also, it has been observed that we can get a nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.


# 221893 14-May-2011 marcel

Sharpening the saw:
o Clobber the register that holds the restart token immediately after
crossing the restart point. This prevents false positives (i.e. a
nested exception that we don't know can happen and that is being
treated as one we know by virtue of a lingering restart token).
o Now that the bootstrap kernel stack is free, switch onto it and call
trap() for nested traps that we don't know about. In trap we panic()
so that we can analyze the condition.


# 221271 30-Apr-2011 marcel

Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.


# 219758 18-Mar-2011 marcel

o Move the IVT and supporting functions to the front of the text
segment so that it's always mapped by the loader.
o Change the alternate fault handlers to account for PBVM. Since
currently the region is handled by the VHPT, no alternate faults
will be generated for it.


# 209085 11-Jun-2010 marcel

The ptc.g operation for the Mckinley and Madison processors has the
side-effect of purging more than the requested translation. While
this is not a problem in general, it invalidates the assumption made
during constructing the trapframe on entry into the kernel in SMP
configurations. The assumption is that only the first store to the
stack will possibly cause a TLB miss. Since the ptc.g purges the
translation caches of all CPUs in the coherency domain, a ptc.g
executed on one CPU can cause a purge on another CPU that is
currently running the critical code that saves the state to the
trapframe. This can cause an unexpected TLB miss and with interrupt
collection disabled this means an unexpected data nested TLB fault.

A data nested TLB fault will not save any context, nor provide a
way for software to determine what caused the TLB miss nor where
it occured. Careful construction of the kernel entry and exit code
allows us to handle a TLB miss in precisely orchastrated points
and thereby avoiding the need to wire the kernel stack, but the
unexpected TLB miss caused by the ptc.g instructution resulted in
an unrecoverable condition and resulting in machine checks.

The solution to this problem is to synchronize the kernel entry
on all CPUs with the use of the ptc.g instruction on a single CPU
by implementing a bare-bones readers-writer lock that allows N
readers (= N CPUs entering the kernel) and 1 writer (= execution
of the ptc.g instruction on some CPU). This solution wins over
a rendez-vous approach by not interrupting CPUs with an IPI.

This problem has not been observed on the Montecito.

PR: ia64/147772
MFC after: 6 days


# 205433 22-Mar-2010 marcel

Fix interrupt handling by extending the critical region so that
preemption doesn't happen until after all pending interrupt have
been services.
While here again, simplify the EOI handling by doing it after we
call the XIV-specific handlers, rather than in each of them. The
original thought was that we may want to do an EOI first and the
actual IPI handling next, but that's mostly a micro-optimization.


# 205234 16-Mar-2010 marcel

Revamp the interrupt code based on the previous commit:
o Introduce XIV, eXternal Interrupt Vector, to differentiate from
the interrupts vectors that are offsets in the IVT (Interrupt
Vector Table). There's a vector for external interrupts, which
are based on the XIVs.

o Keep track of allocated and reserved XIVs so that we can assign
XIVs without hardcoding anything. When XIVs are allocated, an
interrupt handler and a class is specified for the XIV. Classes
are:
1. architecture-defined: XIV 15 is returned when no external
interrupt are pending,
2. platform-defined: SAL reports which XIV is used to wakeup
an AP (typically 0xFF, but it's 0x12 for the Altix 350).
3. inter-processor interrupts: allocated for SMP support and
non-redirectable.
4. device interrupts (i.e. IRQs): allocated when devices are
discovered and are redirectable.

o Rewrite the central interrupt handler to call the per-XIV
interrupt handler and rename it to ia64_handle_intr(). Move
the per-XIV handler implementation to the file where we have
the XIV allocation/reservation. Clock interrupt handling is
moved to clock.c. IPI handling is moved to mp_machdep.c.

o Drop support for the Intel 8259A because it was broken. When
XIV 0 is received, the CPU should initiate an INTA cycle to
obtain the interrupt vector of the 8259-based interrupt. In
these cases the interrupt controller we should be talking to
WRT to masking on signalling EOI is the 8259 and not the I/O
SAPIC. This requires adriver for the Intel 8259A which isn't
available for ia64. Thus stop pretending to support ExtINTs
and instead panic() so that if we come across hardware that
has an Intel 8259A, so have something real to work with.

o With XIVs for IPIs dynamically allocatedi and also based on
priority, define the IPI_* symbols as variables rather than
constants. The variable holds the XIV allocated for the IPI.

o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV
assigned to IPI_STOP is delivered.


# 205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# 204184 21-Feb-2010 marcel

Prefer I-units and M-units for nop instructions. This works around
McKinley flaws. It also avoids using the F-unit in the kernel for
no reason.


# 200240 07-Dec-2009 marcel

In exception_save, write-back ar.rnat after switching the backing-
store. Writing to ar.bspstore is defined to leave ar.rnat undefined.

PR: ia64/120315
MFC after: 3 days


# 199574 20-Nov-2009 marcel

No need to include opt_kstack_pages.h, because KSTACK_PAGES is
already defined through genassym.c


# 199566 20-Nov-2009 marcel

Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.


# 199502 18-Nov-2009 marcel

opt_* headers are included using the quoted form.


# 172692 16-Oct-2007 marcel

Set PTE_ACCESSED in the PTE and before inserting it in the VHPT.
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.

MFC after: 1 week


# 171739 06-Aug-2007 marcel

Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)


# 171666 30-Jul-2007 marcel

o Switch to physical addressing before dereferencing the VHPT
bucket pointer. The virtual mapping may not be present in the
translation cache. This will result in a nested TLB fault at
a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
a rfi. Behaviour is undefined otherwise.

Approved by: re (blanket)


# 171665 30-Jul-2007 marcel

Add option EXCEPTION_TRACING, which enables KTR-like functionality
for processor interruptions. This is especially useful to track
unexpected nested TLB faults.

Approved by: re (blanket)


# 170026 27-May-2007 marcel

Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.


# 169760 19-May-2007 marcel

Add a level of indirection to the kernel PTE table. The old
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.


# 148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.


# 135783 25-Sep-2004 marcel

Move the IA-32 trap handling from trap() to ia32_trap(). Move the
ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32
is not defined, we'll raise SIGEMT instead.


# 135590 22-Sep-2004 marcel

Redefine a PTE as a 64-bit integral type instead of a struct of
bit-fields. Unify the PTE defines accordingly and update all
uses.


# 134502 29-Aug-2004 marcel

s/ENTRY/ENTRY_NOPROFILE/g for particular functions that do not follow
the C calling convention or are otherwise not regular functions. This
allows us to boot a profiling kernel.


# 121635 28-Oct-2003 marcel

When switching the RSE to use the kernel stack as backing store, keep
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.

The primary reasons for this change are:
1. Asynchronuous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
contants, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them whereever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.

There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.


# 119787 05-Sep-2003 marcel

Fix a place where I forgot to change the code that checks whether
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.

To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.

Run into this: davidxu


# 118563 06-Aug-2003 marcel

o In revision 1.45 of exception.S we changed exception_restore to
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
(kernel register stack). I don't know what I was smoking then,
but if you unconditionally restore ar.k6, you also want to
compute its value unconditionally. By having the computation
predicated and dependent on whether we return to user mode, we
would end up writing junk (= invalid value for ar.bspstore) if
we would return to kernel mode. But the whole point of the
unconditional restoration was that there is a grey area where
we still need to have ar.k6 restored. If we restore with a junk
value, we would end up wedging the machine on the next interrupt.
So, unconditionally calculate the value we unconditionally write
to ar.k6.

o The previous braino was found while making the following change:
We used to clear the lower 9 bits of the value we write to ar.k6.
The meaning being that we know that the kernel register stack is
at least 512 byte aligned and simply clearing the lower 9 bits
allows us to return to a context of which we don't have dirty
registers on the kernel stack, even though the context that
entered the kernel does have dirty registers on the kernel stack.
By masking-off the lower bits, we correctly obtain the base of
the register stack without having to worry that we didn't actually
reached the base while unwinding it.
The change is to mask off the lower 13 bits, knowing that the
kernel register stack is always 8KB aligned. The advantage is that
we don't have to worry anymore if there's more than 512 bytes of
dirty registers on the kernel stack. A situation that frequently
occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
deal with that situation by copying the active portion of the
register stack down in multiples of 512 bytes. Now that we mask off
the lower 13 bits we don't have to do that at all. Contemporary
IPF processors have a register file that can hold up to 96 stacked
registers (=784 bytes [incl. 2 NaT collections]). With no indication
that register files grow beyond a couple of hundred registers, we
should not have to worry about it anymore... and yes, 640KB is
enough for everybody :-)
This change helps setcontext(2) and cpu_set_upcall_kse() in that
they can return to completely different contexts without having to
mess with the kernel stack. Of course exec_setregs() doesn't need
to do that anymore as well.


# 118450 04-Aug-2003 marcel

Fix logic bug in the previous commit. Any region less than 5 is a
user space region. Hence, we need to test if 5 is greater than the
region; not greater equal.
This bug caused us to call ast() while interrupting kernel mode.


# 118402 03-Aug-2003 marcel

Fix handling of external interrupts: we weren't calling ast() when
interrupting user mode. The net effect of this bug is that a clock
interrupt does not cause rescheduling and processes are not
preempted. It only takes a "while (1);" to render the machine
useless.

This bug was introduced by the context changes and EPC syscall code.
Handling of ASTs was moved to C for clarity and ease of maintenance,
but was not added for the external interrupt case.

This needs to be revisited. We now have calls to do_ast() in trap(),
break_syscall() and ivt_External_Interrupt(). A single call in
exception_restore covers these 3 places without duplication. This
is where we handled ASTs prior to the overhaul, except that the
meat has been moved to do_ast(), a C function. This was the goal
to begin with.

Pointy hat: marcel


# 117436 11-Jul-2003 marcel

Remove a gratuitous align directive after the endp directive for
IVT entries.


# 117161 02-Jul-2003 ru

The .s files were repo-copied to .S files.

Approved by: marcel
Repocopied by: joe


# 115344 27-May-2003 marcel

A flushrs must be the first in an instruction group.

Approved by: re@ (blanket)


# 115291 24-May-2003 marcel

Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack)
when returning from an interrupt. Both registers are used on interrupt
to switch to the right kernel stack, but other than that they are not
used. This means we only have to make sure they contain proper values
while in user mode. As such, we conditionally restored these registers
based on whether we returned to userland or not. A nice property of
conditionally restoring ar.k6 and ar.k7 is that it introduces two
invariants: ar.k6 always points to the bottom of the kernel stack and
ar.k7 always points to the top of the kernel stack (immediately below
the PCB we have there).

However, the EPC syscall path introduces an irregularity: there's no
"thin red line" between user and kernel. There's a grey area that's a
couple of instructions wide. Any interruption in that grey area is
bound to see an inconsistent state. One such state is that we're in
kernel space for all practical purposes, but we still need to have
ar.k6 and ar.k7 restored as if we're in userland.

Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing
a valuable invariant. Both registers now hold the extend of the
usable portion of the kernel stack at any interrupt nesting, which
when in userland mean the bottom and the top of the kstack.


# 115274 23-May-2003 marcel

Fix a (new) source of instability:

When interrupting a kernel context, we don't need to switch stacks
(memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in 1 situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.

So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.

While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.

Approved by: re@ (blanket)


# 115179 20-May-2003 marcel

o Fix a definite bogon: the dirty bity fault, instruction access
failt and data access fault install the PTE in question into
the VHPT table. However, a post-increment was missing and we
wrote the raw PTE data into the pagesize/access key field.
This leaves a corrupt VHPT entry.
o While here, remove the explicit cache purge. Insertion into
the translation implicitly purges any overlapping entries.
o Make sure there's a cycle break between the itc and the rfi.
o Whitespace fixes.


# 115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)


# 113181 06-Apr-2003 marcel

Remove the 32KB VHPT section from the kernel image. We don't really
use it because we allocate a VHPT based on the size of the physical
memory and even if the allocated VHPT is 32KB, we don't use the in-
image section for it. Since the VHPT must be naturally aligned, we
save 48K on average (due to alignment).
Consequently, we start off with the VHPT disabled (it is assumed
the VHPT is disabled because the EFI loader runs without memory
address translation and thus has no need to setup the VHPT). It's
probably a good idea to explicitly disable the VHPT if we make the
use of the VHPT optional.


# 113160 06-Apr-2003 marcel

Also set the access bit in the PTE when we get a data dirty bit fault.
This avoids an immediate access bit fault when we serviced the dirty
bit fault in case the access bit is unset. This typically happens for
newly allocated memory that's being zeroed and thus very common.


# 111036 17-Feb-2003 julian

Fix missed patch in last commit


# 111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# 106197 30-Oct-2002 marcel

Don't pass the return address to exception_save in register b0. Use
a true scratch register. This change and future re-allocations will
eventually result in code that we can unwind to to get the preserved
registers of the process. This of course means that we cannot trash
them while saving the process context.

While re-allocating, remove the register aliases. Abstraction is in
this case disadvanteous.


# 105499 20-Oct-2002 marcel

Define IVT_ENTRY and IVT_END as special versions of ENTRY and END
for defining vectors. As a result, each vector will be a global
function with unwind directives to notify the unwinder that we're
in an interrupt handler. In the debugger this will show up something
like:

Debugger(0xe000000000a211d8, 0xe000000000748960) at Debugger+0x31
panic(0xe000000000a36858, 0xe0000000021d32d0, 0xe000000000ae42e8, ...
trap(0x14, 0x100000, 0xe0000000021d32d0, 0x0, 0xa0000000002095f0, ...
ivt_Data_TLB(0x14, 0x100000, 0xe0000000021d32d0) at ivt_Data_TLB+0x1f0


# 105010 12-Oct-2002 marcel

Plug two holes where we returned to userland without restoring
the predicate registers. Even though the ITLB and DTLB interrupts
happen often enough, this bug didn't do much harm. The reason
is that the interrupt handlers only modify p1 and since this is
a preserved (callee-saved) register it is hardly used in code
generated by the compiler. Compilers use scratch registers by
default. Changing the interrupt handlers to use p6 (ie a scratch
register) proved that the bug was in fact fatal.


# 95768 30-Apr-2002 marcel

Add ar.lc and ar.ec to the trapframe. These are not saved for syscalls,
only for exceptions.

While adding this to exception_save and exception_restore, it was hard
to find a good place to put the instructions. The code sequence was
sufficiently arbitrarily ordered that the density was low (roughly 67%).
No explicit bundling was used.
Thus, I rewrote the functions to optimize for density (close to 80% now),
and added explicit bundles and nop instructions. The immediate operand
on the nop instruction has been incremented with each instance, to make
debugging a bit easier when looking at recurring patterns. Redundant
stops have been removed as much as possible. Future optimizations can
focus more on performance. A well-placed lfetch can make all the
difference here!

Also, the FRAME_Fxx defines in frame.h were mostly bogus. FRAME_F10 to
FRAME_F15 were copied from FRAME_F9 and still had the same index. We
don't use them yet, so nothing was broken.


# 94365 10-Apr-2002 dfr

Call ast() from the syscall exit path as well as for full exception
restores.


# 93389 29-Mar-2002 jake

Remove abuse of intr_disable/restore in MI code by moving the loop in ast()
back into the calling MD code. The MD code must ensure no races between
checking the astpening flag and returning to usermode.

Submitted by: peter (ia64 bits)
Tested on: alpha (peter, jeff), i386, ia64 (peter), sparc64


# 92249 13-Mar-2002 dfr

Don't restore r13 when returning to kernel mode. We may have migrated to
a different cpu since the exception_save and r13 needs to point at the
current cpu's pcpu structure.


# 88685 30-Dec-2001 marcel

Add missing predicate in interruption_Data_TLB. Without this
predicate we never used the VHPT entry we found.

While here, normalize the compares.


# 87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# 86951 27-Nov-2001 dfr

Minor tweaks to the TLB handling code - avoid movl instructions and add
itc.x instructions to attempt to avoid the little flurry of TLB exceptions
for handling access, dirty etc.


# 86290 12-Nov-2001 marcel

Invoke trap() for the alt. ITLB and alt. DTLB interrrupts when
the region is not 6 or 7. This changes the behaviour from
inserting a bogus region 6 mapping to a kernel panic.


# 86239 10-Nov-2001 marcel

Avoid using the .align directive to skip to the next vector offset.
It doesn't help us catch overflowing vector entries at compile time.
Instead use the .org directive. The last entry in the IVT doesn't
strictly need to be limited to 256 bytes, but doing so allows the
the VHPT to be placed immediately following the IVT without wasting
any space due to alignment.


# 85866 02-Nov-2001 dfr

Call ast() from exception_restore when we are restoring to user mode.


# 85785 31-Oct-2001 dfr

Experiment with rewriting the syscall() wrapper using explicit bundling
and trying to reduce stalls from reading certain high latency registers.
This should be faster than the old syscall code. Its certainly a lot
smaller.


# 85682 29-Oct-2001 dfr

Various fixes to make stack traces using the unwind tables work properly.


# 85285 21-Oct-2001 dfr

We need to save a bit more information in the partial syscall trapframe
in case we need to take a signal.


# 84632 07-Oct-2001 dfr

* Use srlz.i to serialise changes to psr.ic
* Don't enable psr.i at the same time as psr.dt and psr.ic

These changes improve stability considerably.


# 84556 05-Oct-2001 dfr

Fix some dependency violations (don't know why gas didn't catch this).


# 84115 29-Sep-2001 dfr

Use PAGE_SHIFT instead of a hardcoded constant for log2(PAGE_SIZE).


# 83912 24-Sep-2001 dfr

Make the Alternate {I,D} TLB vector code actually work for virtual
addresses greater than 256M (the page size for region 6 and 7).


# 83906 24-Sep-2001 dfr

Include <machine/pte.h> instead of <machine/pmap.h>


# 83833 22-Sep-2001 dfr

Remove a redundant stop.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 75912 24-Apr-2001 dfr

Don't trash the user's pr on syscalls.


# 75665 18-Apr-2001 dfr

Record the right value for tf_ndirty for kernel interruptions so that
we can examine the interrupted register stack frame in DDB.


# 70210 20-Dec-2000 marcel

Resolve RAW dependency violation between tbit and adds.


# 67522 24-Oct-2000 dfr

* Various fixes to breakage introduced by the atomic and mutex reorgs.
* Fixes to the signal delivery code. Not quite right yet.

I would have preferred to wait until I have signal delivery actually
working but the current kernel in CVS doesn't build.


# 67323 19-Oct-2000 dfr

* Disable interrupts when restoring a trapframe.
* Make sure we reset ar.k6 (used to hold the kernel stack pointer when
we are returning to user mode after a syscall.


# 67212 16-Oct-2000 dfr

Do a full exception_restore after an execve syscall to ensure that the
new program gets the right values for its arguments etc.


# 67199 16-Oct-2000 dfr

* Correct some of my misunderstandings about how best to switch to the
kernel backing store.
* Implement syscalls via break instructions.
* Fix backing store copying in cpu_fork() so that the child gets the right
register values.

This thing is actually starting to work now. This set of changes takes me
up to the second execve (the one which runs the first shell). Next stop
single-user mode :-).


# 67032 12-Oct-2000 dfr

Implement a rudimentary interrupt handling system which should be good
enough for clock interrupts in SKI.


# 67020 12-Oct-2000 dfr

* Fix exception handling so that it actually works. We can now handle
exceptions from both kernel and user mode.
* Fix context switching so that we can switch back to a proc which we
switched away from (we were saving the state in the wrong place).
* Implement lazy switching of the high-fp state. This needs to be looked
at again for SMP to cope with the case of a process migrating from one
processor to another while it has the high-fp state.
* Make setregs() work properly. I still think this should be called
cpu_exec() or something.
* Various other minor fixes.

With this lot, we can execve() /sbin/init and we get all the way up to its
first syscall. At that point, we stop because syscall handling is not done
yet.


# 66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


# 66486 30-Sep-2000 dfr

Next round of ia64 work, including fixes to context switching,
implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these
changes, my test kernel reaches the mountroot prompt.


# 66463 29-Sep-2000 dfr

Implement dirty and access bit exceptions.


# 66460 29-Sep-2000 dfr

Use write-back instead of write-combining for region 7.


# 66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.