History log of /freebsd-10-stable/sys/ia64/ia64/vm_machdep.c
Revision Date Author Comments
# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 255289 06-Sep-2013 glebius

On those machines where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to absolutely
empty functions.

Reviewed by: alc, kib, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
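
As a reference for what this looks like, here is a minimal sketch of
such an inline pair on a direct-mapped machine, loosely modeled on the
resulting <sys/sf_buf.h>; treating the vm_page pointer itself as the
sf_buf is the key idea, and the per-architecture details differ:

    /*
     * Sketch: where sf_bufs carry no state of their own, an sf_buf is
     * just the vm_page pointer in disguise, and alloc/free need no
     * function call at all.
     */
    static inline struct sf_buf *
    sf_buf_alloc(struct vm_page *m, int flags)
    {

            return ((struct sf_buf *)m);
    }

    static inline void
    sf_buf_free(struct sf_buf *sf)
    {
    }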


# 231177 08-Feb-2012 marcel

Rev. 228360 moved the call to cpu_set_upcall() to happen before
td_proc gets initialized in td (=newtd). Use td0 instead.

MFC after: 3 days


# 209026 11-Jun-2010 marcel

Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4)
in particular, do not handle deferred DMA map load operations at all.
Any error, and especially EINPROGRESS, is treated as a hard error and
typically aborts the current operation. The fact that the busdma code
queues the load operation for when resources (i.e. bounce buffers in
this particular case) are available makes this especially problematic.
Bounce buffering, unlike what the PR synopsis would suggest, works
fine.

While on the subject, properly implement swi_vm().

PR: 147502
MFC after: 1 week
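
A sketch of what a proper swi_vm() looks like, mirroring the pattern
used on other platforms (the busdma_swi_pending flag and busdma_swi()
hook are assumed here from those implementations):

    /* Run deferred busdma work from the vm software interrupt. */
    void
    swi_vm(void *v)
    {

            if (busdma_swi_pending != 0)
                    busdma_swi();
    }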


# 204905 09-Mar-2010 marcel

Remove inclusion of <i386/include/psl.h>
While here move inclusion of <sys/lock.h> in a better place.


# 199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sun4v reviewed
MIPS tested by: gonzo
MFC after: 1 month
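
Illustrative sketch only, with placeholder frame fields (tf_ret0,
tf_errflag) rather than the real ia64 register layout, to show the
shape such an MD function takes:

    void
    cpu_set_syscall_retval(struct thread *td, int error)
    {

            switch (error) {
            case 0:
                    /* Success: return value in, error flag clear. */
                    td->td_frame->tf_ret0 = td->td_retval[0];
                    td->td_frame->tf_errflag = 0;
                    break;
            case ERESTART:
                    /* Back the PC up so the syscall is re-executed. */
                    break;
            case EJUSTRETURN:
                    /* The frame is already fully set up. */
                    break;
            default:
                    td->td_frame->tf_ret0 = error;
                    td->td_frame->tf_errflag = 1;
                    break;
            }
    }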


# 198733 31-Oct-2009 marcel

Reimplement the lazy FP context switching:
o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o Use msleep() to wait for the other CPU to save the high
FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticeable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.


# 194524 20-Jun-2009 marcel

Drop the high FP state of an exiting thread in cpu_thread_exit() and
not in cpu_exit(). The latter is called after td_md.md_highfp_mtx
has been destroyed, which results in a race condition when another
thread wants to use the high FP registers on the CPU that still has
the high FP registers in question.


# 173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024
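
A sketch of the ia64 half of that, assuming the per-thread high-FP
mutex (td_md.md_highfp_mtx, see r194524 above) initialized in
cpu_thread_alloc():

    void
    cpu_thread_free(struct thread *td)
    {

            /* Undo the mtx_init() performed by cpu_thread_alloc(). */
            mtx_destroy(&td->td_md.md_highfp_mtx);
    }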


# 170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


# 149915 09-Sep-2005 marcel

Change the High FP lock from a sleep lock to a spin lock. We can
take the lock from interrupt context, which causes an implicit
lock order reversal. We've been using the lock carefully enough
that making it a spin lock should not be harmful.


# 148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of the others. With at least 1024 buckets in the VHPT,
this provides for sufficiently fine-grained locking to make the
solution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a separate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). Use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check its queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to losing the IPI
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably good, scalable SMP support.
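
A sketch of the bucket head structure described above (the field names
here are illustrative, not the literal ia64 definitions):

    /* One head per hash bucket, so buckets lock independently. */
    struct ia64_bucket {
            uint64_t        chain;          /* first PTE in the collision chain */
            struct mtx      mutex;          /* protects this chain only */
            u_int           length;         /* PTEs currently on the chain */
    };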


# 147889 10-Jul-2005 davidxu

Validate that the value written into {FS,GS}.base is a canonical
address; writing a non-canonical address can cause a kernel panic.
Restrict base values to 0..VM_MAXUSER_ADDRESS, ensuring that
only canonical values get written to the registers.

Reviewed by: peter, Joseph Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)


# 145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to a more generic style, so we can reuse it
in other code. Add cpu_set_user_tls, and use it to tweak user registers
and set up user TLS. I originally wanted to merge it into
cpu_set_kse_upcall, but since cpu_set_kse_upcall is also used by M:N
threads, which may not need this feature, I wrote a separate
cpu_set_user_tls.


# 144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any effect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
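
A sketch of the typical shape of the new KPI on an architecture that
blocks interrupts in the spinlock section; the md_spinlock_count and
md_saved_intr field names follow the common convention but are
assumptions here:

    void
    spinlock_enter(void)
    {
            struct thread *td;

            td = curthread;
            if (td->td_md.md_spinlock_count == 0)
                    td->td_md.md_saved_intr = intr_disable();
            td->td_md.md_spinlock_count++;
            critical_enter();
    }

    void
    spinlock_exit(void)
    {
            struct thread *td;

            td = curthread;
            critical_exit();
            td->td_md.md_spinlock_count--;
            if (td->td_md.md_spinlock_count == 0)
                    intr_restore(td->td_md.md_saved_intr);
    }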


# 140256 14-Jan-2005 jhb

- Remove some OBE comments regarding cpu_exit(). cpu_exit() is no longer
the last action of kern_exit(). Instead, it is a MD callout to cleanup
per-process state during exit.
- Add notes of concern to Alpha and ia64 about the possible need to drop
fp state in cpu_thread_exit() rather than in cpu_exit() since it is
per-thread state rather than per-process.


# 139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


# 128393 18-Apr-2004 alc

MFamd64
Simplify the sf_buf implementation. In short, make it a veneer
over the direct virtual-to-physical mapping.


# 127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


# 127494 27-Mar-2004 marcel

MFi386: correctly calculate the top-of-stack when a kthread is created
with a larger kernel stack. Remove inclusion of opt_kstack_pages.h now
that it's unused.


# 127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose ephemeral map cache.
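
A sketch of the machine-independent sf_buf_mext() this produced, under
the page-queues locking conventions of the time:

    void
    sf_buf_mext(void *addr, void *args)
    {
            vm_page_t m;

            m = sf_buf_page(args);
            sf_buf_free(args);
            vm_page_lock_queues();
            vm_page_unwire(m, 0);
            /* Free the page if this was its last wiring and it is orphaned. */
            if (m->wire_count == 0 && m->object == NULL)
                    vm_page_free(m);
            vm_page_unlock_queues();
    }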


# 123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


# 123920 27-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


# 123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.
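
A sketch of how such counters are exported, using the kern.ipc names
these sysctls eventually took:

    static int nsfbufsused;         /* sf_bufs currently mapped */
    static int nsfbufspeak;         /* most sf_bufs ever mapped at once */

    SYSCTL_INT(_kern_ipc, OID_AUTO, nsfbufsused, CTLFLAG_RD,
        &nsfbufsused, 0, "Number of sendfile(2) sf_bufs in use");
    SYSCTL_INT(_kern_ipc, OID_AUTO, nsfbufspeak, CTLFLAG_RD,
        &nsfbufspeak, 0, "Number of sendfile(2) sf_bufs at peak usage");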


# 122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


# 122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architectures
except as an opaque pointer that references a vm page.


# 122373 09-Nov-2003 marcel

When a thread is being swapped-out, save the high FP registers. We
have a pointer in the PCPU to the PCB of the thread that currently
has its high FP registers loaded.


# 121635 28-Oct-2003 marcel

When switching the RSE to use the kernel stack as backing store, keep
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.

The primary reasons for this change are:
1. Asynchronous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
constant, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them wherever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.

There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.


# 120683 03-Oct-2003 marcel

Swap the syscall caller frame info (i.e. the return pointer and
frame marker) and the syscall stub frame info in the trap frame.
Previously we stored the stub frame info in (rp,pfs) and the
caller frame info in (iip,cfm). This ends up being suboptimal
for the following reasons:
1. When we create a new context, such as for an execve(2), we had
to set the (rp,pfs) pair for the entry point when using the
syscall path out of the kernel but we need to set the (iip,cfm)
pair when we take the interrupt way out. This is mostly just
an inconsistency from the kernel's point of view, but an ugly
irregularity from gdb(1)'s point of view.
2. The getcontext(2) and setcontext(2) syscalls had to swap the
(rp,pfs) and (iip,cfm) pairs to make the context compatible
with one created purely in userland.

Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal
handlers that actually peek at the mcontext_t and to gdb(1).
Since this change is made for gdb(1) and we don't care about
signal handlers that peek at the mcontext_t because we're still
a tier 2 platform, this ABI breakage is academic at this moment
in time.

Note that there was no real reason to save the caller frame info
in (iip,cfm) and the stub frame info in (rp,pfs).


# 119624 31-Aug-2003 marcel

Use direct mapped KVA for the sf_buf allocator, as made possible
by the previous commit. While here, fix a typo, reformat comments
and fix a long line.

Tested with: ftpd


# 119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid ephemeral mappings in the kernel's
address space. On an SMP, the ephemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.
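
On a direct-map architecture the allocator reduces to address
arithmetic; a sketch in the amd64 style, where PHYS_TO_DMAP is the
direct-map translation:

    /* No ephemeral mapping is created, hence no TLB shootdown IPIs. */
    static inline vm_offset_t
    sf_buf_kva(struct sf_buf *sf)
    {

            return (PHYS_TO_DMAP(VM_PAGE_TO_PHYS((vm_page_t)sf)));
    }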


# 119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


# 118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a separate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


# 118739 10-Aug-2003 marcel

o move cpu_reset() from vm_machdep.c to machdep.c.
o reorder cpu_boot(), cpu_halt() and identifycpu().

No functional change.


# 118588 07-Aug-2003 marcel

o Fix cut-n-paste whitespace corruption in previous commit
o For trap-based upcalls the argument (the kse_mailbox) to
the UTS must be written onto the kernel stack, not the
user stack. While here, deal with the fact that we may
be at a NaT collection point.


# 118566 06-Aug-2003 marcel

In cpu_set_upcall_kse(), create the upcall according to the entry
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now looks even more like exec_setregs().

While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.


# 118239 30-Jul-2003 peter

Deal with 'options KSTACK_PAGES' being a global option.


# 116971 28-Jun-2003 marcel

Implement cpu_set_upcall_kse(). Elementary testing shows that this
function behaves correctly in principle, but is not expected to be
100% complete. In any case, with this commit we have KSE ported
enough to start runtime testing with threaded applications and fix
whatever bugs or omissions we encounter. Yay!


# 116188 11-Jun-2003 peter

GC unused cpu_wait() function


# 115858 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simple bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


# 115651 01-Jun-2003 marcel

Improve on cpu_set_upcall:
o Use pcb and tf for the new pcb and the new trapframe and use pcb0
for the old (current) pcb. The mix of pcb, pcb2 and tf was slightly
confusing.
o Don't define td->td_frame here. It has already been set previously
by cpu_thread_setup. Add a KASSERT to make sure pcb and tf are both
non-NULL.
o Make sure the number of dirty registers is 0 for the new thread.
There are no user registers on the backing store because we haven't
entered userland yet.


# 115605 01-Jun-2003 marcel

Implement cpu_thread_setup(). This is mostly the same as on i386,
except for the fact that trapframes have a size recorded in them
that we set here too. We need this for proper thread setup.

Pointed out by: mtm


# 115570 31-May-2003 marcel

Implement cpu_set_upcall(). Required by libthr and used by
thr_create(2). This implementation is so far only compile tested.
But since this is also the last of the functions required to
support libthr, we're now functionally complete (for some weird
definition of functionally; and complete). Runtime testing can
commence.


# 115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc instruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secondary objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticeable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be noticed as an improvement, if it's noticed at all.

Approved by: re@ (jhb)


# 111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much of
the KSE code.

Submitted by: davidxu


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109877 26-Jan-2003 davidxu

Move the UPCALL related data structure out of kse, and introduce a new
data structure called kse_upcall to manage UPCALLs. All KSE binding
and loaning code is gone.

A thread that owns an upcall can collect all completed syscall contexts
in its ksegrp, turn itself into UPCALL mode, and take those contexts
back to userland. Any thread without an upcall structure has to export
its context and exit at the user boundary.

Any thread running in user mode owns an upcall structure. When it enters
the kernel, if the kse mailbox's current thread pointer is not NULL,
then when the thread is blocked in the kernel, a new UPCALL thread is
created and the upcall structure is transferred to the new UPCALL
thread. If the kse mailbox's current thread pointer is NULL, then when
a thread is blocked in the kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall
by calling kse_exit; when all upcalls in a ksegrp are removed, the
group is automatically shut down. An upcall owner thread also exits
when the process is in the exiting state. When an owner thread exits,
the upcall it owns is also removed.

KSE is a pure scheduler entity; it represents a virtual cpu. When a
thread is running, it always has a KSE associated with it. The
scheduler is free to assign a KSE to a thread according to thread
priority; if thread priority is changed, a KSE can be moved from one
thread to another.

When a ksegrp is created, there are always N KSEs created in the group,
where N is the number of physical cpus in the current system. This
makes it possible that, even if a userland UTS is only single-CPU safe,
threads in the kernel can still execute on different cpus in parallel.
Userland calls kse_create to add more upcall structures to a ksegrp to
increase concurrency in userland itself; the kernel is not restricted
by the number of upcalls userland provides.

The code hasn't been tested under SMP by the author due to lack of hardware.

Reviewed by: julian


# 109342 15-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


# 109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


# 107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zombie threads using the per-CPU
storage during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


# 107481 01-Dec-2002 alc

MFi386
Hold the page queues lock around vm_page_unhold() in vunmapbuf().

Approved by: re (blanket)


# 107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
holding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


# 106189 30-Oct-2002 marcel

Rewrite cpu_switch(). The most notable change is the fact that we now
have f16-f31 as part of the context. The PCB has been reorganized to
better match how we save and restore the (preserved) registers. This
commit also moves the context restoration to its own function (named
pcb_restore), as we did with pcb_save.

Only minimal effort has been put in writing optimal assembly. The
expectation is that there will be more rounds of changes.


# 104426 03-Oct-2002 peter

Update stubs for post-kseIII.


# 103049 06-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


# 102161 20-Aug-2002 rwatson

Correct one more errant whitespace nit that crept in during changes
in the arguments to vn_rdwr(). Hopefully the last.


# 101945 15-Aug-2002 rwatson

Correct a minor whitespace nit that sneaked in with my previous commit.


# 101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to clarify which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes are largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
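
A sketch of a post-change vn_rdwr() call made in the context of a
struct file, with both credentials passed explicitly (the local
variable names here are illustrative):

    /* active_cred authorizes MAC; fp->f_cred is used for the VOP itself. */
    error = vn_rdwr(UIO_READ, vp, buf, len, offset, UIO_USERSPACE,
        IO_UNIT, active_cred, fp->f_cred, &resid, td);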


# 99081 29-Jun-2002 julian

Add KSE stubs to MD parts of ia64 code.
Dfr will fill these out when we decide to enable KSEs on ia64
(probably not immediately)


# 98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


# 96059 05-May-2002 marcel

Remove definition of struct ia64_fdesc. It's been moved to md_var.h


# 95202 21-Apr-2002 dfr

Setup the child's return values correctly when forking an IA-32 process.


# 93761 04-Apr-2002 alc

o Kill the MD grow_stack(). Call the MI vm_map_growstack()
in its place.
o Eliminate the use of useracc() and grow_stack() from sendsig().

Reviewed by: peter


# 92843 20-Mar-2002 alfred

Remove __P.

Reviewed by: peter


# 92286 14-Mar-2002 dfr

* Initialise pcb_pmap for new threads.
* Add support for forking new threads from &thread0 as well as curthread.


# 92020 10-Mar-2002 dfr

Add an implementation of cpu_throw() and make restorectx() simply branch
to the tail of cpu_switch.


# 90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there
but remove all the code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 88727 30-Dec-2001 marcel

Revert previous definition of cpu_throw(). Non-MP configurations
were broken as well.


# 88689 30-Dec-2001 marcel

o Remove temporary implementation of cpu_throw in vm_machdep.c
and instead make it an alternate entry-point of cpu_switch()
in swtch.s
o Add SMP support to cpu_switch().


# 84732 09-Oct-2001 dfr

Clarify a comment.

Requested by: jhb


# 84381 02-Oct-2001 mjacob

Fix a problem where a user buffer outside of the area being tested
would be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


# 84117 29-Sep-2001 dfr

Call cpu_boot from cpu_reset.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83276 10-Sep-2001 peter

Rip some well-duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


# 83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


# 82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


# 82788 02-Sep-2001 peter

Sync with i386 / alpha. Whitespace unindent / style prep for kse.


# 79265 04-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependent sections to a
machine-independent source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


# 79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and acquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


# 78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


# 77448 29-May-2001 jhb

- Catch up to the VM mutex changes.
- Sort includes in a few places.


# 75701 19-Apr-2001 dfr

Don't unwrap the function descriptor used as the callout argument to
fork_exit(). The MI version of fork_exit() needs a real function
descriptor, not a simple function pointer.


# 73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


# 72906 22-Feb-2001 jhb

Rename switch_trampoline() to fork_trampoline() on the alpha and ia64.

Suggested by: dfr


# 72905 22-Feb-2001 jhb

Don't set the sched_lock nesting level for new processes as it is no
longer used.


# 72904 22-Feb-2001 jhb

Catch comments up to child_return() -> fork_return() as well.


# 72899 22-Feb-2001 jhb

Use the MI fork_return() fork trampoline callout function for child
processes instead of the MD child_return().


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
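
Concretely, the conversion looks like this, with Giant as the
sleep-lock example and sched_lock as the spin-lock example:

    /* Before: */
    mtx_enter(&Giant, MTX_DEF);
    mtx_exit(&Giant, MTX_DEF);
    mtx_enter(&sched_lock, MTX_SPIN);
    mtx_exit(&sched_lock, MTX_SPIN);

    /* After: */
    mtx_lock(&Giant);
    mtx_unlock(&Giant);
    mtx_lock_spin(&sched_lock);
    mtx_unlock_spin(&sched_lock);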


# 70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


# 68762 15-Nov-2000 jhb

Don't perform an mi_switch() when we release Giant during cpu_exit(). We
are about to call cpu_switch() anyways.

Found by: witness


# 67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


# 67210 16-Oct-2000 dfr

Clear ar.pfs for the child process in cpu_fork - switch_trampoline
doesn't want a stack frame.


# 67199 16-Oct-2000 dfr

* Correct some of my misunderstandings about how best to switch to the
kernel backing store.
* Implement syscalls via break instructions.
* Fix backing store copying in cpu_fork() so that the child gets the right
register values.

This thing is actually starting to work now. This set of changes takes me
up to the second execve (the one which runs the first shell). Next stop
single-user mode :-).


# 67020 12-Oct-2000 dfr

* Fix exception handling so that it actually works. We can now handle
exceptions from both kernel and user mode.
* Fix context switching so that we can switch back to a proc which we
switched away from (we were saving the state in the wrong place).
* Implement lazy switching of the high-fp state. This needs to be looked
at again for SMP to cope with the case of a process migrating from one
processor to another while it has the high-fp state.
* Make setregs() work properly. I still think this should be called
cpu_exec() or something.
* Various other minor fixes.

With this lot, we can execve() /sbin/init and we get all the way up to its
first syscall. At that point, we stop because syscall handling is not done
yet.


# 66937 10-Oct-2000 dfr

* Add rudimentary DDB support (no kgdb, no backtrace, no single step).
* Track recent changes to SWI code.
* Allocate RIDs for pmaps (untested).
* Implement an assembler version of cpu_switch - it's cleaner that way.


# 66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


# 66486 30-Sep-2000 dfr

Next round of ia64 work, including fixes to context switching,
implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these
changes, my test kernel reaches the mountroot prompt.


# 66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional, but it's nice to see
the FreeBSD copyright message appear in the ia64 simulator.