History log of /freebsd-10-stable/sys/ia64/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
310508 24-Dec-2016 avg

define Maxmem for ia64, the only platform that didn't have it

This is a direct commit to stable/10 as the platform was removed
in the newer branches.

Maxmem is required for compiling fwohci(4) on ia64 since commit
r310081, MFC of r277511.
It was easier to add Maxmem than to make a special case for ia64
in fwohci.

Reported by: jhb, gjb
Discussed with: kib, jhb

292348 16-Dec-2015 ken

MFC r291716, r291724, r291741, r291742

In addition to those revisions, add this change to a file that is not in
head:

sys/ia64/include/bus.h:
Guard kernel-only parts of the ia64 machine/bus.h header with
#ifdef _KERNEL.

This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.

------------------------------------------------------------------------
r291716 | ken | 2015-12-03 15:54:55 -0500 (Thu, 03 Dec 2015) | 257 lines

Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things things would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.

2. Add an ioctl to list currently outstanding CCBs in the various
queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.

4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.

5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.

2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.

3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.

4. Add support for SATA device passthrough I/O.

5. Add support for random LBAs and/or lengths on the input and
output sides.

6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.

sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.

Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
Add asynchronous CCB support.

Add two new ioctls, CAMIOQUEUE and CAMIOGET.

CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.

When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.

If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.

The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.

The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.

The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.

The new passiocleanup() function restores data pointers in
user CCBs and frees memory.

Add new functions to support kqueue(2)/kevent(2):

passreadfilt() tells kevent whether or not the done
queue is empty.

passkqfilter() adds a knote to our list.

passreadfiltdetach() removes a knote from our list.

Add a new function, passpoll(), for poll(2)/select(2)
to use.

Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)

sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.

sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.

sys/dev/md/md.c:
Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.

Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.

Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.

This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
Add camdd.

usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
Man page for camdd(8).

usr.sbin/camdd/camdd.c:
The new camdd(8) utility.

Sponsored by: Spectra Logic

------------------------------------------------------------------------
r291724 | ken | 2015-12-03 17:07:01 -0500 (Thu, 03 Dec 2015) | 6 lines

Fix typos in the camdd(8) usage() function output caused by an error in
my diff filter script.

Sponsored by: Spectra Logic

------------------------------------------------------------------------
r291741 | ken | 2015-12-03 22:38:35 -0500 (Thu, 03 Dec 2015) | 10 lines

Fix g_disk_vlist_limit() to work properly with deletes.

Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.

Submitted by: will
Sponsored by: Spectra Logic

------------------------------------------------------------------------
------------------------------------------------------------------------
r291742 | ken | 2015-12-03 22:44:12 -0500 (Thu, 03 Dec 2015) | 5 lines

Fix a style issue in g_disk_limit().

Noticed by: bdrewery

------------------------------------------------------------------------

Sponsored by: Spectra Logic

288287 27-Sep-2015 kib

MFC r288000:
Add support for weak symbols to the kernel linkers.

287945 17-Sep-2015 rstone

MFC r280957

Fix integer truncation bug in malloc(9)

A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

283910 02-Jun-2015 jhb

MFC 281266:
Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
libbfd looks for these types, and having them fixes 'gcore' in gdb of a
32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
dump code instead of duplicating the definitions.

282506 05-May-2015 hselasky

MFC r282120:
The add_bounce_page() function can be called when loading physical
pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET
flag is set, use the physical address to compute the page offset
instead. The physical address should always be valid when adding
bounce pages and should contain the same page offset like the virtual
address.

Submitted by: Svatopluk Kraus <onwahe@gmail.com>
Reviewed by: jhb@

278412 08-Feb-2015 peter

Repair ia64 build after r278347 - remove const from set_mcontext

274648 18-Nov-2014 kib

Merge the fueword(9) and casueword(9). In particular,

MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.

MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.

MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used. The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.

MFC r273788 (by jkim):
Actually install casuword(9) to fix build.

MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).

271998 22-Sep-2014 marcel

Make sure all memory updates are made visible before we let go
of the thread in cpu_switch(). It's otherwise possible that on
another CPU the thread continues from stale context data.

Note that this is prominent on newer CPUs, like the Montecito,
that really take advantage of the weak memory ordering. First
generation Itanium 2 is not that aggressive and does not need
this.

This is a direct commit to stable/10.

Approved by: re@ (gjb)

271239 07-Sep-2014 marcel

Fix previous commit: unbreak build of libkvm by including sys/systm.h
only when _KERNEL is defined.

Approved by: re@ (implicit)

271211 06-Sep-2014 marcel

Fix the PCPU access macros. It was found that the PCPU pointer, when
held in register r13, is used outside the bounds of critical_enter()
and critical_exit() by virtue of optimizations performed by the
compiler. The net effect being that address computations of fields
in the PCPU structure could be relative to the PCPU structure of the
CPU on which the address computation was performed and not related
to the CPU that executes the actual load or store operation.
The typical failure mode being that the per-CPU cache of UMA got
corrupted due to accesses from other CPUs.

Adding more volatile decorating to the register expression does not
help. The thinking being that volatile is assumed to work on memory
references and not register references. Thus, the fix is to perform
the address computation using a volatile inline assembly statement.

Additionally, since the reference is fundamentally non-atomic on ia64
by virtue of have a distinct address computation followed by the
actual load or store operation, it is required to wrap the entire
PCPU access in a critical section.

With PCPU_GET and friends requiring curthread now that they're in a
critical section, low-level use of these macros in functions like
cpu_switch() is not possible anymore. Consequently, a second order
set of changes is needed to avoid using PCPU_GET and friends where
curthread is either not set yet, or in the process of being changed.
In those cases, explicit dereferencing of pcpup is needed. In those
cases it is also possible to do that.

This is a direct commit to stable/10.

Approved by: re@ (marius)

270920 01-Sep-2014 kib

Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped
wired region. Rework the handling of unwire to do the it in batch,
both at pmap and object level.

All commits below are by alc.

MFC r268327:
Introduce pmap_unwire().

MFC r268591:
Implement pmap_unwire() for powerpc.

MFC r268776:
Implement pmap_unwire() for arm.

MFC r268806:
pmap_unwire(9) man page.

MFC r269134:
When unwiring a region of an address space, do not assume that the
underlying physical pages are mapped by the pmap. This fixes a leak
of the wired pages on the unwiring of the region mapped with no access
allowed.

MFC r269339:
In the implementation of the new function pmap_unwire(), the call to
MOEA64_PVO_TO_PTE() must be performed before any changes are made to the
PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.

MFC r269365:
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)

MFC r269433:
Handle wiring failures in vm_map_wire() with the new functions
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.

MFC r269438:
Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable
"rv" is uninitialized.

MFC r269485:
Retire pmap_change_wiring().

Reviewed by: alc

270835 30-Aug-2014 alc

Update an assertion to reflect the changes made in r270439. This is a
direct commit to stable/10 because ia64 is no longer supported by HEAD.

Reported by: marcel
Sponsored by: EMC / Isilon Storage Division

270573 25-Aug-2014 marcel

Make sure the psr field in the trapframe (which holds the value of cr.ipsr)
is properly synthesized for the EPC syscall. Properly synthesized in this
case means that the bank number (BN bitfield) is set to 1. This is needed
because the move-from-PSR instruction does copy all bits! In this case
the BN bitfield was not copied.

While normally this is not a problem, because when we leave the kernel via
the EPC syscall path again, we don't actually care about the BN bitfield.
We restore PSR with a move-to-PSR instruction, which also doesn't cover
the BN bitfield.

There is however a scenario where we enter the kernel via the EPC syscall
path and leave the kernel via the exception/interrupt path. That path
uses the RFI (Return-From-Interrupt) instruction and it restores all bits.
What happens in that case is that we don't properly switch to register
bank 1 and any exception/interrupt that happens while running in bank 0
clobbers the process' (or kernel's) banked registers. This is because the
CPU switches to bank 0 on an exception/interrupt so that there are 16
general registers available for constructing a trapframe and saving the
context. Consequently: normal code should always use register bank 1.

This bug has been present since 2003 (11 years) and has been the cause
for many "unexplained" kernel panics. It says something about how often
we hit this problem on the one hand and how tricky it was to find it.

Many thanks to: clusteradm@ for enabling me to track this down!

270439 24-Aug-2014 kib

Merge the changes to pmap_enter(9) for sleep-less operation (requested
by flag). The ia64 pmap.c changes are direct commit, since ia64 is
removed on head.

MFC r269368 (by alc):
Retire PVO_EXECUTABLE.

MFC r269728:
Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused).

MFC r269759 (by alc):
Update the text of a KASSERT() to reflect the changes in r269728.

MFC r269822 (by alc):
Change {_,}pmap_allocpte() so that they look for the flag
PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether
to sleep on page table page allocation.

MFC r270151 (by alc):
Replace KASSERT that no PV list locks are held with a conditional
unlock.

Reviewed by: alc
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation

270296 21-Aug-2014 emaste

MFC r263815, r263872:

Move ia64 efi.h to sys in preparation for amd64 UEFI support

Prototypes specific to ia64 have been left in this file for now, under
__ia64__, rather than moving them to a new header under sys/ia64.
I anticipate that (some of) the corresponding functions will be shared
by the amd64, arm64, i386, and ia64 architectures, and we can adjust
this as EFI support on other than ia64 continues to develop.

Fix missed efi.h header change in r263815

Sponsored by: The FreeBSD Foundation

268813 17-Jul-2014 imp

MFC r263749,267146:

>r267146 | imp | 2014-06-05 22:08:55 -0600 (Thu, 05 Jun 2014) | 4 lines
>Restore comments accidentally removed.

>r263749 | imp | 2014-03-25 16:08:31 -0600 (Tue, 25 Mar 2014) | 18 lines
>Rather than require a makeoptions DEBUG to get debug correct,
>add it in kern.mk, but only if we're using clang. While this
>option is supported by both clang and gcc, in the future there
>may be changes to clang which change the defaults that require
>a tweak to build our kernel such that other tools in our tree
>will work. Set a good example by forcing -gdwarf-2 only for
>clang builds, and only if the user hasn't specified another
>dwarf level already. Update UPDATING to reflect the changed
>state of affairs. This also keeps us from having to update
>all the ARM kernels to add this, and also keeps us from
>in the future having to update all the MIPS kernels and is
>one less place the user will have to know to do something
>special for clang and one less thing developers will need
>to do when moving an architecture to clang.

268201 03-Jul-2014 marcel

MFC r263380 & r268185: Add KTR events for the PMAP interface functions.

268200 02-Jul-2014 marcel

MFC r263323: Fix and improve exception tracing.

268199 02-Jul-2014 marcel

MFC r263254: Move the implementation of kdb_cpu_trap() from <machine/kdb.h>
to machdep.c.

268198 02-Jul-2014 marcel

MFC r263253: Don't use the ITC as the faulting address for external
interrupts.

268195 02-Jul-2014 marcel

MFC r263248 & r263257: In intr_event_handle() we already save and set
td_intr_frame, so don't do it also in ia64_handle_intr().

268194 02-Jul-2014 marcel

MFC r262726: When reading physical memory, make sure to access it using
the right memory attributes.

268192 02-Jul-2014 marcel

MFC r259959 & r260009: Add prototypical support for minidumps.

268189 02-Jul-2014 marcel

MFC r257484: Change PAL_PTCE_INFO related variables.

268184 02-Jul-2014 marcel

MFC r257477: Purge the translation cache of APs before we unleash them.

268181 02-Jul-2014 marcel

MFC 257475: Respect the kern.smp.disabled tunable.

266331 17-May-2014 ian

MFC 263301

In kernel config files, it is supposed to be 'options<space><tab>' not
'options<tab><tab>', per long standing (but recently not so strictly
enforced) convention.


/freebsd-10-stable/sys/amd64/conf/NOTES
/freebsd-10-stable/sys/arm/conf/AC100
/freebsd-10-stable/sys/arm/conf/ARMADAXP
/freebsd-10-stable/sys/arm/conf/ARNDALE
/freebsd-10-stable/sys/arm/conf/ATMEL
/freebsd-10-stable/sys/arm/conf/AVILA
/freebsd-10-stable/sys/arm/conf/BEAGLEBONE
/freebsd-10-stable/sys/arm/conf/BWCT
/freebsd-10-stable/sys/arm/conf/CAMBRIA
/freebsd-10-stable/sys/arm/conf/CNS11XXNAS
/freebsd-10-stable/sys/arm/conf/COLIBRI-VF50
/freebsd-10-stable/sys/arm/conf/COSMIC
/freebsd-10-stable/sys/arm/conf/CRB
/freebsd-10-stable/sys/arm/conf/CUBIEBOARD
/freebsd-10-stable/sys/arm/conf/CUBIEBOARD2
/freebsd-10-stable/sys/arm/conf/DB-78XXX
/freebsd-10-stable/sys/arm/conf/DB-88F5XXX
/freebsd-10-stable/sys/arm/conf/DB-88F6XXX
/freebsd-10-stable/sys/arm/conf/DIGI-CCWMX53
/freebsd-10-stable/sys/arm/conf/DOCKSTAR
/freebsd-10-stable/sys/arm/conf/DREAMPLUG-1001
/freebsd-10-stable/sys/arm/conf/EA3250
/freebsd-10-stable/sys/arm/conf/EB9200
/freebsd-10-stable/sys/arm/conf/EFIKA_MX
/freebsd-10-stable/sys/arm/conf/EP80219
/freebsd-10-stable/sys/arm/conf/ETHERNUT5
/freebsd-10-stable/sys/arm/conf/GUMSTIX
/freebsd-10-stable/sys/arm/conf/HL200
/freebsd-10-stable/sys/arm/conf/HL201
/freebsd-10-stable/sys/arm/conf/IMX53-QSB
/freebsd-10-stable/sys/arm/conf/IMX6
/freebsd-10-stable/sys/arm/conf/IQ31244
/freebsd-10-stable/sys/arm/conf/KB920X
/freebsd-10-stable/sys/arm/conf/LN2410SBC
/freebsd-10-stable/sys/arm/conf/NSLU
/freebsd-10-stable/sys/arm/conf/PANDABOARD
/freebsd-10-stable/sys/arm/conf/QILA9G20
/freebsd-10-stable/sys/arm/conf/QUARTZ
/freebsd-10-stable/sys/arm/conf/RADXA
/freebsd-10-stable/sys/arm/conf/RPI-B
/freebsd-10-stable/sys/arm/conf/SAM9260EK
/freebsd-10-stable/sys/arm/conf/SAM9G20EK
/freebsd-10-stable/sys/arm/conf/SAM9X25EK
/freebsd-10-stable/sys/arm/conf/SHEEVAPLUG
/freebsd-10-stable/sys/arm/conf/SN9G45
/freebsd-10-stable/sys/arm/conf/TS7800
/freebsd-10-stable/sys/arm/conf/VERSATILEPB
/freebsd-10-stable/sys/arm/conf/VYBRID.common
/freebsd-10-stable/sys/arm/conf/WANDBOARD.common
/freebsd-10-stable/sys/arm/conf/ZEDBOARD
/freebsd-10-stable/sys/i386/conf/NOTES
/freebsd-10-stable/sys/i386/conf/XEN
conf/GENERIC
/freebsd-10-stable/sys/mips/conf/ALCHEMY
/freebsd-10-stable/sys/mips/conf/AP121
/freebsd-10-stable/sys/mips/conf/AP91
/freebsd-10-stable/sys/mips/conf/AP93
/freebsd-10-stable/sys/mips/conf/AP94
/freebsd-10-stable/sys/mips/conf/AP96
/freebsd-10-stable/sys/mips/conf/AR71XX_BASE
/freebsd-10-stable/sys/mips/conf/AR724X_BASE
/freebsd-10-stable/sys/mips/conf/AR91XX_BASE
/freebsd-10-stable/sys/mips/conf/AR933X_BASE
/freebsd-10-stable/sys/mips/conf/AR934X_BASE
/freebsd-10-stable/sys/mips/conf/CARAMBOLA2
/freebsd-10-stable/sys/mips/conf/ENH200
/freebsd-10-stable/sys/mips/conf/PB47
/freebsd-10-stable/sys/mips/conf/PB92
/freebsd-10-stable/sys/mips/conf/PICOSTATION_M2HP
/freebsd-10-stable/sys/mips/conf/ROUTERSTATION
/freebsd-10-stable/sys/mips/conf/ROUTERSTATION_MFS
/freebsd-10-stable/sys/mips/conf/RSPRO
/freebsd-10-stable/sys/mips/conf/RSPRO_MFS
/freebsd-10-stable/sys/mips/conf/RSPRO_STANDALONE
/freebsd-10-stable/sys/mips/conf/RT305X
/freebsd-10-stable/sys/mips/conf/SENTRY5
/freebsd-10-stable/sys/mips/conf/SWARM64_SMP
/freebsd-10-stable/sys/mips/conf/SWARM_SMP
/freebsd-10-stable/sys/mips/conf/TP-WN1043ND
/freebsd-10-stable/sys/mips/conf/WZR-300HP
/freebsd-10-stable/sys/mips/conf/XLRN32
/freebsd-10-stable/sys/mips/conf/std.SWARM
/freebsd-10-stable/sys/mips/conf/std.XLP
/freebsd-10-stable/sys/powerpc/conf/GENERIC
/freebsd-10-stable/sys/powerpc/conf/GENERIC64
/freebsd-10-stable/sys/powerpc/conf/MPC85XX
/freebsd-10-stable/sys/powerpc/conf/NOTES
266312 17-May-2014 ian

MFC 263036, 263059: delete advertising clause in licenses, renumber.

266204 16-May-2014 ian

MFC r257854 (discussed with alc@)

As of r257209, all architectures have defined
VM_KMEM_SIZE_SCALE. In other words, every architecture is now
auto-sizing the kmem arena. This revision changes kmeminit() so
that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and
the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE.
With auto-sizing enabled, VM_KMEM_SIZE effectively became an
alternate spelling for VM_KMEM_SIZE_MIN on most architectures.
Use VM_KMEM_SIZE_MIN for clarity.

265606 07-May-2014 scottl

Merge r264984

Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Obtained from: Netflix, Inc.

265388 05-May-2014 ken

MFC the mpr(4) driver for LSI's 12Gb SAS cards.

This includes r265236, r265237, r265241 and r265261:

------------------------------------------------------------------------
r265236 | ken | 2014-05-02 14:25:09 -0600 (Fri, 02 May 2014) | 51 lines

Bring in the mpr(4) driver for LSI's MPT3 12Gb SAS controllers.

This is derived from the mps(4) driver, but it supports only the 12Gb
IT and IR hardware including the SAS 3004, SAS 3008 and SAS 3108.

Some notes about this driver:
o The 12Gb hardware can do "FastPath" I/O, and that capability is included in
this driver.

o WarpDrive functionality has been removed, since it isn't supported in
the 12Gb driver interface.

o The Scatter/Gather list handling code is significantly different between
the 6Gb and 12Gb hardware. The 12Gb boards support IEEE Scatter/Gather
lists.

Thanks to LSI for developing and testing this driver for FreeBSD.

share/man/man4/mpr.4:
mpr(4) man page.

sys/dev/mpr/*:
mpr(4) driver files.

sys/modules/Makefile,
sys/modules/mpr/Makefile:
Add a module Makefile for the mpr(4) driver.

sys/conf/files:
Add the mpr(4) driver.

sys/amd64/conf/GENERIC,
sys/i386/conf/GENERIC,
sys/mips/conf/OCTEON1,
sys/sparc64/conf/GENERIC:
Add the mpr(4) driver to all config files that currently
have the mps(4) driver.

sys/ia64/conf/GENERIC:
Add the mps(4) and mpr(4) drivers to the ia64 GENERIC
config file.

sys/i386/conf/XEN:
Exclude the mpr module from building here.

Submitted by: Steve McConnell <Stephen.McConnell@lsi.com>
Tested by: Chris Reeves <chrisr@spectralogic.com>
Sponsored by: LSI, Spectra Logic
Relnotes: LSI 12Gb SAS driver mpr(4) added

------------------------------------------------------------------------
------------------------------------------------------------------------
r265237 | ken | 2014-05-02 14:36:20 -0600 (Fri, 02 May 2014) | 8 lines

Add the mpr(4) man page to the man4 Makefile.

This should have been included in r265236.

Submitted by: Steve McConnell <Stephen.McConnell@lsi.com>
MFC after: 3 days
Sponsored by: LSI, Spectra Logic

------------------------------------------------------------------------
------------------------------------------------------------------------
r265241 | brueffer | 2014-05-02 15:14:28 -0600 (Fri, 02 May 2014) | 2 lines

Use our standard SYNOPSIS wording; perform some cleanup while here.

------------------------------------------------------------------------
------------------------------------------------------------------------
r265261 | brueffer | 2014-05-03 05:15:28 -0600 (Sat, 03 May 2014) | 2 lines

Add a missing colon.

------------------------------------------------------------------------

Submitted by: Steve McConnell <Stephen.McConnell@lsi.com>
Tested by: Chris Reeves <chrisr@spectralogic.com>
Sponsored by: LSI, Spectra Logic
Relnotes: LSI 12Gb SAS driver mpr(4) added

264496 15-Apr-2014 tijl

MFC r263998:

Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4
-fms-extensions.

262004 16-Feb-2014 marcel

MFC r260175:
Implement atomic_swap_<type>.

262002 16-Feb-2014 marcel

MFC r260914:
In pmap_set_pte(), make sure to enforce ordering by inserting a memory fence.

262001 16-Feb-2014 marcel

MFC r260666:
In the nested TLB fault handler, for a direct-mapped address, make
sure to clear the lower 12 bits.

261996 16-Feb-2014 marcel

MFC r259244:
Allow pmap_remove_pages() to be called for physical maps not
associated with the current thread.

261986 16-Feb-2014 marcel

MFC r257910:
Don't enable interrupts before we call sched_throw().

261985 16-Feb-2014 marcel

MFC r257487:
Use LOG2_ID_PAGE_SIZE again for the identity mapping in regions 6 & 7.

259510 17-Dec-2013 kib

MFC r257228:
Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.

256283 10-Oct-2013 gjb

- Remove debugging from GENERIC* kernel configurations
- Enable MALLOC_PRODUCTION
- Default dumpdev=NO
- Remove UPDATING entry regarding debugging features
- Bump __FreeBSD_version to 1000500

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


255289 06-Sep-2013 glebius

On those machines, where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely
empty functions.

Reviewed by: alc, kib, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254480 18-Aug-2013 pjd

Add process descriptors support to the GENERIC kernel. It is already being
used by the tools in base systems and with sandboxing more and more tools
the usage should only increase.

Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by: Google Summer of Code 2013
MFC after: 1 month


254300 13-Aug-2013 jkim

Tidy up global locks for ACPICA. There is no functional change.


254138 09-Aug-2013 attilio

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254133 09-Aug-2013 avg

follow up to r254051

- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
jhb

X-MFC after: never (change specific to head)


254051 07-Aug-2013 avg

enable KDB_TRACE in GENERICs

KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after: never (change specific to head)


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253845 31-Jul-2013 obrien

Back out r253779 & r253786.


253779 29-Jul-2013 obrien

Decouple yarrow from random(4) device.

* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*. Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
random_adaptor is basically an adapter that plugs in to random(4).
random_adaptor can only be plugged in to random(4) very early in bootup.
Unplugging random_adaptor from random(4) is not supported, and is probably a
bad idea anyway, due to potential loss of entropy pools.
We currently have 3 random_adaptors:
+ yarrow
+ rdrand (ivy.c)
+ nehemeiah

* Remove platform dependent logic from probe.c, and move it into
corresponding registration routines of each random_adaptor provider.
probe.c doesn't do anything other than picking a specific random_adaptor
from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien


253750 28-Jul-2013 avg

Revert r253748,253749

This WIP should not have been committed yet.

Pointyhat to: avg


253748 28-Jul-2013 avg

put contents of cpu.h under _KERNEL

no userland-serviceable parts inside

MFC after: 20 days


253560 23-Jul-2013 marcel

In pci_cfgregread() and pci_cfgregwrite(), multiplex the domain and
bus number into the bus argument. The bus number occupies the least
significant 8 bits. The PCI domain occupies the most significant 24
bits.

On the Altix 350, the PCI domain is a required parameter, but
changing the prototype of the pci_cfgreg*() functions to include a
separate domain argument has wide-spread consequences across the
supported architectures. We'd be changing a known interface.

Multiplexing is an acceptable kluge to give us what we need with
manageable impact. Note that the PCI bus number fits in 8 bits,
so the multiplexing of the domain is a backward compatible change.


253559 23-Jul-2013 marcel

In ia64_mca_init(), don't limit the allocation of the info block to
fall within the first 256MB of memory. The origin/reason for that
limitation is not known, but it's not believed to be required for
proper initialization. What is known is that the Altix 350 does not
have physical memory at that address (by virtue of the address space
bits).

Keep the boundary at 256MB so that the info block will be covered
by a single direct-mapped translation.

While here, change the flags to M_NOWAIT to eliminate confusion. It
does not change the behaviour of contigmalloc(). What is does is
makes the flags argument explicitly say what the actual behaviour
is.


253558 23-Jul-2013 marcel

In pmap_mapdev(), if the physical memory range is not covered by an EFI
memory descriptor, don't return NULL as the virtual address, return the
direct-mapped uncacheable virtual address for it. At first, this was
needed only for the Altix 350, but now even some high-end HP machines
have devices mapped to physical addresses that aren't covered by the
EFI memory map.


252434 01-Jul-2013 kib

Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by: bde
Sponsored by: The FreeBSD Foundation


252280 27-Jun-2013 jkim

Move definitions required by userland applications out of acpica_machdep.h.


250963 24-May-2013 achim

Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.

Approved by: scottl (mentor)


250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


250544 12-May-2013 peter

Tidy up some CVS workarounds.


250338 07-May-2013 attilio

Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept. The change should also be useful
for consolidation and consistency.

Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc


249410 12-Apr-2013 trasz

Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by: ken
Sponsored by: FreeBSD Foundation


249268 08-Apr-2013 glebius

Merge from projects/counters: counter(9).

Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.


249265 08-Apr-2013 glebius

Merge from projects/counters:

Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by: Nginx, Inc.


249083 04-Apr-2013 mav

Remove all legacy ATA code parts, not used since options ATA_CAM enabled in
most kernels before FreeBSD 9.0. Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the
atacontrol utility and some man pages. Remove useless now options ATA_CAM.

No objections: current@, stable@
MFC after: never


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247463 28-Feb-2013 mav

MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.


247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.


247251 25-Feb-2013 marcel

kernacc() expects all KVAs to be covered in the kernel map. With the
introduction of the PBVM, this stopped being the case. Redefine the
VM parameters so that the PBVM is included in the kernel map. In
particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base
of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of
region 4 to include the PBVM.
While here define KERNBASE to the actual link address of the kernel as
is intended.

PR: 169926


247197 23-Feb-2013 marcel

Enable PREEMPTION by default now that PR 147501 has been fixed.


246890 17-Feb-2013 marcel

Close a race relating to setting the PCPU pointer (r13). Register r13
points to the TLS in user space and points to the PCPU structure in
the kernel. The race is the result of having the exception handler on
the one hand and the RPC system call entry on the other. The EPC
syscall path is non-atomic in that interrupts are enabled while the
two stacks are switched. The register stack is switched last as that
is the stack used to determine whether we're going back to user space
by the exception handler. If we go back to user space, we restore r13,
otherwise we leave r13 alone. The EPC syscall path however set r13 to
the PCPU structure *before* switching the register stack, which means
that there was a window in which the exception handler would restore
r13 when it was already pointing to the PCPU structure. This is fatal
when the exception happened on CPU x, but left from the exception on
anotehr CPU. In that case r13 would point to the PCPU of the CPU the
thread was running on. This immediately results in getting the wrong
value for curthread.
The fix is to make sure we assign r13 *after* we set ar.bspstore to
point to the kernel register stack for the thread.


246882 16-Feb-2013 marcel

Return EFAULT when the address is not a kernel virtual address.


246715 12-Feb-2013 marcel

Eliminate the PC_CURTHREAD symbol and load the current thread's
thread structure pointer atomically from r13 (the pcpu pointer)
for the current CPU/core.
Add a CTASSERT in machdep.c to make sure that pc_curthread is in
fact the first field in struct pcpu.

The only non-atomic operations left were those related to process-
space operations, such as casuword, subyte, suword16, fubyte,
fuword16, copyin, copyout and their variations.

The casuword function has been re-structured more complete than
the others. This way we have an example of a better bundling
without introducing a lot of risk when we get it wrong. The
other functions can be rebundled in separate commits and with
the appropriate testing.


246714 12-Feb-2013 marcel

Eliminate padding by moving 'narg' next to 'code'. Both are 32-bit
entities in the syscall_args structure that otherwise has 64-bit
only fields.


246713 12-Feb-2013 kib

Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)


246712 12-Feb-2013 marcel

Now that we actually use more memory descriptors, make sure to dump
them as well.


245044 04-Jan-2013 hrs

Remove firewire devices missed in r244992.


245003 03-Jan-2013 kib

Enable the UFS quotas for big-iron GENERIC kernels.

Discussed with: mckusick
MFC after: 2 weeks


244992 03-Jan-2013 des

As discussed on -current last October, remove the firewire drivers from
GENERIC.


243040 14-Nov-2012 kib

Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks


242534 03-Nov-2012 attilio

Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by: alc, jimharris


242218 28-Oct-2012 kib

Fix compilation on ia64 when page size is configured for 16KB.

Reviewed by: alc, marcel


242121 26-Oct-2012 alc

Port the new PV entry allocator from amd64/i386. This allocator has two
advantages. First, PV entries are roughly half the size. Second, this
allocator doesn't access the paging queues, and thus it allows for the
removal of the page queues lock from this pmap.

Replace all uses of the page queues lock by a R/W lock that is private
to this pmap.

Tested by: marcel


241020 28-Sep-2012 alc

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


239379 18-Aug-2012 marcel

Use pmap_kextract(x) rather than pmap_extract(kernel_pmap, x). The
former knows about all the special mappings, like PBVM. The kernel
text and data are in the PBVM.


239376 18-Aug-2012 marcel

Remove support for SKI: HP's Itanium simulator. It's pretty much not
used, serves very little value given that FreeBSD runs on real H/W
for a long time.
Note that SKI is open-source (see http://ski.sourceforge.net), so
if there's interest and value again, then this code can be revived.

Discussed with: jhb


239331 16-Aug-2012 jhb

Add locking for sscdisk(4) and mark it MPSAFE. Since this driver just
makes calls out to the emulator, the locking is fairly simple. A global
mutex protects the list of ssc disks, and each ssc disk has a mutex
to protect it's bioq.

Approved by: marcel


239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


238257 08-Jul-2012 marcel

Move PCPU initialization to a new function called cpu_pcpu_setup().
This makes it easier to add additional CPU or platform information
to the per-CPU structure without duplicated code.


238256 08-Jul-2012 marcel

Unleash the APs at SI_SUB_KICK_SCHEDULER so that we have them all
up and running to service interrupts. This is especially important
when the firmware has bound interrupts to CPUs, like for the SGI
Altix 350. We wake up APs at SI_SUB_CPU time and they sit and spin
until we unleash them, so there's nothing fundamentally different
from a MD perspective.


238190 07-Jul-2012 marcel

Implement ia64_physmem_alloc() and use it consistently to get memory
before VM has been initialized. This includes:
1. Replacing pmap_steal_memory(),
2. Replace the handcrafted logic to allocate a naturally aligned VHPT,
3. Properly allocate the DPCPU for the BSP.

Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't
cross into the next PBVM page. If we were to cross into the next
page, then there wouldn't be a PTE entry on the page table for it
and we would end up with a MCA following a page fault. As such,
this commit fixes MCAs occasionally seen.


238184 07-Jul-2012 marcel

Hide the creation of phys_avail behind an API to make it easier to do it
correctly. We now iterate the EFI memory descriptors once and collect all
the information in a single pass. This includes:
1. The I/O port base address,
2. The PAL memory region. Have the physmem API track this.
3. Memory descriptors of memory we can't use, like bad memory, runtime
services code & data, etc. Have the physmem API track these.
4. memory descriptors of memory we can use or re-use, such as free
memory, boot time services code & data, loader code & data, etc.
These are added by the physmem API.

Since the PBVM page table and pages are in memory described as loader
data, inform the physmem API of chunks that need to be delated from the
available physical memory.

While here, remove Maxmem and replace it with the better named paddr_max.
Maxmem was defined as physmem, which is generally wrong. Now, paddr_max
is properly defined as the largesty physical address.

The upshot of all this is that:
1. We properly determine realmem.
2. We maximize physmem by re-using memory where possible.
3. We remove complexity from ia64_init() in machdep.c.
4. Remove confusion about realmem, physmem & Maxmem.

The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c,
as well as replace the handcrafted allocation of the VHPT for the BSP in
pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation
of phys_avail after it is being created.


237517 24-Jun-2012 andrew

Make the wchar_t type machine dependent.

This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the
ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an
unsigned short with the former preferred.

Because of this requirement we need to move the definition of __wchar_t to
a machine dependent header. It also cleans up the macros defining the limits
of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine
dependent header then using them to define WCHAR_MIN and WCHAR_MAX
respectively.

Discussed with: bde


237433 22-Jun-2012 kib

Implement mechanism to export some kernel timekeeping data to
usermode, using shared page. The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with: bde
Reviewed by: jhb
Tested by: flo
MFC after: 1 month


237430 22-Jun-2012 kib

Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after: 1 week


237168 16-Jun-2012 alc

The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer. This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by: kib
MFC after: 6 weeks


236409 01-Jun-2012 jkim

Improve style(9) in the previous commit.


236403 01-Jun-2012 iwasaki

Call AcpiLeaveSleepStatePrep() in interrupt disabled context
(described in ACPICA source code).

- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
it twice in interrupt disabled/enabled context (ia64 version is
just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
(i386 version) for preparation of x86/acpica/acpi_wakeup.c
(MFC candidate).

Reviewed by: jkim@
MFC after: 2 days


236375 01-Jun-2012 alc

pmap_alloc_vhpt() doesn't need the pages that it allocates to be mapped
into the kernel map, so vm_page_alloc_contig() can be used in place of
contigmalloc().

Reviewed by: marcel


235941 24-May-2012 bz

MFp4 bz_ipv6_fast:

in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the (also exported to user space)
IPv4 specific functions to the times, when the ip.h header is
present and IPVERSION is defined (to 4).

We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235041 04-May-2012 marcel

Don't assume we have legacy PICs (i.e. 8259A in cascade) at the legacy
I/O port addresses. Even if we do, this is hardly the place to mask
interrupts. It's not clear that this was at all needed. The code came
with CVS revision 1.2 of nexus.c when interrupt support was first added.
What is known is that ia64 has always been designed around the IOSAPIC,
and that doing I/O like this prevents Altix from booting.


234785 29-Apr-2012 dim

Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after: 2 weeks


233271 21-Mar-2012 ed

Remove pty(4) from our kernel configurations.

As of FreeBSD 8, this driver should not be used. Applications that use
posix_openpt(2) and openpty(3) use the pts(4) that is built into the
kernel unconditionally. If it turns out high profile depend on the
pty(4) module anyway, I'd rather get those fixed. So please report any
issues to me.

The pty(4) module is still available as a kernel module of course, so a
simple `kldload pty' can be used to run old-style pseudo-terminals.


233207 19-Mar-2012 tijl

Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace
amd64/i386/pc98 specialreg.h with stubs.


233204 19-Mar-2012 tijl

Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.


233203 19-Mar-2012 tijl

Move userland bits (and some common kernel bits) from amd64 and i386
segments.h to a new x86 segments.h.

Add __packed attribute to some structs (just to be sure).
Also make it clear that i386 GDT and LDT entries are used in ia64 code.


233125 18-Mar-2012 tijl

Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.

Reviewed by: kib


232619 06-Mar-2012 attilio

Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported
platforms.
This will make every attempt to mount a non-mpsafe filesystem to the
kernel forbidden, unless it is expressely compiled with
VFS_ALLOW_NONMPSAFE option.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.


232356 01-Mar-2012 jhb

- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by: scottl


232250 28-Feb-2012 gavin

Correct capitalization of "Hz" in user-visible text (manpages, printf(),
etc).

MFC after: 3 days


231177 08-Feb-2012 marcel

Rev. 228360 moved the call to cpu_set_upcall() to happen before
td_proc gets initialized in td (=newtd). Use td0 instead.

MFC after: 3 days


230475 23-Jan-2012 das

Add C11 macros describing subnormal numbers to float.h.

Reviewed by: bde


230366 20-Jan-2012 das

Add parentheses where required. Without them, `sizeof LDBL_MAX'
is a syntax error and shouldn't be, while `1 FLT_ROUNDS' isn't a
syntax error and should be. Thanks to bde for the examples.


229997 12-Jan-2012 ken

Add the CAM Target Layer (CTL).

CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003. It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license. The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

Some CTL features:

- Disk and processor device emulation.
- Tagged queueing
- SCSI task attribute support (ordered, head of queue, simple tags)
- SCSI implicit command ordering support. (e.g. if a read follows a mode
select, the read will be blocked until the mode select completes.)
- Full task management support (abort, LUN reset, target reset, etc.)
- Support for multiple ports
- Support for multiple simultaneous initiators
- Support for multiple simultaneous backing stores
- Persistent reservation support
- Mode sense/select support
- Error injection support
- High Availability support (1)
- All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
functional.

ctl.c: The core of CTL. Command handlers and processing,
character driver, and HA support are here.

ctl.h: Basic function declarations and data structures.

ctl_backend.c,
ctl_backend.h: The basic CTL backend API.

ctl_backend_block.c,
ctl_backend_block.h: The block and file backend. This allows for using
a disk or a file as the backing store for a LUN.
Multiple threads are started to do I/O to the
backing device, primarily because the VFS API
requires that to get any concurrency.

ctl_backend_ramdisk.c: A "fake" ramdisk backend. It only allocates a
small amount of memory to act as a source and sink
for reads and writes from an initiator. Therefore
it cannot be used for any real data, but it can be
used to test for throughput. It can also be used
to test initiators' support for extremely large LUNs.

ctl_cmd_table.c: This is a table with all 256 possible SCSI opcodes,
and command handler functions defined for supported
opcodes.

ctl_debug.h: Debugging support.

ctl_error.c,
ctl_error.h: CTL-specific wrappers around the CAM sense building
functions.

ctl_frontend.c,
ctl_frontend.h: These files define the basic CTL frontend port API.

ctl_frontend_cam_sim.c: This is a CTL frontend port that is also a CAM SIM.
This frontend allows for using CTL without any
target-capable hardware. So any LUNs you create in
CTL are visible in CAM via this port.

ctl_frontend_internal.c,
ctl_frontend_internal.h:
This is a frontend port written for Copan to do
some system-specific tasks that required sending
commands into CTL from inside the kernel. This
isn't entirely relevant to FreeBSD in general,
but can perhaps be repurposed.

ctl_ha.h: This is a stubbed-out High Availability API. Much
more is needed for full HA support. See the
comments in the header and the description of what
is needed in the README.ctl.txt file for more
details.

ctl_io.h: This defines most of the core CTL I/O structures.
union ctl_io is conceptually very similar to CAM's
union ccb.

ctl_ioctl.h: This defines all ioctls available through the CTL
character device, and the data structures needed
for those ioctls.

ctl_mem_pool.c,
ctl_mem_pool.h: Generic memory pool implementation used by the
internal frontend.

ctl_private.h: Private data structres (e.g. CTL softc) and
function prototypes. This also includes the SCSI
vendor and product names used by CTL.

ctl_scsi_all.c,
ctl_scsi_all.h: CTL wrappers around CAM sense printing functions.

ctl_ser_table.c: Command serialization table. This defines what
happens when one type of command is followed by
another type of command.

ctl_util.c,
ctl_util.h: CTL utility functions, primarily designed to be
used from userland. See ctladm for the primary
consumer of these functions. These include CDB
building functions.

scsi_ctl.c: CAM target peripheral driver and CTL frontend port.
This is the path into CTL for commands from
target-capable hardware/SIMs.

README.ctl.txt: CTL code features, roadmap, to-do list.

usr.sbin/Makefile: Add ctladm.

ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c: ctladm(8) is the CTL management utility.
It fills a role similar to camcontrol(8).
It allow configuring LUNs, issuing commands,
injecting errors and various other control
functions.

usr.bin/Makefile: Add ctlstat.

ctlstat/Makefile
ctlstat/ctlstat.8,
ctlstat/ctlstat.c: ctlstat(8) fills a role similar to iostat(8).
It reports I/O statistics for CTL.

sys/conf/files: Add CTL files.

sys/conf/NOTES: Add device ctl.

sys/cam/scsi_all.h: To conform to more recent specs, the inquiry CDB
length field is now 2 bytes long.

Add several mode page definitions for CTL.

sys/cam/scsi_all.c: Handle the new 2 byte inquiry length.

sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c: Update for 2 byte inquiry length field.

scsi_da.h: Add versions of the format and rigid disk pages
that are in a more reasonable format for CTL.

amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC: Add device ctl.

i386/conf/PAE: The CTL frontend SIM at least does not compile
cleanly on PAE.

Sponsored by: Copan Systems, SGI and Spectra Logic
MFC after: 1 month


229605 05-Jan-2012 adrian

Flip on IEEE80211_SUPPORT_MESH and AH_SUPPORT_AR5416, the
wlan and ath modules respectively assume this is set.

Pointy hat to: adrian


228973 29-Dec-2011 rwatson

Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel
configurations for various architectures in FreeBSD 10.x. This allows
basic Capsicum functionality to be used in the default FreeBSD
configuration on non-embedded architectures; process descriptors are not
yet enabled by default.

MFC after: 3 months
Sponsored by: Google, Inc


228631 17-Dec-2011 avg

kern cons: introduce infrastructure for console grabbing by kernel

At the moment grab and ungrab methods of all console drivers are no-ops.

Current intended meaning of the calls is that the kernel takes control of
console input. In the future the semantics may be extended to mean that
the calling thread takes full ownership of the console (e.g. console
output from other threads could be suspended).

Inspired by: bde
MFC after: 2 months


228522 15-Dec-2011 alc

Eliminate vestiges of page coloring.


228469 13-Dec-2011 ed

Replace __signed by signed.

The signed keyword is an integral part of the C syntax. There's no need
to use __signed.


227333 08-Nov-2011 attilio

Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by: gianni
Reviewed by: kib


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


226835 27-Oct-2011 kensmith

Adjust the debugger options slightly. This should help me do the right
thing when changing the debugging options as part of head becoming a new
stable branch. It may also help people who for one reason or another want
to run head but don't want it slowed down by the debugging support.

Reviewed by: kib


226818 26-Oct-2011 kensmith

Move the debugging support to its own section. This matches what is
in the other architectures' GENERIC and makes removing it at the point
we're creating a new stable branch a bit easier.

Discussed with: marcel


226607 21-Oct-2011 das

People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one. To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware. Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.


226547 19-Oct-2011 kensmith

Add a warning about why sbp(4) is commented out so that curious folks
are forewarned they might wind up with a hole in their foot if they
decide to give it a try.

Suggested by: dougb


226510 18-Oct-2011 kensmith

Comment out the sbp(4) driver for architectures that support it.

As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112)
but was left alone in head so people could work on fixing an issue that
caused boot failure on some motherboards. Apparently nobody has worked
on it and we are getting reports of boot failure with the 9.0 test builds.
So this time I'll comment out the driver in head (still hoping someone
will work on it) and MFC to stable/9.

Submitted by: Alberto Villa <avilla at FreeBSD dot org>


226112 07-Oct-2011 kib

Remove unused define.

MFC after: 1 month


225841 28-Sep-2011 kib

Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics after the r225840.

Submitted by: alc
Tested by: flo (sparc64)
MFC after: 2 weeks


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225474 11-Sep-2011 kib

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


224668 06-Aug-2011 marcel

Fix kernel core dumps now that the kernel is using PBVM. The basic
problem to solve is that we don't have a fixed mapping from kernel
text to physical address so that libkvm can bootstrap itself. We
solve this by passing the physical address of the bootinfo structure
to the consumer as the entry point of the core file. This way,
libkvm can extract the PBVM page table information and locate the
kernel in the core file.
We also need to dump memory chunks of type loader data, because
those hold the kernel and the PBVM page table (among other things).

Approved by: re (blanket)


224663 05-Aug-2011 marcel

Follow-up commit: refactor pmap_kextract() to make it easier to
catch and debug issues like the one fixed in the previous commit:
Replace all return statements with goto statements so that we end
up at a single place with a value for the physical address.
Print a message for all unknown KVA addresses.

Approved by: re (blanket)


224662 05-Aug-2011 marcel

Remove stray semicolon in pmap_kextract() that turned the conditional
"return (0)" into an unconditional one and as such broke PBVM address
queries -- such as during kernel core dumps.

Approved by: re (blanket)


224217 19-Jul-2011 attilio

Bump MAXCPU for amd64, ia64 and XLP mips appropriately.
From now on, default values for FreeBSD will be 64 maxiumum supported
CPUs on amd64 and ia64 and 128 for XLP. All the other architectures
seem already capped appropriately (with the exception of sparc64 which
needs further support on jalapeno flavour).

Bump __FreeBSD_version in order to reflect KBI/KPI brekage introduced
during the infrastructure cleanup for supporting MAXCPU > 32. This
covers cpumask_t retiral too.

The switch is considered completed at the present time, so for whatever
bug you may experience that is reconducible to that area, please report
immediately.

Requested by: marcel, jchandra
Tested by: pluknet, sbruno
Approved by: re (kib)


224216 19-Jul-2011 attilio

On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes
storage.
Fix the sintrcnt/sintrnames specification.

No MFC is previewed for this patch.

Reported, reviewed and tested by: marcel
Approved by: re (kib)


224207 19-Jul-2011 attilio

Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by: pluknet
Approved by: re (kib)


224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is previewed for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


224185 18-Jul-2011 jhb

Enable NEW_PCIB by default on ia64.

Approved by: re (kib), marcel


224184 18-Jul-2011 jhb

Implement bus_adjust_resource() for the ia64 nexus driver.

Reviewed by: marcel
Approved by: re (kib)


224116 16-Jul-2011 marcel

Don't assume pmap_mapdev() gets called only for memory mapped I/O
addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev()
rather excessively (number of calls) just to get a valid KVA. To
that end, have pmap_mapdev():
1. cache the last result so that we don't waste time for multiple
consecutive invocations with the same PA/SZ.
2. find the memory descriptor that covers the PA and return NULL
if none was found or when the PA is for a common DRAM address.
3. Use either a region 6 or region 7 KVA, in accordance with the
memory attribute.


224114 16-Jul-2011 marcel

Don't send EOI to the CPU before we handled the interrupt. This could
potentially trigger multiple pending interrupts for level-sensitive
interrupts. However, the event timer interrupt does need EOI before
being handled to avoid missing clock events.

These conflicting requirements are handled by having the XIV handler
inform the dispatch code whether or not it send EOI to the CPU. If not,
the dispatch code will do it. This allows handlers to send EOI before
doing potentially long-running activities, while still have a sensible
default behaviour.


224112 16-Jul-2011 marcel

Add a few more helper functions for working with memory descriptors:
o efi_md_find() - returns the md that covers the given address
o efi_md_last() - returns the last md in the list
o efi_md_prev() - returns the md that preceeds the given md.


223873 08-Jul-2011 marcel

Implement basic support for memory attributes. At this time we only
distinguish between UC and WB memory so that we can map the page to
either a region 6 address (for UC) or a region 7 address (for WB).

This change is only now possible, because previously we would map
regions 6 and 7 with 256MB translations and on top of that had the
kernel mapped in region 7 using a wired translation. The introduction
of the PBVM moved the kernel into its own region and freed up region
7 and allowed us to revert to standard page-sized translations.

This commit inroduces pmap_page_to_va() that respects the attribute.


223763 04-Jul-2011 marcel

Disable PREEMPTION for now. See also PR ia64/147501.


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


223700 30-Jun-2011 marcel

Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged and we initialize tried
to handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.

But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translatation
cache updates with the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.

Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. The trap frame is already
aligned to 1KB boundaries to make sure we don't cross the page
boundary, this is safe to do.

We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.

Also, it has been observed that we can get a nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.


223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


223544 25-Jun-2011 marcel

Oops. The sec field of struct bintime is *not* a 32-bit type.
It's time_t, which is 64 bits on ia64.


223542 25-Jun-2011 marcel

Define the minimum fractional period in terms of hz. We know hz is
a magnitude smaller than itc_freq. A minimum period of 10*hz is
sufficient precision. As a side-effect, the number of clocks per
second, when the machine is idle, dropped by more than 50%.
Be anal and define the maximum period to be at least 4G seconds.
With a 64-bit counter and an ITC frequency that's expected to be
always less than 4Ghz, it takes longer than that to wrap around.


223529 25-Jun-2011 marcel

Replace the original copyright notice with my own. Everything in
this file is written by me and has no bearing on the initial or
original version.


223528 25-Jun-2011 marcel

Update copyright.


223526 25-Jun-2011 marcel

Switch to the event timers infrastructure. This includes:
o Setting td_intr_frame to the XIVs trap frame because it's referenced
by the ET event handler.
o Signal EOI to the CPU before calling the registered XIV handlers.
This prevents lost ITC interrupts, which cause starvation in one-shot
mode.
o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o Have the APs call cpu_initclocks() so as to limited the scattering of
clock related initialization. cpu_initclocks() calls the <self>_bsp()
or <self>_ap() version accordingly.
o Uncomment the ET clock handling in cpu_idle().
o Update the DDB 'show pcpu' output for the new MD fields.
o Entirely rewritten ia64_ih_clock(). Note that we don't create as many
clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
We can only have 240 XIVs and we can have more CPUs than that. There's
a single intrcnt index for the cumulative clock ticks and we keep per
CPU counts in the PCPU stats structure.
o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).

Open issues:
o Clock interrupts can still be lost. Some tweaking is still necessary.

Thanks to: mav@ for his support, feedback and explanations.

ET stats while committing:
eris% sysctl machdep.cpu | grep nclks

machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock 108599 50


223478 23-Jun-2011 marcel

Unblock the outgoing thread after we performed pmap_switch() to
switch the region registers. pmap_switch() returns the pmap for
which the region register are currently programmed, which needs
to be re-programmed on the CPU the ougoing thread gets switched
in. This change does not noticibly change anything or fix known
bugs, but does give me a warm fuzzy feeling by being more
correct.


223365 21-Jun-2011 alc

Use a non-standard page size that is supported.


223171 17-Jun-2011 marcel

Improve on style(9)


223170 17-Jun-2011 marcel

Properly serialize the global shootdown with the instruction
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.


222971 11-Jun-2011 marcel

Add the model number for the Montvale processor (marketed as Itanium 2 9100).
At this time we're missing just one: Tukwila (Itanium 2 9300).


222813 07-Jun-2011 attilio

etire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle with cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are target to be removed soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu datas is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernland members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additively note that no MAXCPU is bumped in this patch, but
private testing has been done until to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revision of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222800 07-Jun-2011 marcel

Call set_cputicker() to have the time counter use the ITC register.
Note that the ITC frequency is fixed.


222769 06-Jun-2011 marcel

Improve cpu_idle():
o cpu_idle_hook is expected to be called with interrupts
disabled and re-enables interrupts on return.
o sync with x86: don't idle when the CPU has runnable tasks
o have callers of ia64_call_pal_static() disable interrupts
and re-enable interrupts.
o add, but compile-out, support for idle mode. This will be
enabled at some later time, after proper testing.


222531 31-May-2011 nwhitehorn

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


221894 14-May-2011 marcel

Prefer switching the memory stack from user to kernel *before* switching
the register stack. While the ordering doesn't matter, it creates an
invariant not previously there: the memory stack pointer will always be
larger than the register stack pointer. With this invariant in place,
it's easier to add instrumentation code that detects a stack overflow
because in such a scenario the memory stack pointer and register stack
pointers have crossed each other.

Aside: basic kernel operation needs about half the stack size (~16K)
at most. We have plenty of head room on the kernel stack...


221893 14-May-2011 marcel

Sharpening the saw:
o Clobber the register that holds the restart token immediately after
crossing the restart point. This prevents false positives (i.e. a
nested exception that we don't know can happen and that is being
treated as one we know by virtue of a lingering restart token).
o Now that the bootstrap kernel stack is free, switch onto it and call
trap() for nested traps that we don't know about. In trap we panic()
so that we can analyze the condition.


221890 14-May-2011 marcel

Be pedantic: mark the pcpu pointer (= register r13) itself as volatile.


221889 14-May-2011 marcel

Turn ia64_srlz() and ia64_srlz_i() into defines so that the code is
still correct when inlining is disabled.


221855 13-May-2011 mdf

Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk. Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file. (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by: alc
MFC after: 1 week
MFC with: r221853


221606 07-May-2011 marcel

In pmap_kextract(), return the physical address for PBVM virtual
addresses as well (incl. the PBVM page table).


221526 06-May-2011 jhb

Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead. They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.


221334 02-May-2011 marcel

Don't use the whole region 5 for KVA, because the CPU may not implement all
of the 61 bits available within the region for virtual addressing. Since
there's no good way for us to map out the gap in the virtual address space,
limit KVA to the architectural minimum implemented address bits. This still
gives us 1 petabyte of KVA, so no worries.


221271 30-Apr-2011 marcel

Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.


221218 29-Apr-2011 jhb

Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
rm_end to allow managing the full range (0 to ~0ul) if they are not set by
the caller when rman_init() is called.


221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


221124 27-Apr-2011 rmacklem

This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.


221035 25-Apr-2011 marcel

Remove prototypes of non-existent functions.


220982 24-Apr-2011 mav

Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.


220313 04-Apr-2011 marcel

Use the new arch_loadaddr I/F to align ELF objects to PBVM page
boundaries. For good measure, align all other objects to cache
lines boundaries.

Use the new arch_loadseg I/F to keep track of kernel text and
data so that we can wire as much of it as is possible. It is
the responsibility of the kernel to link critical (read IVT
related) code and data at the front of the respective segment
so that it's covered by TRs before the kernel has a chance to
add more translations.

Use a better way of determining whether we're loading a legacy
kernel or not. We can't check for the presence of the PBVM page
table, because we may have unloaded that kernel and loaded an
older (legacy) kernel after that. Simply use the latest load
address for it.


220238 01-Apr-2011 kib

Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.

In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.

The changes are hidden under COMPAT_43.

MFC after: 1 month


220042 26-Mar-2011 alc

Eliminate an unused definition.

Reviewed by: marcel


219841 21-Mar-2011 marcel

Fix switching to physical mode as part of calling into EFI runtime
services or PAL procedures. The new implementation is based on
specific functions that are known to be called in certain scenarios
only. This in particular fixes the PAL call to obtain information
about translation registers. In general, the new implementation does
not bank on virtual addresses being direct-mapped and will work when
the kernel uses PBVM.

When new scenarios need to be supported, new functions are added if
the existing functions cannot be changed to handle the new scenario.
If a single generic implementation is possible, it will become clear
in due time.

While here, change bootinfo to a pointer type in anticipation of
future development.


219808 21-Mar-2011 marcel

Change region 4 to be part of the kernel. This serves 2 purposes:
1. The PBVM is in region 4, so if we want to make use of it, we
need region 4 freed up.
2. Region 4 and above cannot be represented by an off_t by virtue
of that type being signed. This is problematic for truss(1),
ktrace(1) and other such programs.


219775 19-Mar-2011 bz

For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it. LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases. In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with: qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR: kern/148018, kern/155604, kern/144917, kern/146792
MFC after: 2 weeks


219758 18-Mar-2011 marcel

o Move the IVT and supporting functions to the front of the text
segment so that it's always mapped by the loader.
o Change the alternate fault handlers to account for PBVM. Since
currently the region is handled by the VHPT, no alternate faults
will be generated for it.


219756 18-Mar-2011 marcel

Remove inclusion of unneeded bootinfo.h header.


219741 18-Mar-2011 marcel

Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about
the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though
it's not used anywhere in the source tree.


219691 16-Mar-2011 marcel

MFaltix:
Add support for Pre-Boot Virtual Memory (PBVM) to the loader.

PBVM allows us to link the kernel at a fixed virtual address without
having to make any assumptions about the physical memory layout. On
the SGI Altix 350 for example, there's no usuable physical memory
below 192GB. Also, the PBVM allows us to control better where we're
going to physically load the kernel and its modules so that we can
make sure we load the kernel in memory that's close to the BSP.

The PBVM is managed by a simple page table. The minimum size of the
page table is 4KB (EFI page size) and the maximum is currently set
to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment
one can specify in a linker script. The bottom line is that PBVM is
between 64KB and 8GB in size.

The loader maps the PBVM page table at a fixed virtual address and
using a single translations. The PBVM itself is also mapped using a
single translation for a maximum of 32MB.

While here, increase the heap in the EFI loader from 512KB to 2MB
and set the stage for supporting relocatable modules.


219523 11-Mar-2011 mdf

Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.


219468 10-Mar-2011 mdf

Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name. Also consistenly use strlcpy().

Suggested by: Warner Losh


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


218773 17-Feb-2011 alc

Remove pmap fields that are either unused or not fully implemented.

Discussed with: kib


218382 06-Feb-2011 marcel

Comment-out FLOWTABLE. It causes a kernel panic due to a misaligned memory
access related to an IPv6 route update.

PR: kern/148018


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


217519 17-Jan-2011 jkim

Remove empty dev_mem_md_init() stubs.


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217265 11-Jan-2011 jhb

Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by: bde


217192 09-Jan-2011 kib

Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by: alc


217180 09-Jan-2011 das

The highest-precision floating point type on ia64 has 64 bits of
precision, so DECIMAL_DIG should be 21, as on i386/amd64.


217147 08-Jan-2011 tijl

On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by: bde [1]
Approved by: kib (mentor)


217145 08-Jan-2011 tijl

Fix types of some values in machine/_limits.h.

On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by: bde
Approved by: kib (mentor)


217097 07-Jan-2011 kib

Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.


216143 03-Dec-2010 brucec

Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.


216134 02-Dec-2010 brucec

Disallow passing in a count of zero bytes to the bus_space(9) functions.

Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR: kern/80980
Discussed with: jhb


216094 01-Dec-2010 alc

phys_avail[] is correctly defined as an array of vm_paddr_t's in
machdep.c. Use that same type, and not vm_offset_t, in this include file.


215054 09-Nov-2010 jhb

- Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
in the _mtx_* or __mtx_* namespaces. While here, change the names to more
closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by: bde (1, 2)


215052 09-Nov-2010 jhb

Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.


215023 09-Nov-2010 jkim

Reduce diff between platforms and fix style(9) bugs.


214835 05-Nov-2010 jhb

Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger. Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock. However, trap interrupts for single-stepping
can still occur even when interrupts are disabled. Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased. Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero. To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with: bde
MFC after: 1 month


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb


213157 25-Sep-2010 imp

Remove clauses 3 and 4, per changes to NetBSD versions of these files.


212413 10-Sep-2010 avg

bus_add_child: change type of order parameter to u_int

This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to: r212213
MFC after: 10 days


211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


211412 17-Aug-2010 kib

Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by: marius (sparc64)
MFC after: 1 month


211006 07-Aug-2010 kib

Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after: 1 week


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month


210623 29-Jul-2010 jhb

Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly. This allows the compiler to cache invocations of
curthread since it's value does not change within a thread context.

Submitted by: zec (i386)
MFC after: 1 week


210564 28-Jul-2010 mdf

Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)


210550 27-Jul-2010 jhb

Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.

Reviewed by: alc


210369 22-Jul-2010 kib

When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by: jhb, nwhitehorn
MFC after: 2 weeks


209779 07-Jul-2010 marcel

Add acpi_find_table() -- a convenience function for looking up an
ACPI table given the signature.


209775 07-Jul-2010 marcel

Remove pointless BOOTP conditional.


209749 06-Jul-2010 marcel

Provide more examples for error injection.


209671 03-Jul-2010 marcel

Allocate and setup an interrupt vector for corrected machine checks.
For now, just print when we get the interrupt, but eventually we need
to collect the details and provide a more useful report.


209618 01-Jul-2010 marcel

When compiling with profiling, we define PROF for userspace and GPROF
for the kernel.


209617 30-Jun-2010 marcel

While functions are ideally aligned to a 32-byte boundary, don't
assume this to be the case.


209613 30-Jun-2010 jhb

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.


209085 12-Jun-2010 marcel

The ptc.g operation for the Mckinley and Madison processors has the
side-effect of purging more than the requested translation. While
this is not a problem in general, it invalidates the assumption made
during constructing the trapframe on entry into the kernel in SMP
configurations. The assumption is that only the first store to the
stack will possibly cause a TLB miss. Since the ptc.g purges the
translation caches of all CPUs in the coherency domain, a ptc.g
executed on one CPU can cause a purge on another CPU that is
currently running the critical code that saves the state to the
trapframe. This can cause an unexpected TLB miss and with interrupt
collection disabled this means an unexpected data nested TLB fault.

A data nested TLB fault will not save any context, nor provide a
way for software to determine what caused the TLB miss nor where
it occured. Careful construction of the kernel entry and exit code
allows us to handle a TLB miss in precisely orchastrated points
and thereby avoiding the need to wire the kernel stack, but the
unexpected TLB miss caused by the ptc.g instructution resulted in
an unrecoverable condition and resulting in machine checks.

The solution to this problem is to synchronize the kernel entry
on all CPUs with the use of the ptc.g instruction on a single CPU
by implementing a bare-bones readers-writer lock that allows N
readers (= N CPUs entering the kernel) and 1 writer (= execution
of the ptc.g instruction on some CPU). This solution wins over
a rendez-vous approach by not interrupting CPUs with an IPI.

This problem has not been observed on the Montecito.

PR: ia64/147772
MFC after: 6 days


209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


209026 11-Jun-2010 marcel

Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4)
in particular, do not handle deferred DMA map load operations at all.
Any error, and especially EINPROGRESS, is treated as a hard error and
typically abort the current operation. The fact that the busdma code
queues the load operation for when resources (i.e. bounce buffers in
this particular case) are available makes this especially problematic.
Bounce buffering, unlike what the PR synopsis would suggest, works
fine.

While on the subject, properly implement swi_vm().

PR: 147502
MFC after: 1 week


208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


208659 30-May-2010 alc

Simplify the inner loop of get_pv_entry(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty,
wait instead until the loop completes.


208646 29-May-2010 alc

Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.


208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


208514 24-May-2010 kib

Change ia64' struct syscall_args definition so that args is a pointer to
the arguments array instead of array itself. ia64 syscall arguments are
readily available in the frame, point args to it, do not do unnecessary
bcopy. Still reserve the array in syscall_args for ia32 emulation.

Suggested and reviewed by: marcel
MFC after: 1 month


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208392 21-May-2010 jhb

- Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
the idle process is now shared across all idle threads.

MFC after: 1 month


208283 19-May-2010 marcel

Switch to C99 exact-width types.


208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207373 29-Apr-2010 alc

MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it. Moreover, on ia64, don't set the page's dirty field
unless pmap_protect() is removing write access.


207329 28-Apr-2010 attilio

- Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by: Sandvine Incorporated
PR: threads/116181
Reviewed by: emaste, marcel
MFC after: 3 weeks


207269 27-Apr-2010 kib

Style: use #define<TAB> instead of #define<SPACE>.

Noted by: bde, pluknet gmail com
MFC after: 11 days


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


207152 24-Apr-2010 kib

Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by: pluknet
Reviewed by: imp, jhb, nwhitehorn
MFC after: 2 weeks


207077 22-Apr-2010 thompsa

Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after: 1 week


206570 13-Apr-2010 marcel

Populate the sysctl tree with any MCA records we collected.
The sequence number is used as the name of a sysctl node,
under which we add the MCA records using the CPU id as the
leaf name.

Add the hw.mca.inject sysctl to provide a way to inject
MC errors and trigger machine checks.

PR: ia64/113102


206558 13-Apr-2010 marcel

Change the (generic) argument to ia64_store_mca_state() from the
cpuid to the struct pcpu of the CPU. We casting between pointer
types only then.


206556 13-Apr-2010 marcel

o s/u_int64_t/uint64_t/g
o style(9) fixes.


206541 13-Apr-2010 marcel

Sync up to SDM 2.2.


205727 27-Mar-2010 marcel

Bring up-to-date:
o Switch to ITANIUM2 has the cpu. This has absolutely no effect
on the code, but makes for a better example.
o Drop COMPAT_FREEBSD6. We're tier 2, so you're supposed to run
8-stable or newer.
o Add PREEMPTION. It works now.
o Remove HWPMC_HOOKS. We don't have support for hwpmc yet.

o Add a bunch of new devices: atapist, hptiop, amr, ips, twa, igb,
ixgbe, ae, age, alc, ale, bce, bfe, et, jme, msk, nge, sk, ste,
stge, tx, vge, axe, rue, udav, fwip, and all USB serial.
o Remove "legacy" devices: le, vx, dc, pcn, rl, sis.

Make sure to the module list is a superset of what goes into GENERIC.


205726 27-Mar-2010 marcel

Implement interrupt to CPU binding. Assign interrupts to CPUs in a
round-robin fashion, starting with the highest priority interrupt
on the highest-numbered CPU and cycling downwards.


205723 27-Mar-2010 marcel

Remove nx_pcibus from the nexus resource. Nexus is not involved
with PCI busses. Remove nexus_read_ivar() and nexus_write_ivar()
to give default behaviour. Remove <machine/nexusvar.h> as well,
because there's nothing in it that's being used.


205713 26-Mar-2010 marcel

Rename disable_intr() to ia64_disable_intr() and rename enable_intr()
to ia64_enable_intr(). This reduces confusion with intr_disable() and
intr_restore().

Have configure_final() call ia64_finalize_intr() instead of enable_intr()
in preparation of adding support for binding interrupts to all CPUs.


205665 26-Mar-2010 marcel

Only use the interval timer for clock interrupts on the BSP and
have the BSP use IPIs to trigger clock interrupts on the APs.
This allows us to run on hardware configurations for which the
ITC has non-uniform frequencies across CPUs.

While here, change the clock XIV to type IPI so as to protect
the interrupt delivery against CPU re-balancing once that's
implemented.


205660 26-Mar-2010 nwhitehorn

Fix the ia64 build.

Pointy hat to: me


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205454 22-Mar-2010 marcel

o Remove the pmap argument to pmap_invalidate_all() as it's not used
other than in a potentially dangerous KASSERT.
o Hand-inline pmap_remove_page() as it's only called from 1 place and
the abstraction that pmap_remove_page() provides is not enough to
warrant the obfuscation. Eliminate the dangerous KASSERT in the
process.
o In pmap_remove_pte(), remove the KASSERT for pmap being the current
one as it's not safe in the face of CPU migration.


205435 22-Mar-2010 marcel

Drop the pmap argument to pmap_invalidate_page(). It's not used other
than in a KASSERT. The KASSERT is broken in that it's done outside the
critical section and as such isn't protected against CPU migration.
Improve pmap_invalidate_page() as follows:
o calculate vhpt_ofs inside the critical region for exactly the same
reason.
o calculate the tag outside the FOREACH loop, as it's loop-invariant.
This is more efficient.
o Replace the test and set with an atomic cmpset operation because we
are changing other CPU's VHPT tables and this avoids invalidating
after the entry got modified. Not necessarily a problem, but better
safe than sorry.


205434 22-Mar-2010 marcel

With preemption, the high FP registers may get enabled by cpu_switch()
before we grab the mutex. Don't assert that they must be disabled at
that point. We pretty much bypass all logic in that case anyway and
leave immediately, so there's no harm.


205433 22-Mar-2010 marcel

Fix interrupt handling by extending the critical region so that
preemption doesn't happen until after all pending interrupt have
been services.
While here again, simplify the EOI handling by doing it after we
call the XIV-specific handlers, rather than in each of them. The
original thought was that we may want to do an EOI first and the
actual IPI handling next, but that's mostly a micro-optimization.


205432 22-Mar-2010 marcel

Disable interrupts when calling into SAL for PCI configuration
cycles. This serves 2 purposes:
1. It prevents preemption and CPU migration while running SAL code.
2. It reduces the chance of stack overflows: we're supposed to enter
SAL with at least 16KB of either memory- or register stack space,
which we can't do without switching to a different stack.


205431 22-Mar-2010 marcel

Define curthread as an inline function that loads the thread pointer
directly from r13, the pcpu pointer. This guarantees correct behaviour
when the thread migrates to a different CPU.


205429 21-Mar-2010 marcel

Print MD fields in the pcpu to aid debugging.


205428 21-Mar-2010 marcel

Don't include <machine/_regset.h> when _MACHINE_REGSET_H_ in defined.
This is not for multiple inclusion purposes, because _regset.h already
handles this, but to enable inclusion of the MD header by cross-tools
on non-ia64 installations. The cross-tool can include _regset.h itself
before including MD headers that depend on it.


205357 20-Mar-2010 marcel

Don't check for boot_verbose in the environment. The loader does
that already and sets RB_VERBOSE. The loader has always done it.


205234 17-Mar-2010 marcel

Revamp the interrupt code based on the previous commit:
o Introduce XIV, eXternal Interrupt Vector, to differentiate from
the interrupts vectors that are offsets in the IVT (Interrupt
Vector Table). There's a vector for external interrupts, which
are based on the XIVs.

o Keep track of allocated and reserved XIVs so that we can assign
XIVs without hardcoding anything. When XIVs are allocated, an
interrupt handler and a class is specified for the XIV. Classes
are:
1. architecture-defined: XIV 15 is returned when no external
interrupt are pending,
2. platform-defined: SAL reports which XIV is used to wakeup
an AP (typically 0xFF, but it's 0x12 for the Altix 350).
3. inter-processor interrupts: allocated for SMP support and
non-redirectable.
4. device interrupts (i.e. IRQs): allocated when devices are
discovered and are redirectable.

o Rewrite the central interrupt handler to call the per-XIV
interrupt handler and rename it to ia64_handle_intr(). Move
the per-XIV handler implementation to the file where we have
the XIV allocation/reservation. Clock interrupt handling is
moved to clock.c. IPI handling is moved to mp_machdep.c.

o Drop support for the Intel 8259A because it was broken. When
XIV 0 is received, the CPU should initiate an INTA cycle to
obtain the interrupt vector of the 8259-based interrupt. In
these cases the interrupt controller we should be talking to
WRT to masking on signalling EOI is the 8259 and not the I/O
SAPIC. This requires adriver for the Intel 8259A which isn't
available for ia64. Thus stop pretending to support ExtINTs
and instead panic() so that if we come across hardware that
has an Intel 8259A, so have something real to work with.

o With XIVs for IPIs dynamically allocatedi and also based on
priority, define the IPI_* symbols as variables rather than
constants. The variable holds the XIV allocated for the IPI.

o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV
assigned to IPI_STOP is delivered.


205172 15-Mar-2010 marcel

Have cpu_throw() loop on blocked_lock as well. This bug has existed
a long time and has gone unnoticed just as long, because I kept
using sched_4bsd (due to sched_ule not working with preemption),
but GENERIC had sched_ule by default -- including SMP.

While here, remove unused inclusion of <machine/clock.h>, remove
totally bogus inclusion of <i386/include/specialreg.h>.


205116 13-Mar-2010 ed

Remove COMPAT_43TTY from stock kernel configuration files.

COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.


205015 11-Mar-2010 nwhitehorn

Accidentally committed test code. Remove it.

Big pointy hat: me


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


204905 09-Mar-2010 marcel

Remove inclusion of <i386/include/psl.h>
While here move inclusion of <sys/lock.h> in a better place.


204904 09-Mar-2010 marcel

Remove support for SYS_RES_DRQ.


204646 03-Mar-2010 joel

The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from: NetBSD


204425 27-Feb-2010 marcel

Interrupt related cleanups:
o Assign vectors based on priority, because vectors have
implied priority in hardware.
o Use unordered memory accesses to the I/O sapic and use
the acceptance form of the mf instruction.
o Remove the sapicreg.h and sapicvar.h headers. All definitions
in sapicreg.h are private to sapic.c and all definitions in
sapicvar.h are either private or interface functions. Move the
interface functions to intr.h.
o Hide the definition of struct sapic.


204184 22-Feb-2010 marcel

Prefer I-units and M-units for nop instructions. This works around
McKinley flaws. It also avoids using the F-unit in the kernel for
no reason.


204183 21-Feb-2010 marcel

Normalize nop instructions: Only use 0 for the immediate operand.


204182 21-Feb-2010 marcel

Remove pm_active from struct pmap as it serves no purpose.

MFC after: 1 week


203938 15-Feb-2010 attilio

Adjust style (following the already existing rules) for the newly
introduced option DEADLKRES.

Reported by: danfe, julian, avg


203884 14-Feb-2010 marcel

Some code cleanups:
o s/u_int32_t/uint32_t/g
o Add multiple-inclusion protection.
o Break long lines.


203883 14-Feb-2010 marcel

Some code churn:
o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
by calling either bus_space_map() or pmap_mapdev().
o Implement bus_space_map() in terms of pmap_mapdev() and implement
bus_space_unmap() in terms of pmap_unmapdev().
o Have ia64_pib hold the uncached virtual address of the processor interrupt
block throughout the kernel's life and access the elements of the PIB
through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.


203758 10-Feb-2010 attilio

Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.

Sponsored by: Sandvine Incorporated
Requested by: emaste
Discussed with: kib


203572 06-Feb-2010 marcel

Fix single-stepping when the kernel was entered through the EPC syscall
path. When the taken branch leaves the kernel and enters the process,
we still need to execute the instruction at that address. Don't raise
SIGTRAP when we branch into the process, but enable single-stepping
instead.


203106 28-Jan-2010 marcel

In pci_cfgregread() and pci_cfgregwrite(), validate the arguments and check
that the alignment matches the width of the read or write.


203054 27-Jan-2010 marcel

In cpu_switch(), use an atomic operation to set the td_lock
of the old thread to the mutex that's passed.

Pointed out by: attilio, jhb


202904 23-Jan-2010 marcel

Remove cpu_boot() and call efi_reset_system() directly from
cpu_reset().


202273 14-Jan-2010 marcel

Add ioctl requests to /dev/io on ia64 for reading and writing
EFI variables. The primary reason for this is that it allows
sysinstall(8) to add a boot menu item for the newly installed
FreeBSD image.


202272 14-Jan-2010 marcel

Fix previous commitr:. efi_var_set() was copied from efi_var_get(),
but wasn't actually changed.


202271 14-Jan-2010 marcel

Add wrappers for the RT Variable Services. While here, translate the
EFI status into a standard errno value and change efi_set_time() to
return a standard error.

MFC after: 1 week


202097 11-Jan-2010 marcel

Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after: 1 week


202019 10-Jan-2010 imp

Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.

# This is the resolution of removing it from DEFAULTS...

MFC after: 5 days


201813 08-Jan-2010 bz

In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT. This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.

This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with: arch, jhb (as part of the LINT-VIMAGE thread)
MFC after: 1 month


201534 04-Jan-2010 imp

Revert 200594. This file isn't intended for these sorts of things.


201443 03-Jan-2010 brooks

Add vlan(4) to all GENERIC kernels.

MFC after: 1 week


201373 02-Jan-2010 marcel

Change BUS_SPACE_MAXADDR from 2^32-1 to 2^64-1. 2^32-1 is representative
for its origin, more than for its accuracy.

MFC after: 1 week


201269 30-Dec-2009 marcel

Revamp bus_space access functions:
o Optimize for memory mapped I/O by making all I/O port acceses function
calls and marking the test for the IA64_BUS_SPACE_IO tag with
__predict_false(). Implement the I/O port access functions in a new
file, called bus_machdep.c.
o Change the bus_space_handle_t for memory mapped I/O to the virtual
address rather than the physical address. This eliminates the PA->VA
translation for every I/O access. The handle for I/O port access is
still the port number.
o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string
variants from cpufunc.h and define them in bus.h. On ia64 these are
not CPU functions at all. In bus.h they are merely aliases for the
new I/O port access functions defined in bus_machdep.h.
o Handle the ACPI resource bug in nexus_set_resource(). There we can
do it once so that we don't have to worry about it whenever we need
to write to an I/O port that is really a memory mapped address.

The upshot of this change is that the KBI is better defined and that I/O
port access always involves a function call, allowing us to change the
actual implementation without breaking the KBI. For memory mapped I/O the
virtual address is abstracted, so that we can change the VA->PA mapping
in the kernel without causing an KBI breakage. The exception at this time
is for bus_space_map() and bus_space_unmap().

MFC after: 1 week.


201223 29-Dec-2009 rnoland

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


201145 28-Dec-2009 antoine

(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month


201032 26-Dec-2009 marcel

Use unordered memory loads and stores for the in* and out*
family of functions.


200889 23-Dec-2009 marcel

Export the bus, cpu and itc frequencies under the hw.freq sysctl node.
The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The
frequencies are rounded to the nearest whole MHz.

While here, rename and re-type bus_frequency, processor_frequency and
itc_frequency to bus_freq, cpu_freq and itc_freq and make them static.
As unsigned integers, the hw.freq.cpu sysctl can more easily be made
generic (across all architectures) making porting easier.

MFC after: 3 days


200888 23-Dec-2009 marcel

Add a bit definition for invalid timestamp in the record header.


200594 16-Dec-2009 dougb

Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS


200240 08-Dec-2009 marcel

In exception_save, write-back ar.rnat after switching the backing-
store. Writing to ar.bspstore is defined to leave ar.rnat undefined.

PR: ia64/120315
MFC after: 3 days


200207 07-Dec-2009 marcel

Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and mode the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that they match the CPU
device sysctl tree -- they are now children of a static node called
"machdep.cpu" and are named only with their cpu ID.


200200 07-Dec-2009 marcel

Allocate the VHPT for each CPU in cpu_mp_start(), rather than
allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU
without memory waste -- MAXCPU is now 32 for SMP kernels.

This change also eliminates the VHPT scaling based in the total
memory in the system. It's the workload that determines the best size
of the VHPT. The workload can be affected by the amount of memory,
but not necessarily. For example, there's no performance difference
between VHPT sizes of 256KB, 512KB and 1MB when building the LINT
kernel. This was observed with a system that has 8GB of memory.
By default the kernel will allocate a 1MB VHPT. The user can tune the
system with the "machdep.vhpt.log2size" tunable.


200051 03-Dec-2009 marcel

Make sure bus space accesses use unorder memory loads and stores.
Memory accesses are posted in program order by virtue of the
uncacheable memory attribute.
Since GCC, by default, adds acquire and release semantics to
volatile memory loads and stores, we need to use inline assembly
to guarantee it. With inline assembly, we don't need volatile
pointers anymore.

Itanium does not support semaphore instructions to uncacheable
memory.


199941 29-Nov-2009 marcel

Move the sysctl related fields to the end of the structure and
make them conditional upon _KERNEL. libkvm includes <sys/pcpu.h>
and <sys/sysctl.h> does not expose the structure definitions to
userland.


199893 28-Nov-2009 marcel

Eliminate teh use of MAXCPU in static arrays of interrupt counters by
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.

While here, fix cnt.v_intr by not just having it count clock interrupts,
but every interrupt and add more counters for each interrupt source.


199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


199727 24-Nov-2009 marcel

Improve upon revision 196196 by removing the newly added comment
in the wrong place and instead add a KASSERT in the right place.


199719 23-Nov-2009 marcel

Revert previous commit. The problem was not related to overrunning
the kernel stack at all. The new USB stack simply caused a change
in timing that triggered a firmware bug more often. The addition
of PRINTF_BUFR_SIZE apparently triggered the same firmware bug
even more reliably.

But even with KSTACK_PAGES=5, one instance of the firmware bug
remained: booting with a CD inserted. This problem was run into
by accident after installing Debian and having to boot FreeBSD
to fixup the GPT partitioning (Thanks... not). After bumping
KSTACK_PAGES to 5, it was pretty unbelievable that the stack was
still being too small.

After updating the firmware we could boot with a CD inserted and
KSTACK_PAGES could be lowered back to 4 pages without problems.

Note: It is believed to be a timing related firmware bug, because
the machine check information showed access to the serial console
on one CPU and access to the EHCI HCD on the other CPU. Since
both are devices on the management unit and thus virtualized in
some way, any execution trace that does not include concurrent
access to the BMC from both CPUs is fine.

Note also that it's not understood exactly how increasing the
kernel stack avoided hitting the firmware bug. A change in page
faults does change timing, but it's not known if that's what's
happening here.

In any case: the problem is being monitored. Reverting back to
4 pages for the kernel stack is preferred, because it makes it
easier to switch to 16K pages (double the page size) without
wasting too much memory by not being able to half the number of
pages...


199574 20-Nov-2009 marcel

No need to include opt_kstack_pages.h, because KSTACK_PAGES is
already defined through genassym.c


199566 20-Nov-2009 marcel

Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled. Use a token rather than the actual address
of the restart point to avoid the need for the movl instruction.
The token is arbitrary. For the drummers: it's based on a single
paradiddle.


199502 19-Nov-2009 marcel

opt_* headers are included using the quoted form.


199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sunv reviewed
MIPS tested by: gonzo
MFC after: 1 month


198733 31-Oct-2009 marcel

Reimplement the lazy FP context switching:
o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o use msleep() to wait for the other CPU to save the high
FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronuously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198452 24-Oct-2009 marcel

Add PRINTF_BUFR_SIZE=128, since we have SMP by default.
While here, fix tabulation.


198451 24-Oct-2009 marcel

A 32KB kernel stack is not quite enough. The new USB stack is a bit
more stack hungry as compared to the old one that my RX2660 gets
a machine check and spontaneously reboots at the time the USB DVD
drive is found and attached to CAM as a mass storage device. This
doesn't happen always, but definitely varies per kernel build.
Likewise when using a 128-byte printf buffer. The additional 128
bytes that printf needs seems to be enough to have the memory stack
and register stack collide and causing a machine check.

Thus: Bump KSTACK_PAGES from 4 to 5.


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


198338 21-Oct-2009 marcel

o Align function on a 32-byte boundary so that the core's front-end
can deliver 2 bundles per cycle to the back-end.
o Mark syscall stubs with a special unwind ABI tag so that unwind
libraries know how to unwind.


197933 10-Oct-2009 kib

Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197316 18-Sep-2009 alc

Add a new sysctl for reporting all of the supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


196994 08-Sep-2009 phk

Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.


196268 16-Aug-2009 marcel

Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be
sparse, which causes a kernel assert.

Approved by: re (kensmith)


196196 13-Aug-2009 attilio

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


195625 11-Jul-2009 marcel

On exec(2), when loading the ELF image, pmap_enter_object() is
called to prefault pages. This is an obvious place for making
sure the I-cache is coherent. It was missing though. As such,
execution over NFS and ZFS file systems was failing. NFS was
fixed the wrong way (by flushing the D-cache as part of the
NFS code) in a previous commit. ZFS problems were encountered
after that and indicated that something else was wrong...

Approved by: re (kib)


195376 05-Jul-2009 sam

Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by: bde, imp, marcel
Approved by: re (kensmith)


195295 02-Jul-2009 ed

Enable POSIX semaphores on all non-embedded architectures by default.

More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by: miwi
Patch by: Florian Smeets <flo kasimir com>
Approved by: re (kib)


195060 26-Jun-2009 alc

Correct the #endif comment.

Noticed by: jmallett
Approved by: re (kib)


195033 26-Jun-2009 alc

This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with: jhb


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


194524 20-Jun-2009 marcel

Drop the high FP state of an exiting thread in cpu_thread_exit() and
not in cpu_exit(). The latter is called after td_md.md_highfp_mtx
has been destroyed, which results in a race condition when another
thread wants to use the high FP registers on the CPU that still has
the high FP registers in question.


193530 05-Jun-2009 jkim

Import ACPICA 20090521.


193334 02-Jun-2009 rwatson

Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel. No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


193018 29-May-2009 ed

Last minute TTY API change: remove mutex argument from tty_alloc().

I don't want people to override the mutex when allocating a TTY. It has
to be there, to keep drivers like syscons happy. So I'm creating a
tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex()
should eventually be removed.

The advantage of this approach, is that we can just remove a function,
without breaking the regular API in the future.


192918 27-May-2009 rink

ia64: Move MCA information retrieval to a per-CPU kthread

Once AP's are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.


192324 18-May-2009 marcel

Rename ia64_invalidate_icache() to ia64_sync_icache(). We're
not invalidating anything.


192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


191954 10-May-2009 kuriyama

- Use "device\t" and "options \t" for consistency.


191449 24-Apr-2009 marcel

Remove isa_irq_pending(). It's not used.


191309 20-Apr-2009 rwatson

Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by: bde [1], jhb [2]
MFC after: 2 weeks


191278 19-Apr-2009 rwatson

Add description and cautionary note regarding CACHE_LINE_SIZE.

MFC after: 2 weeks
Suggested by: alc


191276 19-Apr-2009 rwatson

For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant. These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after: 2 weeks
Discussed on: arch@


191201 17-Apr-2009 jhb

Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by: avg
Reviewed by: scottl


191011 13-Apr-2009 kib

The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version)
PR: amd64/133592
Reviewed by: scottl (original patch), jhb
MFC after: 2 weeks


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


190631 01-Apr-2009 kib

Add trivial implementation for the freebsd32_sysarch on ia64.
Fix comapt32 and LINT build on ia64.

Discussed with: jhb


189926 17-Mar-2009 kib

Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by: kan


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


188944 23-Feb-2009 thompsa

Change over the usb kernel options to the new stack (retaining existing
naming). The old usb stack can be compiled in my prefixing the name with 'o'.


188665 15-Feb-2009 thompsa

Add uslcom to the build too.

Reminded by: Michael Butler


188660 15-Feb-2009 thompsa

Switch over GENERIC kernels to USB2 by default.

Tested by: make universe


188453 10-Feb-2009 marcel

Mark the BSP as being awake. This supresses the message
that not all usable CPUs could be woken up...


188350 08-Feb-2009 imp

When bouncing pages, allow a new option to preserve the intra-page
offset. This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on. This solution doesn't break anything that doesn't ask for it
directly. The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes. I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this changes turns out not to be sufficient.

Submitted by: alfred@ from Hans Peter Selasky's original


188274 07-Feb-2009 wkoszek

Don't forget to create opt_agp.h on ia64, which also uses agp(4).


188119 04-Feb-2009 jhb

Tweak the ia64 machine check handling code to not register new sysctl nodes
while holding a spin mutex. Instead, it now shoves the machine check
records onto a queue that is later drained to add sysctl nodes for each
record. While a routine to drain the queue is present, it is not currently
called.

Reviewed by: marcel


187381 18-Jan-2009 alc

Correct an error in revision 1.170 of this file. When get_pv_entry() is
forced to reclaim pv entries, the one pv entry that it returns should not
be freed.


186212 17-Dec-2008 imp

AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.

Reviewed by: peter


185567 02-Dec-2008 ed

Remove "[KEEP THIS!]" from COMPAT_43TTY. It's not really that important.

Sgtty is a programming interface that has been replaced by termios over
the years. In June we already removed <sgtty.h>, which exposes the
ioctl()'s that are implemented by this interface. The importance of this
flag is overrated right now.


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


185163 22-Nov-2008 marcel

Define mb(), rmb() and wmb() for real.


185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


184062 19-Oct-2008 marcel

Atomically increment the number of awoken APs as all APs will
be unleashed here.

Pointed out by: christian.kandeler@hob.de


183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183299 23-Sep-2008 obrien

The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical. Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


181875 19-Aug-2008 jhb

Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after: 1 week


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


180533 15-Jul-2008 alc

Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by: scottl


180359 07-Jul-2008 delphij

Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.


180354 07-Jul-2008 marcel

Add inline function ia64_fc_i() to abstract inline assembly.
Use the new inline function in ia64_invalidate_icache().
While there, add proper synchronization so that we know
the fc.i instructions have taken effect when we return.


180209 03-Jul-2008 peter

Exclude .cvsignore files from $FreeBSD$ checking


179990 25-Jun-2008 ed

Remove the unused major/minor numbers from iodev and memdev.

Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by: philip (mentor)


179382 28-May-2008 marcel

Work-around a compiler optimization bug, that broke libthr. Massive
inlining resulted in constant propagation to the extend that cmpval
was known to the compiler to be URWLOCK_WRITE_OWNER (= 0x80000000U).
Unfortunately, instead of zero-extending the unsigned constant, it
was sign-extended. As such, the cmpxchg instruction was comparing
0x0000000080000000LU to 0xffffffff80000000LU and obviously didn't
perform the exchange.
But, since the value returned by cmpxhg equalled cmpval (when zero-
extended), the _thr_rtld_lock_release() function thought the exchange
did happen and as such returned as if having released the lock. This
was not the case. Subsequent locking requests found rw_state non-zero
and the thread in question entered the kernel and block indefinitely.

The work-around is to zero-extend by casting to uint64_t.


179256 23-May-2008 marcel

Account for IPI_PREEMPT. We don't want to call sched_preempt() with
interrupts disabled or with td_intr_nesting_level non-zero.


179229 23-May-2008 alc

The VM system no longer uses setPQL2(). Remove it and its helpers.


179190 22-May-2008 marcel

Create the bucket mutexes with MTX_NOWITNESS. There's now a
hard limit of 512 pending mutexes in the witness code and
we can easily have 1 million bucket mutexes initialized before
witness is up and running. Bumping the limit from 512 to 1M
is not really an option here...


179173 21-May-2008 marcel

We can call ia64_flush_dirty() when the corresponding process is
locked or not. As such, use PROC_LOCKED() to determine which case
it is and lock the process when not.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


178893 09-May-2008 alc

Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.


178494 25-Apr-2008 marcel

Unbreak previous commit. While here, refactor the code a bit.


178471 25-Apr-2008 jeff

- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.

Sponsored by: Nokia


178429 22-Apr-2008 phk

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


178372 21-Apr-2008 phk

Make genclock standard on all platforms.

Thanks to: grehan & marcel for platform support on ia64 and ppc.


178309 19-Apr-2008 marcel

Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't
define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP
instead.


178296 18-Apr-2008 marcel

Remove cruft we got from Alpha, which was probably inherited
from NetBSD. I.e. make it more like a FreeBSD header.


178222 15-Apr-2008 marcel

Use genclock for RTC handling. This eliminates the MD versions for
inittodr() and resettodr(). Have nexus double as the clock device,
because it's the firmware that provides RTC services. We could
create a special (pseudo-) device for it, but that wasn't superior
enough to actually do it. Maybe later...

Requested by: phk


178215 15-Apr-2008 marcel

Support and switch to the ULE scheduler:
o Implement IPI_PREEMPT,
o Set td_lock for the thread being switched out,
o For ULE & SMP, loop while td_lock points to blocked_lock for
the thread being switched in,
o Enable ULE by default in GENERIC and SKI,


178206 14-Apr-2008 marcel

Revision 1.9 changes the delivery mode from the magic constant 0
(i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit
log doesn't mention the change in behaviour, it is believed to be
deliberate. In the last 5.5 years this hasn't been a problem. Nor
do I think did it make any difference, but who knows. However, I
do know that it break SMP support for Montecito-based machines.
Switch back to fixed-CPU delivery so that SMP works again. This
gives me some time to look more closely at the problem, as well
as make sure the I-cache validation as it's implemented currently
is sufficient in SMP configurations...


178131 11-Apr-2008 jeff

- Pass the irq and not the vector to intr_event_create().

Reviewed by: marcel


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


178028 09-Apr-2008 marcel

Unbreak after removal of SI_SUB_MOUNT_ROOT.


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177769 30-Mar-2008 marcel

Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.

MFC after: 2 weeks


177662 27-Mar-2008 dfr

Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.


177661 27-Mar-2008 jb

When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.


177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177276 16-Mar-2008 pjd

Implement atomic_fetchadd_long() for all architectures and document it.

Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177215 15-Mar-2008 imp

BUS_DMA_ISA is left over from Alpha, and is not used in the tree at
all. The reference in ia64 code is due to cutNpaste in its history
and can safely be removed.

Revired by: cognet, raj, marcel, jhb and maybe one other whom I'm forgetting


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177157 13-Mar-2008 jhb

Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. It's probe routine now returns BUS_PROBE_GENERIC so it
can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains an new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.

Discussed with: imp
Silence on: arch@


177126 12-Mar-2008 jeff

- Fix build breakage; there was a reference to a removed syscall in
a KASSERT(). Attempt to cleanup the comment to reflect reality.


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


176734 02-Mar-2008 jeff

- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176346 16-Feb-2008 marcel

Re-sort options. While here:
o remove COMPAT_FREEBSD5
o add INVARIANTS
o add WITNESS


176286 14-Feb-2008 marcel

On Montecito processors, the instruction cache is in fact not
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.

Commit made on Montecito-based Itanium...


175959 04-Feb-2008 marcel

Allocate a stack for thread0 and switch to it before calling
mi_startup(). This frees up kstack for static PAL/SAL calls
and double-fault handling.


175768 28-Jan-2008 ru

Add a wrapper function that bound checks writes to the dump device.


175147 07-Jan-2008 jhb

Add COMPAT_FREEBSD7 and enable it in configs that have COMPAT_FREEBSD6.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


175066 03-Jan-2008 imp

Use correct function name in panic message


175065 03-Jan-2008 imp

Fix obsolete comment. pmap_remove_all is the function we're in.


174938 27-Dec-2007 alc

Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174405 07-Dec-2007 jkoshy

Add stubs to unbreak LINT.


174326 06-Dec-2007 marcel

Add a BSD disklabel backend to g_part:
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


173988 27-Nov-2007 jhb

Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after: 1 week
Reviewed by: scottl


173970 27-Nov-2007 jasone

Define atomic_readandclear_ptr.


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173600 14-Nov-2007 julian

generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


172692 16-Oct-2007 marcel

Set PTE_ACCESSED in the PTE and before inserting it in the VHPT.
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.

MFC after: 1 week


172691 16-Oct-2007 marcel

The flushrs instruction must be the first in an instruction
group. GNU as(1) already made sure of that, but it's better
to actually have the code right.

MFC after: 1 week


172690 16-Oct-2007 marcel

Print instruction stops to improve analysis of dependency
violations.

MFC after: 1 week


172689 16-Oct-2007 marcel

Fix disassembly of the invala, itc, itr and hint instructions
by fixing the opcode ordering.

MFC after: 1 week


172332 26-Sep-2007 brueffer

Use the correct expanded name for SCTP.

PR: 116496
Submitted by: koitsu
Reviewed by: rrs
Approved by: re (kensmith)


172317 25-Sep-2007 alc

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


172189 15-Sep-2007 alc

It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)


171740 06-Aug-2007 marcel

Clear pending interrupts before we enable external interrupts.
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.

While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.

Approved by: re (blanket)


171739 06-Aug-2007 marcel

Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)


171735 05-Aug-2007 marcel

In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)


171722 04-Aug-2007 marcel

Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.

Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.

Approved by: re (blanket)


171721 04-Aug-2007 marcel

Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.

Approved by: re (blanket)


171720 04-Aug-2007 marcel

Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.

Approved by: re (blanket)


171719 04-Aug-2007 marcel

Add required data-serialization after writing to cr.itm and cr.itv.

Approved by: re (blanket)


171718 04-Aug-2007 marcel

Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.

Approved by: re (blanket)


171666 30-Jul-2007 marcel

o Switch to physical addressing before dereferencing the VHPT
bucket pointer. The virtual mapping may not be present in the
translation cache. This will result in a nested TLB fault at
a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
a rfi. Behaviour is undefined otherwise.

Approved by: re (blanket)


171665 30-Jul-2007 marcel

Add option EXCEPTION_TRACING, which enables KTR-like functionality
for processor interruptions. This is especially useful to track
unexpected nested TLB faults.

Approved by: re (blanket)


171664 30-Jul-2007 marcel

Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)


171663 30-Jul-2007 marcel

Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)


171662 30-Jul-2007 marcel

Add casts to some of the more commonly used pointer-type atomic
operations. We really should be able to make those inline functions,
but this would break its use for sx_locks.

Approved by: re (blanket)


171553 23-Jul-2007 dwmalone

If clock_ct_to_ts fails to convert time time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by: re
MFC after: 3 weeks


171463 16-Jul-2007 marcel

Restore the value of ar.rnat after the assignment to ar.bspstore.
The SDM states that writing to ar.bspstore invalidates the ar.rnat
register as a side-effect. This was interpreted as "bits in the
ar.rnat register that correspond to registers whose value is on
the stack are undefined'. Since we keep the kernel stack NaT-
aligned with the user stack (i.e. the lower 9 bits of the backing
store pointer remain unchanged when we switch to the kernel stack)
bits that need preserving would be preserved.

That interpretation is questionable. So, now, the interpretation
is more absolute: ar.rnat is undefined after writing to ar.bspstore.
As such, we write the saved value of ar.rnat back to ar.rnat after
writing to ar.bspstore.

Discussed with: christian.kandeler@hob.de
Approved by: re (kensmith)


171312 09-Jul-2007 marcel

dma_tag is a static structure. Testing for it being a NULL pointer
doesn't make sense. Rewrite to what was intended.

Correctly warned about by: GCC
Approved by: re (bmah)


170731 14-Jun-2007 delphij

Enable SCTP by default for GENERIC kernels in order to give it
more exposure. The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still need some
work/testing on 64-bit platforms.

Approved by: re (kensmith)
Discussed with: rrs


170652 13-Jun-2007 marcel

Enable GEOM_PART_MBR by default. On ia64 this replaces GEOM_MBR.


170519 10-Jun-2007 alc

Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] using one of these definitions.

Approved by: re


170507 10-Jun-2007 marcel

Work around a firmware bug in the HP rx2660, where in ACPI an I/O port
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.

With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.


170474 09-Jun-2007 marcel

Synchronize the instruction cache after writing to memory. This is
needed for breakpoints to work.


170473 09-Jun-2007 marcel

Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.


170444 09-Jun-2007 marcel

Physical memory regions can be larger than INT_MAX. Change size1
from an int to a long to avoid printing negative byte and page
counts.


170440 08-Jun-2007 rwatson

Enable AUDIT by default in the GENERIC kernel, allowing security event
auditing to be turned on without a kernel recompile, just an rc.conf
option.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project


170403 07-Jun-2007 marcel

Remove remaining references to pc_curtid missed in previous commit.


170402 07-Jun-2007 marcel

Eliminate pmap_install(), which was used to wrap pmap_switch() and
grab sched_lock. This would serialize calls to pmap_switch from
cpu_switch(). With the introduction of thread_lock, this is not
possible anymore, because thread_lock is not a single lock. It
varies. Secondly and most importantly, it's not needed at all. The
only requirement for pmap_switch() is that it's not preempted
while in the middle of updating the CPU and PCPU. In other words,
it's a critical region. No locking required.


170390 07-Jun-2007 davidxu

Fix compiling error.


170359 06-Jun-2007 marcel

Include <sys/sched.h> for sched_throw().


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170306 04-Jun-2007 jeff

Commit 13/14 of sched_lock decomposition.
- Add a new parameter to cpu_switch() that is used to release the lock on
the outgoing thread and properly acquire the lock on the incoming
thread. This parameter is not required for schedulers that don't do
per-cpu locking and architectures which do not support it may continue
to use the 4BSD scheduler. This feature is presently not supported
on ia64

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170303 04-Jun-2007 jeff

Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170086 29-May-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by: scottl


170033 27-May-2007 alc

Eliminate an unused definition.


170026 27-May-2007 marcel

Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169814 21-May-2007 marcel

When speculation fails (as determined by the chk instruction) the
processor is to jump to recovery code. This branching behaviour
may not be implemented by the processor and a Speculative Operation
fault is raised. The OS is responsible to emulate the branch.
Implement this, because GCC 4.2 uses advanced loads regularly.


169773 19-May-2007 marcel

Fix GCC warning: va = va += PAGE_SIZE contains pointless operation
va = va. Fix white space in nearby lines.


169760 19-May-2007 marcel

Add a level of indirection to the kernel PTE table. The old
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.


169757 19-May-2007 marcel

Account for the fact that contigmalloc(9) can return a NULL pointer.
Fix the flags argument: M_WAITOK is not a valid flag. Its presence
leaves the indication that contigmalloc(9) will not return a NULL
pointer.

The use of contigmalloc(9) in this place is probably not a good idea
given the constraints. It's probably better to lift the constraints
and instead add a permanent mapping to the ITR. It's possible that
the first 256MB of memory is exhausted when we get here.

This fixes a kernel panic on a 16GB rx3600.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169291 05-May-2007 alc

Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194


168920 21-Apr-2007 sepotvin

Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc


168603 10-Apr-2007 pjd

Remove trailing '.' for consistency!


168594 10-Apr-2007 pjd

Add UFS_GJOURNAL options to the GENERIC kernel.

Approved by: re (kensmith)


167814 22-Mar-2007 jkim

Catch up with ACPI-CA 20070320 import.


167767 21-Mar-2007 jhb

Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98
drivers.


167429 11-Mar-2007 alc

Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


167277 06-Mar-2007 scottl

Don't increment total_bounced when doing no-op dmamap_sync ops.


166945 24-Feb-2007 piso

Updated ia64 isa support with the new bus_setup_intr() syntax.

Approved by: re (implicit?)


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166860 21-Feb-2007 alc

Change pmap_protect() so that execute access can be removed without
simultaneously removing write access.


166810 18-Feb-2007 alc

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


166631 11-Feb-2007 marcel

Now that the free page queue mutex is a sleep mutex, we cannot call
vm_page_alloc() from within a critical section in pmap_growkernel().
Since the need for a critical section may never have existed in the
first place, simply get rid of it.

Discussed with: alc@


166604 09-Feb-2007 brooks

Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by: pjd


166551 07-Feb-2007 marcel

Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.


165967 12-Jan-2007 imp

Remove 3rd clause, renumber, ok per email


165369 20-Dec-2006 davidxu

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164395 18-Nov-2006 marcel

Since printf also has at least one critical section, we need to
initialize pc_curthread. While here, rename early_pcpu to pcpu0
to be conistent (compare thread0 and proc0).


164392 18-Nov-2006 marcel

Now that printf() needs the PCPU, set it up before we call printf().
Change the pc_pcb field from a pointer to struct pcb to struct pcb
so that sizeof(struct pcb) includes the PCB we use for IPI_STOP.
Statically declare early_pcb so that we don't have to allocate the
PCB for thread0. This way we can setup the PCPU before cninit()
and thus before we use printf().


164391 18-Nov-2006 marcel

Revert previous commit. PC_CONS_BUFR is not used nor needed by
assembly.


164250 13-Nov-2006 ru

Fix a comment.


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


164049 06-Nov-2006 rwatson

Add missing includes of priv.h.


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163987 04-Nov-2006 jb

Remove the KDTRACE option again because of the complaints about having
it as a default.

For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.

By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is the way I intended it to be.

Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.

I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.


163972 04-Nov-2006 jb

Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTtrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.


163928 03-Nov-2006 marcel

Make sure kern_envp is never NULL. If we don't get a pointer to
the environment from the loader, use the static environment.


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163711 26-Oct-2006 jb

Remove the KSE option now that it's in DEFAULTS on these arches/machines.

The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by: scottl@


163710 26-Oct-2006 jb

Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by: scottl@


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163630 23-Oct-2006 ru

Move "device splash" back to MI NOTES and "files", it's MI.


163619 23-Oct-2006 marcel

o Eliminate nexus_print_resources(). Use resource_list_print_type()
instead.
o Eliminate nexus_print_all_resources(). Inline the function body
in nexus_print_child().


163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


163535 20-Oct-2006 des

Move more MD devices and options out of MI NOTES.


163531 20-Oct-2006 des

The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.


163492 19-Oct-2006 marcel

Fix previous revision:
o day and mday are the same. No need to subtract 1 from mday.
o Set dow to -1 as clock_ct_to_ts() checks this field and
returns EINVAL on any day of the week but Sunday.


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163386 15-Oct-2006 hrs

Add a newline to the printf().

Spotted by: Peter Carah <pete@altadena.net>
MFC after: 3 days


163058 06-Oct-2006 marcel

Include freebsd32_signal.h now that signal-related definitions are
moved there.

Found by: ia64 tinderbox.


163041 05-Oct-2006 simon

- Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
stronger.

Suggested and reviewed by: dougb
Discussed on: developers
MFC after: 3 days


163016 04-Oct-2006 jb

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.


162966 02-Oct-2006 phk

Use calendrical calculations from subr_clock.c instead of home-rolled.


162958 02-Oct-2006 phk

Second part of a little cleanup in the calendar/timezone/RTC handling.

Split subr_clock.c in two parts (by repo-copy):
subr_clock.c contains generic RTC and calendaric stuff. etc.
subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c. They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


162658 26-Sep-2006 ru

Added COMPAT_FREEBSD6 option.


162487 21-Sep-2006 kan

Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and __builtin_va_start was present in all GCC version 3.1 and
later.


162361 16-Sep-2006 rwatson

Add audit hooks for ppc, ia64 system call paths.

Reviewed by: marcel (ia64)
Obtained from: TrustedBSD Project
MFC after: 3 days


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.


161628 25-Aug-2006 alc

Eliminate unused definitions. (They came from NetBSD.)

Discussed with: cognet, grehan, marcel


161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160813 29-Jul-2006 marcel

Remove sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160770 27-Jul-2006 jhb

Add KTR_SYSC tracing to the syscall() implementations that didn't have it
yet.

MFC after: 1 week


160764 27-Jul-2006 jhb

Add missing ptrace(2) system-call stops to various syscall()
implementations.

MFC after: 1 week


160444 17-Jul-2006 marcel

Move default GEOM classes from files.ia64, where they were marked
standard, to the DEFAULTS file.


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


160210 09-Jul-2006 mjacob

Make the firmware assist driver resident in
preparation for isp using it.


160105 05-Jul-2006 bde

Fixed FP_R*. fp{get_set}round() apparently never worked on ia64, since
the alpha values were used and are quite different.

Fixed some style bugs by copying from the i386 version where it is better.


160040 29-Jun-2006 marcel

Partial support for branch long emulation. This only emulates the
branch long jump and not the branch long call. Support for that is
forthcoming.


159971 27-Jun-2006 alc

Make several changes to pmap_enter_quick_locked():

1. Make the caller responsible for performing pmap_install(). This reduces
the number of times that pmap_install() is performed by
pmap_enter_object() from twice per page to twice overall.

2. Don't block if pmap_find_pte() is unable to allocate a PTE. If it did
block, then it might wind up mapping a cache page. Specifically, if
pmap_enter_quick_locked() slept when called from pmap_enter_object(), the
page daemon could change an active or inactive page into a cache page just
before it was to be mapped.

3. Bail out of pmap_enter_quick_locked() if pv entries aren't plentiful.
In other words, don't force the allocation of a pv entry if they aren't
readily available.

Reviewed by: marcel@


159964 26-Jun-2006 babkin

Backed out the change by request from rwatson.

PR: kern/14584


159927 25-Jun-2006 babkin

The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR: kern/14584
Submitted by: babkin


159916 24-Jun-2006 marcel

Update to SDM 2.2:
o Add tf (test feature) instruction,
o Add vmsw (VM switch) instruction.

While here, update copyright.

MFC after: 1 week


159909 24-Jun-2006 marcel

Sync up with SDM 2.1:
o Add nop/hint formats F16, I18, M48 and X5,
o Add format M47 for ptc.e,
o Add hint instruction,
o Fix decoding of cmp8xchg16.


159850 22-Jun-2006 marcel

Identify the cual-core Montecito.

MFC after: 3 days


159651 15-Jun-2006 netchild

Remove COMPAT_43 from GENERIC (and other kernel configs). For amd64 there's
an explicit comment that it's needed for the linuxolator. This is not the
case anymore. For all other architectures there was only a "KEEP THIS".
I'm (and other people too) running a COMPAT_43-less kernel since it's not
necessary anymore for the linuxolator. Roman is running such a kernel for a
for longer time. No problems so far. And I doubt other (newer than ia32
or alpha) architectures really depend on it.

This may result in a small performance increase for some workloads.

If the removal of COMPAT_43 results in a not working program, please
recompile it and all dependencies and try again before reporting a
problem.

The only place where COMPAT_43 is needed (as in: does not compile without
it) is in the (outdated/not usable since too old) svr4 code.

Note: this does not remove the COMPAT_43TTY option.

Nagging by: rdivacky


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159537 12-Jun-2006 imp

Add the ability to subset the devices that UART pulls in. This allows
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with. This helps us save about 100k.

If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


159159 02-Jun-2006 imp

EISA bus ia64 systems don't exist in reality. I'm told they may exist in
theory, but that it was OK to remove from NOTES.

OK'd by: marcel


159148 01-Jun-2006 alc

Correct a syntax error in the previous revision.


159130 01-Jun-2006 silby

After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter. In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary. The printf should
be sufficient motivation to get the driver glitch fixed.


159093 31-May-2006 mjacob

Since it's to all intents and purposes identical
code to amd64 && i386, match the recent changes
to bus_dmamem_alloc here.


158984 27-May-2006 marcel

Unbreak after previous commit. While here, improve function naming
consistency by s/ssc/ssc_/g.


158964 26-May-2006 phk

Update to new console api.


158711 17-May-2006 marius

Add le(4). I could actually only test it on alpha, i386 and sparc64 but
given that this includes the more problematic platforms I see no reason
why it shouldn't also work on amd64 and ia64.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158459 11-May-2006 marcel

Fix braino in previous commit: Don't redefine OID_AUTO to something
not equal to -1, or at all for that matter.


158450 11-May-2006 phk

Remove more straggling CPU_ macro references


158445 11-May-2006 phk

Clean out sysctl machdep.* related defines.

The cmos clock related stuff should really be in MI code.


158124 28-Apr-2006 marcel

Rewrite of puc(4). Significant changes are:
o Properly use rman(9) to manage resources. This eliminates the
need to puc-specific hacks to rman. It also allows devinfo(8)
to be used to find out the specific assignment of resources to
serial/parallel ports.
o Compress the PCI device "database" by optimizing for the common
case and to use a procedural interface to handle the exceptions.
The procedural interface also generalizes the need to setup the
hardware (program chipsets, program clock frequencies).
o Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
default and non-serdev devices are handled by the bus.
o Use the serdev I/F to collect interrupt status and to handle
interrupts across ports in priority order.
o Sync the PCI device configuration to include devices found in
NetBSD and not yet merged to FreeBSD.
o Add support for Quatech 2, 4 and 8 port UARTs.
o Add support for a couple dozen Timedia serial cards as found
in Linux.


157941 21-Apr-2006 marcel

In nexus_teardown_intr(), actually remove the handler.

MFC after: 1 day


157894 20-Apr-2006 imp

Set the rid of the resource obtained from rman_reserve_resource.


157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


157449 03-Apr-2006 marcel

Improve handling of IPI_STOP:
o use atomic operations to fiddle with stopped_cpus and started_cpus.
o disable interrupts while we're waiting to be started.
o remove logic relating to cpustop_restartfunc as it's not used.


157448 03-Apr-2006 marcel

Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


155922 22-Feb-2006 jhb

Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.

Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week


155680 14-Feb-2006 jhb

Fix the hw.realmem sysctl. The global realmem variable is a count of
pages, not a count of bytes. The sysctl handler for hw.realmem already
uses ctob() to convert realmem from pages to bytes. Thus, on archs that
were storing a byte count in the realmem variable, hw.realmem was inflated.

Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha)
MFC after: 3 days


155553 11-Feb-2006 marcel

Correct the spinlock nesting of the idle thread of the APs before we
save the MCA state of the AP. Saving the MCA state of the AP requires
us to allocate memory, which uses sleep locks.
Now that we correct the spinlock nesting of the AP without having
schedlock, avoid calling spinlock_exit(). Instead call critical_exit()
and manually clear the MD spinlock count.

MFC after: 3 days


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.


155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


155410 07-Feb-2006 marcel

Allocate memory for the MCA state information with M_NOWAIT. We can
get a MCA event at any moment and it may not be safe to sleep.

MFC after: 3 days


155232 02-Feb-2006 marcel

Remove devices acpi & mem, as they are in defaults already.


154958 28-Jan-2006 marcel

s/DT_IA64_PLT_RESERVE/DT_IA_64_PLT_RESERVE/


154496 18-Jan-2006 marcel

o Add missing relocations.
o Minor white-space fixups.


154491 17-Jan-2006 marcel

s/R_IA64_/R_IA_64_/g as per the ia64 psABI.


154170 10-Jan-2006 phk

Move the old BSD4.3 tty compatibility from (!BURN_BRIDGES && COMPAT_43)
to COMPAT_43TTY.

Add COMPAT_43TTY to NOTES and */conf/GENERIC

Compile tty_compat.c only under the new option.

Spit out
#warning "Old BSD tty API used, please upgrade."
if ioctl_compat.h gets #included from userland.


154128 09-Jan-2006 imp

By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h. Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef. I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.


154017 04-Jan-2006 phk

Use ttyalloc() instead of ttymalloc()


153955 01-Jan-2006 imp

Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for
each platform. These will be used in the pci code in preference to
the complicated #ifdefs we have there now.


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153504 18-Dec-2005 marcel

Make our ELF64 type definitions match standards. In particular this
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.

MFC after: 2 weeks


153179 06-Dec-2005 jhb

- Cleanup whitespace and extra ()s in vtophys() macros.
- Move vtophys() macros next to vtopte() where vtopte() exists to match
comments above vtopte().
- Remove references to the alternate address space in the comment above
vtopte(). amd64 never had the alternate address space, and i386 lost it
prior to PAE support being added.
- s/entires/entries/ in comments.

Reviewed by: alc


153168 06-Dec-2005 ru

Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE). Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.


153165 06-Dec-2005 ru

Fix -Wundef warnings from compiling GENERIC and LINT kernels of
all architectures.


152865 27-Nov-2005 ru

- Allow duplicate "machine" directives with the same arguments.
- Move existing "machine" directives to DEFAULTS.


152662 21-Nov-2005 jhb

Don't enable PUC_FASTINTR by default in the source. Instead, enable it
via the DEFAULTS kernel configs. This allows folks to turn it that option
off in the kernel configs if desired without having to hack the source.
This is especially useful since PUC_FASTINTR hangs the kernel boot on my
ultra60 which has two uart(4) devices hung off of a puc(4) device.

I did not enable PUC_FASTINTR by default on powerpc since powerpc does not
currently allow sharing of INTR_FAST with non-INTR_FAST like the other
archs.


152661 21-Nov-2005 jhb

Create DEFAULTS files for alpha, ia64, powerpc, and sparc64 and move
'device mem' over from GENERIC to DEFAULTS to be consistent with i386 and
amd64. Additionally, on ia64 enable ACPI by default since ia64 requires
acpi.


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


152002 03-Nov-2005 alc

Remove the remaining spl*() calls. Add some assertions. Eliminate some
excessive white space.


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151885 30-Oct-2005 marcel

Remove a stray return statement in the interrupt dispatch function
that caused a premature exit after calling a fast interrupt handler
and bypassing a much needed critical_exit() and the scheduling of
the interrupt thread for non-fast handlers. In short: unbreak :-)


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


151388 16-Oct-2005 phk

Make ttyconsolemode() call ttsetwater() so that drivers don't have to.


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


151006 06-Oct-2005 phk

Eliminate need for __RMAN_RESOURCE_VISIBLE

Reviewed by: marcel@


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150631 27-Sep-2005 peter

Implement 32 bit getcontext/setcontext/swapcontext on amd64. I've added
stubs for ia64 to keep it compiling. These are used by 32 bit apps such
as gdb.


150627 27-Sep-2005 jhb

Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after: 1 week


150270 18-Sep-2005 csjp

Introduce a kernel config for the Mandatory Access Control framework.
This kernel config briefly describes some of the major MAC policies
available on FreeBSD. The hope is that this will raise the awareness
about MAC and get more people interested.

Discussed with: scottl


150008 11-Sep-2005 alc

Eliminate unused definitions.


150003 11-Sep-2005 obrien

Canonize the include of acpi.h.


149926 10-Sep-2005 marcel

Merge db_interface.c and db_trace.c into db_machdep.c.


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149924 10-Sep-2005 marcel

Move the ia32_sigcode structure from ia32_sigtramp.c to ia32_signal.c.
It's a bit excessive to have it in a file of its own.


149923 10-Sep-2005 marcel

Remove redundant $FreeBSD$


149915 09-Sep-2005 marcel

Change the High FP lock from a sleep lock to a spin lock. We can
take the lock from interrupt context, which causes an implicit
lock order reversal. We've been using the lock carefully enough
that making it a spin lock should not be harmful.


149807 05-Sep-2005 marcel

Milestone: enable SMP by default.


149806 05-Sep-2005 marcel

o In pmap_remove_pte: always invalidate the page. Previously the page
was not invalidated if the PTE was not actually being removed. In
an UP kernel this didn't cause problems, because the new mapping
would preempt the old one. In an SMP kernel this could lead to the
use of stale translations when processes move between CPUs at the
"right" moment. This fixes the last of the obvious SMP problems
and it should be safe to enable SMP by default now.
o In pmap_remove_pte: minor code refactoring to avoid duplication.
o Test all PTE pointers against NULL. Don't use implicit boolean
tests.


149777 03-Sep-2005 marcel

o s/vhpt_size/pmap_vhpt_log2size/g
o s/vhpt_base/pmap_vhpt_base/g
o s/vhpt_bucket/pmap_vhpt_bucket/g
o Declare the above in <machine/pmap.h>
o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.*
o Create a tunable machdep.vhpt.log2size, with corresponding sysctl.
The tunable allows the user to specify the VHPT size from the loader.
o Don't keep track of the number of PTEs in the VHPT. Calculate the
population when necessary by iterating the buckets and summing up
the length of the buckets.
o Don't perform the tpa instruction with a bucket lock held. The
instruction can (theoretically) fault and locking is not needed.


149770 03-Sep-2005 marcel

Fix collision chain termination checks. The result of IA64_PHYS_TO_RR7
is never 0, so one cannot test for a NULL pointer after a physical
address is translated into a virtual pointer with said macro. Instead,
keep the physical address around and test it against 0. Note that
this obviously implies that a PTE can never be allocated at physical
address 0. This isn't exactly guaranteed, but hasn't been a problem
so far. We test the physical address against 0 for as long as the ia64
port exists...


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


149337 20-Aug-2005 stefanf

Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.

This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.

Discussed with: bde


149062 14-Aug-2005 marcel

Remove the execute permission for stacks.


149037 13-Aug-2005 marcel

o s/pmap_lpte_/pmap_/g
o Remove pmap_is_referenced(). It was already compiled-out.


149036 13-Aug-2005 marcel

Fix the problem with the IPI for the lazy context switching of the
high FP registers. It was not that the IPI got lost due to the
perceived unreliability of the IPI delivery, but rather that the
IPI was not assigned a vector (ugh). Sending a 0 vector to a CPU
results in a stray external interrupt.
Add a KASSERT to ipi_send() to catch this. The initialization of
the IPIs could be better, but it's not at all sure what the future
of the code is. Avoid wasting a lot of time on something that is
going to be rewritten anyway.


148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.


148805 06-Aug-2005 marcel

Reduce the default MAXCPU from 16 to 4. This is in preparation of
allocating a VHPT per CPU. Since we don't yet know how many CPUs
are actually in the system at the time we need to allocate the
VHPTs, we allocate for MAXCPU processors. This can result in a
lot of wasted space for 2-way machines. So, for now, limit MAXCPU
to something smaller until we have something more dynamic.


148804 06-Aug-2005 marcel

For ia64_ptc_{e,g,ga,l}(), use instruction serialization. We
typically don't know what the TLB described and need to assume
that it affects the fetching of instructions.


148666 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


148067 15-Jul-2005 jhb

Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm


147991 14-Jul-2005 kensmith

Add recently invented COMPAT_FREEBSD5 option.

MFC after: 3 days


147889 10-Jul-2005 davidxu

Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)


147773 05-Jul-2005 marcel

Enhance ia64_flush_dirty() to handle the case in which td != curthread.
This case is triggered with ptrace(2) and the PT_SETREGS function.
Change the return type of the function to int so that errors can be
passed on to the caller.

Approved by: re (scottl)


147745 02-Jul-2005 marcel

Implement functions calls from within DDB on ia64. On ia64 a function
pointer doesn't point to the first instruction of that function, but
rather to a descriptor. The descriptor has the address of the first
instruction, as well as the value of the global pointer. The symbol
table doesn't know anything about descriptors, so if you lookup the
name of a function you get the address of the first instruction. The
cast from the address, which is the result of the symbol lookup, to a
function pointer as is done in db_fncall is therefore invalid.
Abstract this detail behind the DB_CALL macro. By default DB_CALL is
defined as db_fncall_generic, which yields the old behaviour. On ia64
the macro is defined as db_fncall_ia64, in which a descriptor is
constructed to yield a valid function pointer.

While here, introduce DB_MAXARGS. DB_MAXARGS replaces the existing
(local) MAXARGS. The DB_MAXARGS macro can be defined by platforms to
create a convenient maximum. By default this will be the legacy 10.
On ia64 we define this macro to be 8, for 8 is the maximum number of
arguments that can be passed in registers. This avoids having to
implement spilling of arguments on the memory stack.

Approved by: re (dwhite)


147740 02-Jul-2005 marcel

Fix a buglet that was present in the ia64 code and that got inherited
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by callling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid code file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.

Approved by: re (scottl)


147692 30-Jun-2005 peter

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


147640 27-Jun-2005 marcel

Handle B-unit break instructions. The break.b is unique in that the
immediate is not saved by the architecture. Any of the break.{mifx}
instructions have their immediate saved in cr.iim on interruption.
Consequently, when we handle the break interrupt, we end up with a
break value of 0 when it was a break.b. The immediate is important
because it distinguishes between different uses of the break and
which are defined by the runtime specification.
The bottomline is that when the GNU debugger replaces a B-unit
instruction with a break instruction in the inferior, we would not
send the process a SIGTRAP when we encounter it, because the value
is not one we recognize as a debugger breakpoint.

This change adds logic to decode the bundle in which the break
instruction lives whenever the break value is 0. The assumption
being that it's a break.b and we fetch the immediate directly out
of the instruction. If the break instruction was not a break.b,
but any of break.{mifx} with an immediate of 0, we would be doing
unnecessary work. But since a break 0 is invalid, this is not a
problem and it will still result in a SIGILL being sent to the
process.

Approved by: re (scottl)


147639 27-Jun-2005 marcel

Replace the existing copyright notice with my own. Over the years I've
changed this file so much that it's equivalent to a rewrite, and I'm not
talking about any of the cosmetic changes of course.

Approved by: re (scottl)


147638 27-Jun-2005 marcel

Cosmetic: s/u_int64_t/uint64_t/g

Approved by: re (scottl)


147504 20-Jun-2005 obrien

Add .cvsignore files just like in sys/<arch>/compiled, this keeps CVS from
questing kernel config files not in CVS.

Approved by: re(kensmith)


147322 12-Jun-2005 marcel

Define IPI_PREEMPT. Update a nearby comment while I'm here.


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


147191 09-Jun-2005 jkoshy

MFP4:

- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.

- bug fixes & documentation.


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146791 29-May-2005 marcel

Call cninit_finish() in configure_final().


146734 29-May-2005 nyan

Remove bus_{mem,p}io.h and related code for a micro-optimization on i386
and amd64. The optimization is a trivial on recent machines.

Reviewed by: -arch (imp, marcel, dfr)


146214 14-May-2005 nyan

- Move bus dependent defines to {isa,cbus}_dmareg.h.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98


146037 10-May-2005 marcel

Don't define _MACHINE_BUS_MEMIO_H_ nor _MACHINE_BUS_PIO_H_.


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.


145389 22-Apr-2005 marcel

Sanity the RTC code:
o Remove the clock interface. Not only does it conflict with the MI
version when device genclock is added to the kernel, it was also
not possible to have more than 1 clock device. This of course would
have been a problem if we actually had more than 1 clock device.
In short: we don't need a clock interface and if we do eventually,
we should be using the MI one.
o Rewrite inittodr() and resettodr() to take into account that:
1) We use the EFI interface directly.
2) time_t is 64-bit and we do need to make sure we can determine
leap years from year 2100 and on. Add a nice explanation of
where leap years come from and why.
3) This rewrite happened in 2005 so any date prior to 1/1/2005
(either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with
1/1/2005 in that case.
4) The EFI clock has a high probability of being correct, so
only (further) correct the EFI clock when the file system time
is larger. That should never happen in a time-synchronised world.
Complain when EFI lost 2 days or more.

Replace the copyright notice now that I (pretty much) rewrote all of
this file.


145332 20-Apr-2005 marcel

Add empty header (except of the multiple-inclusion protection) to
get hwpmc(4) to compile on this platform.


145253 18-Apr-2005 imp

Break out the definition of bus_space_{tag,handle}_t and a few other types
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).

Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb


145173 16-Apr-2005 marcel

Add a kpte command to DDB. It dumps the PTE of a KVA. This helps
to analyze faults and TLB/VHPT inconsistencies.


145137 16-Apr-2005 marcel

Return better "error" values for UWX_BOTTOM and UWX_ABI_FRAME in
unw_step(). Both errors denote the end of a stack trace (i.e. no
prior frame), but are otherwise not error conditions.
Have db_trace() return 0 when the trace ends due to one of these
return codes as they are really normal termination conditions.

This change especially improves the output of the "show thread"
command in DDB when there are threads in fork_trampoline() and
previously db_trace() would return an error, causing the show
command to emit '***'.


145092 15-Apr-2005 marcel

Initialize curthread before we save the APs MCA state. Saving the
MCA state requires a spin lock, which requires a valid curthread.
This change allows SMP kernels to boot into multi-user again.

While here, update the copyright notice and use __FBSDID for the
revision string.


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144962 12-Apr-2005 marcel

Dot the i's:
1 Move the debug.clock_adjust_* sysctls to debug.clock.adjust_* to
make it easier to get only the clock statistics.
2 Make the sysctls read-only [suggested by Marius].
3 When determining the new clock adjustment, we checked for an error
either larger than 12.5% or smaller than 12.5%. We left out an error
of exactly 12.5%. For errors larger than 12.5% we adjust the clock
reload value in such a way that the next clock interrupt would be
early (as in premature). For errors less than 12.5% we stopped the
adjustment.
The current algorithm doesn't benefit from excluding an error of
exactly 12.5%. Change the code to stop adjusting the clock if the
error is *not* larger than 12.5% [suggested by Marius].

Discussed with: marius@


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more


143985 22-Mar-2005 sobomax

Add USB Communication Device Class Ethernet driver. Originally written for
FreeBSD based on aue(4) it was picked by OpenBSD, then from OpenBSD ported
to NetBSD and finally NetBSD version merged with original one goes into
FreeBSD.

Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD


143867 20-Mar-2005 njl

s/SLIST/STAILQ to catch up with changes to resource lists.

Missed by: imp


143809 18-Mar-2005 murray

Add a comment to note that pseudo-device bpf is required for DHCP.
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.

PR: conf/40855
MFC after: 1 week


143796 18-Mar-2005 iedowse

Split configure() into 3 separate steps like we do on other
architectures. This makes it possible to insert hooks before and
after the device attachment step.

Tested thanks to: marcel


143598 14-Mar-2005 scottl

Refactor the bus_dma header files so that the interface is described in
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.


143202 07-Mar-2005 scottl

Remove dead code.


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago


143057 02-Mar-2005 marcel

Make sure fpswa_iface equals NULL when bootinfo.bi_fpswa equals 0.
We need to be able to test for the (possible) non-existence of the
FPSWA code.

PR: ia64/77591
Submitted by: Christian Kandeler (christian dot kandeler at hob dot de)
MFC after: 1 day


142956 01-Mar-2005 wes

Attempt to doff the pointy hat: implement 'hw.realmem' on remaining
architectures. Pointed out by O'Brien, ScottL via email.

Reviewed by: obrien (various)


142436 25-Feb-2005 delphij

Remove acpi_perf from {ARCH}/conf/NOTES, to make tinderbox happy.

Reported by: tinderbox
Inspired by: acpi_perf build structure removal commit


142107 19-Feb-2005 ru

Use a common multi-inclusion protection, and add such a
protection to alpha/include/exec.h.


141556 09-Feb-2005 marcel

s/descr/oid_descr/


141391 06-Feb-2005 phk

Since we are quite unlikely to ever face another platform which
uses the i8237 without trying to emulate the PC architecture move
the register definitions for the i8237 chip into the central include
file for the chip, except for the PC98 case which is magic.

Add new isa_dmatc() function which tells us as cheaply as possible
if the terminal count has been reached for a given channel.


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141369 05-Feb-2005 njl

Build cpufreq and acpi_perf on platforms that are likely to be able to
use them.


141248 04-Feb-2005 marcel

Include sys/bus.h before sys/cpu.h. The latter needs device_t.


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


141085 31-Jan-2005 imp

nit in /*-


140891 27-Jan-2005 marcel

Fix handling of post increment: Either the first or second operand
is the register with the memory address, and it's that register's
value we need to increment or decrement.

MFC after: 3 days


140418 18-Jan-2005 scottl

Fix compile errors. Bah.


140315 15-Jan-2005 scottl

Fix an assignment that I missed in the last commit.


140311 15-Jan-2005 scottl

Add bus_dmamap_load_mbuf_sg() to ia64


140256 14-Jan-2005 jhb

- Remove some OBE comments regarding cpu_exit(). cpu_exit() is no longer
the last action of kern_exit(). Instead, it is a MD callout to cleanup
per-process state during exit.
- Add notes of concern to Alpha and ia64 about the possible need to drop
fp state in cpu_thread_exit() rather than in cpu_exit() since it is
per-thread state rather than per-process.


139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139554 02-Jan-2005 marcel

Further enhance the handling of misaligned loads and stores:
o implement double-extended and single precision loads and stores,
o implement double precision stores,
o replace the machdep.unaligned_print sysctl with debug.unaligned_print
and change the default value to 0,
o replace the machdep.unaligned_sigbus sysctl with debug.unaligned_test,
o Remmove the fillfd() function. The function is trvial enough for
inline assembly.

The debug.unaligned_test sysctl is used to test the emulation of
misaligned loads and stores. When PSR.ac is 0, the CPU will handle
misaligned memory accesses itselfi and we don't get an exception
for it. When PSR.ac is 1, the process needs to be signalled and we
should not emulate. The sysctl takes effect when PSR.ac is 1 and
tells us that we should emulate and not send a signal.

PR: 72268
MFC after: 1 week


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138752 12-Dec-2004 marcel

Fix the last of the instability and the cause of the annoying
"vm_fault: fault on nofault entry, addr: %lx" panic. The problem was a
stale PTE in the TLB that marked the page as not present, even though
we had a good PTE in the VHPT. We typically don't yet insert PTEs in
the TLB. We do that lazily. The CPU will look for the PTE in the VHPT
when there's no PTE in the TLB. Unfortunately this doesn't handle the
case of the stale PTE in the TLB. The quick fix is to invalidate the
TLB (sloppily) when the VHPT doesn't contain a valid PTE. This is also
the only case that may cause a PTE in the TLB that marks a page as
non-present.


138674 11-Dec-2004 marcel

Use primitive types to avoid creating an artificial header dependency:
o s/u_long/unsigned long/
o s/uint32_t/unsigned int/g
o s/uint64_t/unsigned long/g

Trigger case: multimedia/mpeg2codec


138543 08-Dec-2004 marcel

Don't obtain the HCDP address directly from the bootinfo structure.
Use a function to keep the details at arms length from uart(4).


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.

Tested on: alpha, amd64, i386, ia64, sparc64


138145 28-Nov-2004 marcel

Whitespace fixes:
o Remove a bogus comment that relates to alpha.
o s/u_int64_t/uint64_t/g
o Add bi_spare2 to make the internal padding explicit.
o Move BOOTINFO_MAGIC after the field it applies to.


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137978 21-Nov-2004 marcel

Remove struct ia64_itir and use a plain old uint64_t instead.


137914 20-Nov-2004 das

Remove UAREA_PAGES.

Reviewed by: arch@


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137906 20-Nov-2004 das

user.h is included only to get pcb.h, so use the latter directly instead.


137708 14-Nov-2004 marcel

Remove the BR tag. When the machine doesn't have the DIG64 HCDP
table with console settings, we now only need to know at which
address the UART lives. Leaving the baudrate unspecified results
in us using the baudrate at which the UART operates. This removes
one parameter that can interfere with a successful installation
out of the box.


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


136809 23-Oct-2004 phk

Use bioq_takefirst()


136680 18-Oct-2004 phk

Add new function ttyinitmode() which sets our systemwide default
modes on a tty structure.

Both the ".init" and the current settings are initialized allowing
the function to be used both at attach and open time.

The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.

Use the new function throughout.


136521 14-Oct-2004 njl

Print flags in the nexus for child devices.


136366 11-Oct-2004 njl

Move the code for halting the CPU (acpi_cpu_c1) into machdep files.
This removes the last MD portion of acpi_cpu.c.

MFC after: 2 weeks


136183 06-Oct-2004 marcel

Add the Madison II, which is the second generation Madison. The Madison II
is model 2 in the Itanium 2 family and has up to 9MB of L3 cache and clocks
higher than 1.5Ghz. There's no LV variant AFAICT.


136070 03-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


135832 26-Sep-2004 marcel

...And fix WITNESS builds: declare syscallnames.


135798 26-Sep-2004 marcel

Fix INVARIANTS build: Include <machine/cpu.h>.


135783 25-Sep-2004 marcel

Move the IA-32 trap handling from trap() to ia32_trap(). Move the
ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32
is not defined, we'll raise SIGEMT instead.


135590 23-Sep-2004 marcel

Redefine a PTE as a 64-bit integral type instead of a struct of
bit-fields. Unify the PTE defines accordingly and update all
uses.


135589 22-Sep-2004 marcel

s/u_int#_t/uint#_t/g


135583 22-Sep-2004 marcel

For the atomic_{add|clear|set|subtract} family of inlines, return the
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.


135581 22-Sep-2004 marcel

MFp4: various style fixes, including
o s/u_int/uint/g
o s/#define<sp>/#define<tab>/g
o indent macro definitions
o Improve vertical spacing
o Globally align line continuation character


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64


135453 19-Sep-2004 marcel

MFp4:
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.


135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


135405 17-Sep-2004 marcel

Provide our own FPSWA definitions, instead of depending on the Intel
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.


135403 17-Sep-2004 marcel

Remove useless inclusion of <machine/fpu.h>


135262 15-Sep-2004 phk

Add new a function isa_dma_init() which returns an errno when it fails
and which takes a M_WAITOK/M_NOWAIT flag argument.

Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.

Problem uncovered by: tegge


135096 12-Sep-2004 marcel

Catch up with other platforms: switch the default scheduler to 4BSD.


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritence that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)


134928 08-Sep-2004 marcel

Sync the busdma code with i386. The most tangible upshot is that
the alignment and boundary constraints are being respected, which
fixes the reported ATA problems with SiI chips.
I consider the busdma implementation worrisome nonetheless. Not
only is there too much MI code duplicated in MD files, there's a
lot of questionable code. I smell a wholesale, cross-platform
overhaul coming...

MT5 candidate.


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134648 02-Sep-2004 marcel

Add aac(4) and aacp(4). The driver is 64-bit clean for roughly a year
now and has been mentioned on the freebsd-ia64 list.


134571 31-Aug-2004 julian

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


134502 30-Aug-2004 marcel

s/ENTRY/ENTRY_NOPROFILE/g for particular functions that do not follow
the C calling convention or are otherwise not regular functions. This
allows us to boot a profiling kernel.


134410 27-Aug-2004 marcel

Catch up with the drive-by renaming of IA32 to COMPAT_IA32. Missed
11 days ago when all the other places were fixed and finally caught
by the tinderbox run...


134398 27-Aug-2004 marcel

Move the kernel-specific logic to adjust frompc from MI to MD. For
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.

The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.

This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...

Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386


134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


134383 27-Aug-2004 andre

Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.


134289 25-Aug-2004 marcel

Get a step closer to profiling the kernel by fixing the definitions
of the MCOUNT_ENTER, MCOUNT_EXIT and MCOUNT_DECL defines. Also make
sure there's a prototype of _MCOUNT_DECL(). This allows us to build
a kernel. There are still unresolved symbols, so linking fails.


134287 25-Aug-2004 marcel

Make profiling actually work. The gcc compiler emits a call to the
_mcount() stub when profiling is enabled. Emit this code sequence
for assembly routines as welli (MCOUNT definition in <machine/asm.h>.
We do not pass the GOT entry however as the 4th argument, because it's
not used. The _mcount() stub calls __mcount(), which does the actual
work. Define _MCOUNT_DECL to define __mcount. We do not have an
implementation of mcount(), so we define MCOUNT as empty, but have a
weak alias to _mcount() in _mcount.S.
Note that the _mcount() stub in the kernel is slightly different from
the stub in userland. This is because we do not have to worry about
nested routines in the kernel.


134263 24-Aug-2004 njl

Catch up with i386 nexus.c rev 1.59: add bus_get_resource_list().


134261 24-Aug-2004 obrien

sr(4) definately won't work on IA64.


133888 16-Aug-2004 arun

The existing code fails some corner cases. Replace it with
ia64_bsp_adjust() which has been tested to work in all cases for
arbitrary (bsp, nslots) combinations.

reviewed by: marcel@


133879 16-Aug-2004 marcel

As I said: the previous commit was untested... Remove an #endif which
should have ceased to exist when its corresponding #if was removed.


133878 16-Aug-2004 marcel

Catch up with the drive-by renaming of IA32 to COMPAT_IA32. It must
have been rush hour...

While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on
amd64. Consequently, it's unsafe to use the option in pcb.h. We now
unconditionally have the ia32 specific registers in the PCB.

This commit is untested.


133875 16-Aug-2004 arun

ITC.{i,d} instructions use format M41 not M42.

reviewed by: marcel@


133711 14-Aug-2004 marcel

Allocate memory in the unwinder with M_NOWAIT. We may need to provide
backtraces with locks held.


133472 11-Aug-2004 marcel

In set_regs(), flush the dirty registers onto the backingstore before
we update the registers. That way we don't have any dirty registers to
worry about and also know that bsp=bspstore, which makes updating the
RSE related registers predictable.
This is not the end of it. We need more validity checks, but for now
this allows us to complete the gdb testsuite without crashing the
kernel.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133405 09-Aug-2004 marcel

Better preserve the original protection for the mappings we maintain.
The hardware always gives read access for privilege level 0, which
means that we cannot use the hardware access rights and privilege
level in the PTE to test whether there's a change in protection. So,
we save the original vm_prot_t in the PTE as well.
Add pmap_pte_prot() to set the proper access rights and privilege
level on the PTE given a pmap and the requested protection.

The above allows us to compare the protection in pmap_extract_and_hold()
which was missing. While in pmap_extract_and_hold(), add pmap locking.

While here, clean up most (i.e. all but one) PTE macros we inherited
from alpha. They were either unused, used inconsistently, badly named
or simply weren't beneficial. We save the wired and managed state of
the PTE in distinct (bit) fields.

While in pte.h, s/u_int64_t/uint64_t/g

pmap locking obtained from: alc@
feedback & review by: alc@


133291 08-Aug-2004 marcel

Implement single stepping when we leave the kernel through the EPC syscall
path. The basic problem is that we cannot set the single stepping flag
directly, because we don't leave the kernel via an interrupt return. So,
we need another way to set the single stepping flag.
The way we do this is by enabling the lower-privilege transfer trap, which
gets raised when we drop the privilege level. However, since we're still
running in kernel space (sec), we're not yet done. We clear the lower-
privilege transfer trap, enable the taken-branch trap and continue exiting
the kernel until we branch into user space.
Given the current code, there's a total of two traps this way before
we can raise SIGTRAP.


133286 07-Aug-2004 marcel

Slightly move labels around to make sure we call ast() on our way out
after a fork(2) in fork_trampoline(). By moving the epc_syscall_return
label immediately before the call to do_ast() in epc_syscall(), we not
only achieve that but also handle the detour through exception_return
when the frame corresponds to an asynchronous kernel entry. Hence, we
simplified fork_trampoline() as a side-effect.


133285 07-Aug-2004 marcel

De-inline gdb_cpu_signal() because we need to convert the trap vectors
related to breakpoints and single stepping into SIGTRAP so gdb(1) knows
why the remote target has stopped. In particular, gdb(1) needs to know
if the reason is something of its own doing.


133135 04-Aug-2004 arun

Use a 256MB TR instead of a 64MB TR to make sure that the kernel
text/data are covered on APs. This enables the kernel to boot on
a 4 way Intel Itanium-2 platform. This has a secondary effect of
keeping the TRs identical on BP and the APs.

reviewed by: marcel@


133087 03-Aug-2004 markm

Making a loadable null.ko for /dev/(null|zero) proved rather
unpopular, so remove this (mis)feature.

Encouragement provided by: jhb (and others)


133084 03-Aug-2004 mux

Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by: jhb
Reviewed by: jhb


133023 02-Aug-2004 marcel

Fix 2 typos in previous commit: both s/strct/struct/


133020 02-Aug-2004 marcel

Add the mem and null devices now that they are optional.


133019 02-Aug-2004 marcel

Sort the miscellaneous devices to restore ordering after the insertion
of the mem and null devices.


132965 01-Aug-2004 markm

Remove extraneous ';'.


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132897 30-Jul-2004 alc

- Add pmap locking to ia64's pmap_enter() and pmap_enter_quick(). (This
brings ia64 to parity with alpha, amd64, and i386 in this area.)
- Prevent a race in pmap_find_pte(): If pmap_find_pte() sleeps in
uma_zalloc(), another thread could allocate a pte at the same address.
Instead, sleep at a higher level and retry the lookup before retrying
the allocation.

Reviewed and tested by: marcel@


132871 30-Jul-2004 marcel

Fix -O builds with gcc 3.4 by defining ffs as __builtin_ffs instead of
creating an inline function that just calls __builtin_ffs.


132808 28-Jul-2004 phk

Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call. (Yet another example of mindless copy&paste).


132700 27-Jul-2004 rwatson

Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding: jhb, bmilekic


132626 25-Jul-2004 marcel

Work-around a gcc code generation bug for function descriptors
references (target/16559). This fixes SMP configurations.

Obtained from: arun@


132522 22-Jul-2004 alc

In pmap_mincore() create a private copy of the pte for use after the pmap
lock is released.


132487 21-Jul-2004 alc

Additional pmap locking

Tested by: marcel@


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64


132383 19-Jul-2004 das

Make FLT_ROUNDS correctly reflect the dynamic rounding mode.


132378 19-Jul-2004 alc

Add partial pmap locking.

Tested by: marcel@


132235 16-Jul-2004 alc

Remove unused fields from the pmap.


132226 15-Jul-2004 phk

Preparation commit for the tty cleanups that will follow in the near
future:

rename ttyopen() -> tty_open() and ttyclose() -> tty_close().

We need the ttyopen() and ttyclose() for the new generic cdevsw
functions for tty devices in order to have consistent naming.


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132170 15-Jul-2004 alc

A loop in pmap_remove() should use TAILQ_FOREACH_SAFE(), not
TAILQ_FOREACH(), because the loop deletes elements from the list.

Reviewed by: marcel@


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(), alpha already has it for years, the function
will be used by ptrace to clear a thread's single step state.


132085 13-Jul-2004 alc

Simplify pmap_protect().


132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


131969 11-Jul-2004 marcel

Add options KDB and GDB. KDB takes on the function of what DDB used
to be. Both DDB and GDB specify which KDB backends to include.


131960 11-Jul-2004 marcel

Remove the now unused GDB stubs. See src/sys/gdb/* for the new KDB
backend.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131945 10-Jul-2004 marcel

Update for the KDB framework:
o ksym_start and ksym_end changed type to vm_offset_t.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_enter() instead of breakpoint().
o Remove implementation of Debugger().
o Call kdb_trap() according to the new world order.

unwinder:
o s/db_active/kdb_active/g
o Various s/ddb/kdb/g
o Add support for unwinding from the PCB as well as the trapframe.
Abuse a spare field in the special register set to flag whether
the PCB was actually constructed from a trapframe so that we can
make the necessary adjustments.

md_var.h:
o Add RSE convenience macros.
o Add ia64_bsp_adjust() to add or subtract from BSP while taking
NaT collections into account.


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131903 10-Jul-2004 marcel

Introduce the KDB debugger frontend. The frontend provides a framework
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.


131840 08-Jul-2004 brian

Change the following environment variables to kernel options:

bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone


131838 08-Jul-2004 marcel

MFamd64 (1.275):
Reduce the scope of the Giant lock being held for non-mpsafe syscalls.
There was way too much code being covered.


131821 08-Jul-2004 marcel

Better handle the break instruction trap. The runtime specification
has outlined which break numbers are software interrupts, debugger
breakpoints and ABI specific breaks. We mostly treated all break
numbers we didn't care about as debugger breakpoints.


131814 08-Jul-2004 brian

Change the following kernel options to environment variables:

BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.


131662 05-Jul-2004 alc

- Correct pmap_extract()'s return type. It should be vm_paddr_t, not
vm_offset_t.
- Convert pmap_extract() to the ANSI style of declaration.


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


131381 30-Jun-2004 marcel

Unbreak build: define __RMAN_RESOURCE_VISIBLE
See also src/sys/sys/rman.h rev. 1.21.


131312 30-Jun-2004 njl

Add machdep quirks functions. On i386, this disables acpi on systems with
BIOS dates earlier than Jan 1, 1999. Add prototypes and quirks flags.


130965 23-Jun-2004 alc

- Remove unused definitions.
- Move a definition inside the scope of a #ifdef _KERNEL.


130764 20-Jun-2004 bde

Backed out previous commit. Blind substitution of dev_t by `struct cdev *'
was just wrong here because the dev_t's are user dev_t's.


130742 19-Jun-2004 alc

Remove dead code related to pv entry allocation.

Reviewed by: marcel@


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130362 11-Jun-2004 alc

Neither pmap_enter() nor pmap_enter_quick() should create pv entries for
unmanaged pages.

Tested by: marcel@


130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


130338 11-Jun-2004 alc

Reduce the number of preallocated pv entries and lpte entries in
pmap_init().

Tested by: marcel@


130077 04-Jun-2004 phk

Machine generated patch which changes linedisc calls from accessing
linesw[] directly to using the ttyld...() functions

The ttyld...() functions ar inline so there is no performance hit.


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130025 03-Jun-2004 phk

Add missing <sys/module.h> instances which were shadowed by the nested
include in <sys/kernel.h>


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129944 01-Jun-2004 phk

Gainfully employ the new ttyioctl in the trivial cases.


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129444 19-May-2004 bde

Moved most of the "MI" definitions and declarations from <machine/profile.h>
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.


129393 18-May-2004 stefanf

<stdint.h> should define WINT_M{AX,IN} independent from whether WCHAR_MIN is
defined. Otherwise first including <wchar.h> and then <stdint.h> leads to no
WINT_M{AX,IN} at all.

PR: 64956
Approved by: das (mentor)


129351 17-May-2004 marcel

Fix typo in comment. While here, end the sentence with a period and
remove the empty line between the fdc and sio devices. The empty
line suggests that the comment applies to fdc only while it applies
to all following devices and options.

Typo spotted by: ru@


129323 17-May-2004 marcel

Unbreak build due to previous commit: now that elf_reloc_internal()
gets the relocation base passed in relocbase, we cannot declare a
local variable with the same name. Assume the argument holds the
same value as the local variable did...


129321 17-May-2004 marcel

filter out the fdc(4) and sio(4) devices and corresponding options.
Note that cy(4) uses COM_MULTIPORT, so we need to keep that option.


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


129023 07-May-2004 marcel

Revert previous commit. We should not get any FP traps from within
the kernel. We can guarantee this by resetting the FP status register.
This masks all FP traps. The reason we did get FP traps was that we
didn't reset the FP status register in all cases.

Make sure to reset the FP status register in syscall(). This is one of
the places where it was forgotten.

While on the subject, reset the FP status register only when we trapped
from user space.


129022 07-May-2004 marcel

Make sure to sanitize the FP status register. Specifically this
masks all FP traps, which should not happen in the kernel.


128990 06-May-2004 njl

Make unnecessary globals static and remove unused includes.

Pointed out by: cscout


128979 05-May-2004 njl

Add an MI implementation of the ACPI global lock routines and retire the
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.

Reviewed by: marcel, bde, jhb


128857 03-May-2004 marcel

Floating-point faults and exceptions can happen in the kernel too.
Do not panic when it happens; handle them.

Run into by: das


128850 03-May-2004 marcel

Catch- and cleanup:
o Fix and improve comments and references,
o Add PFIL_HOOKS, UFS_ACL and UFS_DIRHASH,
o Switch from SCHED_4BSD to SCHED_ULE,
o Remove SCSI_DELAY (there's no SCSI support),


128838 02-May-2004 obrien

Spell Ethernet correctly.


128790 01-May-2004 marcel

Verify the MADT checksum before using the table.

Submitted by: njl


128629 25-Apr-2004 das

Hide FLT_EVAL_METHOD and DECIMAL_DIG in pre-C99 compilation
environments.

PR: 63935
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>


128508 21-Apr-2004 njl

Don't check for NULL, device_get_softc() always succeeds.


128393 18-Apr-2004 alc

MFamd64
Simplify the sf_buf implementation. In short, make it a veneer
over the direct virtual-to-physical mapping.


128105 11-Apr-2004 alc

Remove a comment that refers to avail_start and avail_end as these
variables no longer exist.


128097 10-Apr-2004 alc

- pmap_kenter_temporary() is unused by machine-independent code. Therefore,
move its declaration to the machine-dependent header file on those
machines that use it. In principle, only i386 should have it.
Alpha and AMD64 should use their direct virtual-to-physical mapping.
- Remove pmap_kenter_temporary() from ia64. It is unused. Approved
by: marcel@


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127922 06-Apr-2004 alc

Remove avail_end. As of yesterday, it is unused.


127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127494 27-Mar-2004 marcel

MFi386: correctly calculate the top-of-stack when a kthread is created
with a larger kernel stack. Remove inclusion of opt_kstack_pages.h now
that it's unused.


127253 21-Mar-2004 marcel

In breakpoint(), use a different immediate to make sure we can
distinguish between debugger inserted breakpoints and fixed
breakpoints. While here, make sure the break instruction never
ends up in the last slot of a bundle by forcing it to be an
M-unit instruction. This makes it easier for use to skip over
it.


127241 20-Mar-2004 alc

- Add uiomove_fromphys() implementations to alpha and ia64. These only
differ trivially from amd64.
- Correct a spelling error in a comment.


127239 20-Mar-2004 marcel

Introduce the cpumask_t type. The purpose of the type is to create a
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.

With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)

Compile-tested on: i386


127219 20-Mar-2004 marcel

Replace uint64_t with unsigned long in struct dbreg.


127217 20-Mar-2004 marcel

Remove the last traditional hints. These hints only served the purpose
for uart(4) to figure out which device to use as console. Use this file
to define hw.uart.console instead so that we don't have to put it in
the default loader.conf, which makes it hard to override.


127146 17-Mar-2004 jmg

sync comment with i386's isa.c.. This removes a comment that is YEARS
old...


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126825 10-Mar-2004 marcel

Identify the Deerfield processor. Deerfield is a low-voltage variant
based on the Madison core and targeting the low end of the spectrum.
Its clock frequency is 1Ghz, whereas Madison starts at 1.3Ghz. Since
the CPUID information is the same for Madison and Deerfield, we use
the clock frequency to identify the processor.
Supposedly the Deerfield only uses 62W, which seems to be less than
modern Xeon processors (about 70W) and about half what a Madison would
need.


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126716 07-Mar-2004 alc

Integrate the code from pmap_pinit2() into pmap_pinit(), leaving
pmap_pinit2() empty.

Approved by: marcel


126715 07-Mar-2004 alc

Remove unused declarations. (Some time ago, these variables became fields
of vm/vm.h's struct kva_md_info.)


126649 05-Mar-2004 le

Fix syntax errors and wrong function prototypes in several MD header
files when using non-GNUC compilers.

PR: kern/58515
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
Approved by: grog (mentor), obrien


126106 22-Feb-2004 marcel

Do not pre-map the I/O port space. On the Intel Tiger 4 this conflicts
with a memory mapped I/O range that's immediately before it and is
not 256MB aligned. As a result, when an address is accessed in the
memory mapped range and a direct mapping is added for it, it overlaps
with the pre-mapped I/O port space and causes a machine check.

Based on a patch from: arun@


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


126078 21-Feb-2004 phk

Device megapatch 3/6:

Add missing D_TTY flags to various drivers.

Complete asserts that dev_t's passed to ttyread(), ttywrite(),
ttypoll() and ttykqwrite() have (d_flags & D_TTY) and a struct tty
pointer.

Make ttyread(), ttywrite(), ttypoll() and ttykqwrite() the default
cdevsw methods for D_TTY drivers and remove the explicit initializations
in various drivers cdevsw structures.


126076 21-Feb-2004 phk

Device megapatch 1/6:

Free approx 86 major numbers with a mostly automatically generated patch.

A number of strategic drivers have been left behind by caution, and a few
because they still (ab)use their major number.


125975 18-Feb-2004 phk

Change the disk(9) API in order to make device removal more robust.

Previously the "struct disk" were owned by the device driver and this
gave us problems when the device disappared and the users of that device
were not immediately disappearing.

Now the struct disk is allocate with a new call, disk_alloc() and owned
by geom_disk and just abandonned by the device driver when disk_create()
is called.

Unfortunately, this results in a ton of "s/\./->/" changes to device
drivers.

Since I'm doing the sweep anyway, a couple of other API improvements
have been carried out at the same time:

The Giant awareness flag has been flipped from DISKFLAG_NOGIANT to
DISKFLAG_NEEDSGIANT

A version number have been added to disk_create() so that we can detect,
report and ignore binary drivers with old ABI in the future.

Manual page update to follow shortly.


125112 27-Jan-2004 marcel

Sort PFIL_HOOKS.


124935 24-Jan-2004 jeff

- Recruit some new ULE users by making it the default scheduler in GENERIC.
ULE will be in a probationary period to determine whether it will be left
as the default in 5.3 which would likely mean the rest of the 5.x series.


124919 24-Jan-2004 nectar

Add PFIL_HOOKS to the GENERIC kernel configuration, primarily so
that one can load the IPFilter module (which requires PFIL_HOOKS).

Requested by: Many, for over a year


124739 20-Jan-2004 marcel

Fix handling of FP traps:
o For traps, the cr.iip register points to the next instruction to
execute on interrupt return (modulo slot). Since we need to get
the bundle of the instruction that caused the FP fault/trap, make
sure we fetch the previous bundle if the next instruction is in
fact the first in a bundle.
o When we call the FPSWA handler, we need to tell it whether it's
a trap or a fault (first argument). This was hardcoded to mean a
fault.

Also, for FP faults, when a fault is converted to a trap, adjust the
cr.iip and cr.ipsr registers to point to the next instruction. This
makes sure that the SIGFPE handler gets a consistent state.


124737 20-Jan-2004 marcel

s/framep/tf/g -- this normalizes on the use of tf to point to the
trapframe and improves grep-ability.


124478 13-Jan-2004 des

Whitespace nit.


124296 09-Jan-2004 nectar

Provide sysarch(2) prototypes in the MD sysarch.h headers. While I'm
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.

Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.

Reviewed by: bde


124092 03-Jan-2004 davidxu

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123819 24-Dec-2003 marcel

Don't use NULL with integral types.


123799 24-Dec-2003 peter

Return AE_OK for stub functions returning ACPI_STATUS, not NULL


123791 24-Dec-2003 peter

GC the unused <machine/kse.h> file.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123629 18-Dec-2003 peter

Add missing #include "opt_compat.h" so that the compatability function
freebsd4_freebsd32_sigreturn() is defined when expected. This should
unbreak the tinderbox. Sorry.


123528 14-Dec-2003 marcel

In set_mcontext(), take into account that kse_switchin(2) will
eventually be passed an async. context as well as a syscall
context.
While here, fix a serious bug in that if the trapframe is a
syscall frame, but we're restoring an async context, we need
to clear the FRAME_SYSCALL flag so that we leave the kernel
via exception_restore.


123423 11-Dec-2003 peter

Assimilate ia64 back into the fold with the common freebsd32/ia32 code.
The split-up code is derived from the ia64 code originally.

Note that I have only compile-tested this, not actually run-tested it.
The ia64 side of the force is missing some significant chunks of signal
delivery code.


123420 10-Dec-2003 peter

Fix last second typo.


123419 10-Dec-2003 peter

Use gcc's superior ffs() builtin.


123418 10-Dec-2003 peter

Use ffs(x) == popcnt(x ^ (x - 1)) to implement 64 bit ffsl(). gcc's
ffs() builtin uses this already but truncates the upper 32 bits.


123346 09-Dec-2003 marcel

Don't panic for misalignment traps when the onfault handler is set.
Not all transfers between kernel and user space are byte oriented
and thus alignment safe. Especially fuword*() and suword*() are
sensitive to alignment but in general more optimal than block copies.
By catching the misalignment trap we avoid pessimizing the common
case of properly aligned memory accesses which we would do if we
were to use byte copies or adding tests for proper alignment.

Note that the expectation that the kernel produces aligned pointers
is unchanged. This change therefore relates to possible unaligned
pointers generated in userland.


123326 09-Dec-2003 njl

Use the ACPI-CA definitions for the various APIC tables instead of our
own.


123288 08-Dec-2003 obrien

Move the bktr(4) <arch>/include/ioctl_{bt848,meteor}.h files to dev/bktr
as these ioctl's aren't MD. This also means they are installed in
/usr/include/dev/bktr now. Also provide compatability wrappers for
where these headers lived in 4.x.


123255 07-Dec-2003 marcel

Simplify the contexts created by the kernel and remove the related
flags. We now create asynchronous contexts or syscall contexts only.
Syscall contexts differ from the minimal ABI dictated contexts by
having the scratch registers saved and restored because that's where
we keep the syscall arguments and syscall return values.
Since this change affects KSE, have it use kse_switchin(2) for the
"new" syscall context.


123223 07-Dec-2003 imp

Ooops. These are still used by the bktr driver. David O'Brien has
plans for dealing, but I'll let him deal.

Pointy hat to: imp@


123213 07-Dec-2003 imp

Remote meteor driver. It hasn't compiled in over 3 years. If someone
makes it compile again, and can test it, we can restore the driver to
the tree.


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122918 20-Nov-2003 marcel

Set the ACPI processor Id in the PCPU structure so that CPU idling
on SMP systems has a chance of working. This was a loose end of the
implementation of the ACPI Cx idle states. Since our logical CPU Id
is the ACPI processor Id, we do not need to jump through hoops to
obtain it.

Approved: re@ (jhb)


122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


122829 17-Nov-2003 bde

Fixed a pedantic syntax error (a stray semicolon at the end of
PCPU_MD_FIELDS).


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.


122763 15-Nov-2003 njl

Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to
dereference the softc.


122525 12-Nov-2003 marcel

Remove ia64_highfp_load() now that it's unused.


122518 12-Nov-2003 marcel

Further work-out the handling of the high FP registers. The most
important change is in cpu_switch() where we disable the high FP
registers for the thread that we switch-out if the CPU currently
has its high FP registers. This avoids that the high FP registers
remain enabled for the thread even when the CPU has unloaded them
or the thread migrated to another processor.
Likewise, when we switch-in a thread of that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.

The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.

Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.

These changes make the lazy context switching of the high FP
registers in an UP kernel functional.


122480 11-Nov-2003 marcel

Save and restore the high FP registers in {g|s}_mcontext(). Note
that we currently do not keep track of whether the thread has
actually used the high FP registers before. If not, we should
not save them in the context which automaticly means that we
also would not restore them from the context. For now, do it
unconditionally so that we can reach functional completeness.


122479 11-Nov-2003 marcel

Fix a nasty bug that got exposed when the sendsig() and sigreturn()
functions switched to using {g|s}et_mcontext(). The problem is that
sigreturn(), being a syscall, can be given an async. context (i.e.
one corresponding to an interrupt or trap). When this happens, we
try to return to user mode via epc_syscall_return with a trapframe
that can only be used to return to user mode via exception_restore.

To fix this, we check the frame's flags immediately prior to
epc_syscall_return and branch to exception_restore for non-syscall
frames. Modify the assertion in set_mcontext() to check that if
there's a mismatch, it's because of sigreturn().


122389 10-Nov-2003 marcel

In get_mcontext(), do not update bspstore and ndirty in the trapframe.
Only update them in the newly created context to reflect the state
after copying the dirty registers onto the user stack. If we were to
update the trapframe, we lose the state at entry into the kernel. We
may need that after we create the context, such as for KSE upcalls.

We have to update the trapframe after writing the dirty registers to
the user stack for signal delivery to work. But this is best done in
sendsig() itself where it applies, not in get_mcontext() where it's
done unconditionally.


122373 09-Nov-2003 marcel

When a thread is being swapped-out, save the high FP registers. We
have a pointer in the PCPU to the PCB of the thread that currently
has its high FP registers loaded.


122368 09-Nov-2003 marcel

Use get_mcontext() to construct the signal context in sendsig() and
use set_mcontext() to restore the context in sigreturn(). Since we
put the syscall number and the syscall arguments in the trapframe
(we don't save the scratch registers for syscalls, which allows us
to reuse the space to our advantage), create a MD specific flag so
that we save the scratch registers even for syscalls. We would not
be able to restart a syscall otherwise.

The signal trampoline does not need to flush the regiters anymore,
because get_mcontext() already handles that. In fact, if we set up
the context correctly, we do not need to have a trampoline at all.
This change however only minimally changes the trampoline code. In
follow-up commits this can be further optimized.

Note that normally we preserve cfm and iip in the trapframe created
by the EPC syscall path when we restore a context in set_mcontext()
because those fields are not normally set for a synchronuous context.
The kernel puts the return address and frame info of the syscall
stub in there. By preserving these fields we hide this detail from
userland which allows us to use setcontext(2) for user created
contexts. However, sigreturn() is commonly called from the trampoline,
which means that if we preserve cfm and iip in all cases, we would
return to the trampoline after the sigreturn(), which means we hit
the safety net: we call exit(2). So, we do not preserve cfm and iip
when we have a synchronous context that also has scratch registers
(the uncommon context created by sendsig() only), under the assumption
that if such a context is created in userland, something special is
going on and the use of cfm and iip is then just another quirk. All
this is invisible in the common case.


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().


122333 08-Nov-2003 marcel

Remove the atkbd, psm, sc and vga devices. Most ia64 boxes out there
are zx1 based machines and they don't particularly like it when we
poke at them with PC legacy code. The atkbd and psm devices were
disabled in the hints file so that one could enable them on machines
that support legacy devices, but that's not really something you can
expect from a first-time installer. This still leaves syscons (sc)
and the vga device, which were enabled by default and wrecking havoc
anyway. We could disable them by default like the atkbd and psm
devices, but there's really no point in pretending we're in a better
shape that way.


122266 07-Nov-2003 scottl

Document the lockfunc and lockfuncarg arguments to bus_dma_tag_create() in
the busdma headers.


122245 07-Nov-2003 jhb

Regen.


122243 07-Nov-2003 jhb

Sync with global syscalls.master. ptrace(), dup(), pipe(), ktrace(),
ia32_sigaltstack(), sysarch(), issetugid(), utrace(), and ia32_sigaction()
are MP safe.


122162 06-Nov-2003 marcel

Add support for unaligned ld2, st2, st4 and st8. While here, make
sure we handle stacked registers properly by taking into account
that:
1. bspstore points after the frame (due to cover),
2. we need to adjust for intermediate NaT collections.


121933 03-Nov-2003 marcel

Handle unaligned 4-byte loads. While in the neighborhood, remove the
cr.isr sanity check. We actually encounter insanities, which very
likely means that the insanity check itself is insane. Remove an empty
comment while I'm at it.


121926 03-Nov-2003 marcel

Add a bogus definition of __va_list for use by lint. Make it visible
only when lint is defined to protect builds with non-GNU compilers.


121892 02-Nov-2003 marcel

Remove headers copied from i386 and either useless or wrong on ia64.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).

display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.

Removal if these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.

Tested with: buildworld, LINT


121635 28-Oct-2003 marcel

When switching the RSE to use the kernel stack as backing store, keep
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.

The primary reasons for this change are:
1. Asynchronuous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
contants, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them whereever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.

There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.


121622 27-Oct-2003 marcel

The previous commit removed both clause 3 and clause 4 from the UCB
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.

Pointed out by: das@

Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.


121600 27-Oct-2003 marcel

Add support for userland to access I/O port space. This is primarily
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than though
/dev/mem (possibly).

So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.


121459 24-Oct-2003 marcel

Remove unused header. See also ia64/disasm/disasm.h.


121457 24-Oct-2003 marcel

Remove ia64_pack_bundle() and ia64_unpack_bundle(). They are not
used anymore.


121456 24-Oct-2003 marcel

Remove unused file. db_disasm() has been implemented in db_interface.c
now.


121454 24-Oct-2003 marcel

Implement db_disasm() by using the new disassembler. Temporarily
unimplement db_write_breakpoint() and db_clear_breakpoint().


121452 24-Oct-2003 arun

Use a TR of size 1 << IA64_ID_PAGE_SHIFT instead of 16M to avoid
overlapping TR/TC entries (which results in a machine check). Note
that we don't look at the size of the memory descriptor, because
it doesn't guarantee non-overlap.

With this change, a UP kernel could boot on a Intel Tiger4 machine
with the following options:

options LOG2_ID_PAGE_SIZE=26 # 64M
options LOG2_PAGE_SIZE=14 # 16K

Approved by: marcel


121449 24-Oct-2003 marcel

Don't use fuword() or suword() unconditionally. They explicitly
disallow reading or writing.


121448 24-Oct-2003 marcel

Remove two unused fields in the operand structure (o_read & o_write).


121416 23-Oct-2003 marcel

Cleanup. Remove the md_flags for threads. It's not used. The flags
we had were bogus.
While here, reassign the copyright to the Project. There's nothing
in this files that originates from NetBSD, especially now that the
FreeBSD/alpha bits have been removed, but even then the amount of
inherited code that we actually used was nil.


121415 23-Oct-2003 marcel

Reimplement unaligned_fixup() using the new disassembler and a
mcontext_t for the register values. Currently only ld8 and ldfd
instructions are handled as those are the ones we need now (a
misaligned ld8 occurs 4 times in ntpd(8) and a misaligned ldfd
occurs once in mozilla 1.4 and 1.5). Other instructions are added
when needed.


121413 23-Oct-2003 marcel

Remove unused include of <machine/inst.h>


121412 23-Oct-2003 marcel

Remove prototype of unaligned_fixup() and fix a nearby style(9)
bug.


121411 23-Oct-2003 marcel

Add prototypes for spillfd() and unaligned_fixup().


121410 23-Oct-2003 marcel

Add spillfd(). This function loads a double-precision FP register
at the first address and spills it to the second address. This
allows unaligned_fixup() to update the context of the process in
a way that assures proper rounding.
Similar functions for single-and extended-precision are added when
needed.


121404 23-Oct-2003 marcel

Add a new disassembler that improves over the previous disassembler
in that it provides an abstract (intermediate) representation for
instructions. This significantly improves working with instructions
such as emulation of instructions that are not implemented by the
hardware (e.g. long branch) or enhancing implemented instructions
(e.g. handling of misaligned memory accesses). Not to mention that
it's much easier to print instructions.

Functions are included that provide a textual representation for
opcodes, completers and operands.

The disassembler supports all ia64 instructions defined by revision
2.1 of the SDM (Oct 2002).


121294 21-Oct-2003 marcel

Remove md_bspstore from the MD fields of struct thread. Now that
the backing store is at a fixed address, there's no need for a
per-thread variable.


121268 20-Oct-2003 marcel

Put the RSE backing store at a fixed address. This change is triggered
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.

The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.

Port: lang/guile (depended of x11/gnome2)


121228 18-Oct-2003 njl

Add the cpu_idle_hook() function pointer so that other idlers can be
hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This
prepares the way for further ACPI sleep states.


121148 17-Oct-2003 marcel

Implement cpu_idle() on ia64. We put the processor in a lightweight
halt state that minimizes power consumption while still preserving
cache and TLB coherency. Halting the processor is not conditional at
this time. Tested with UP and SMP kernels.


120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


120928 09-Oct-2003 marcel

With BETA 5 of libuwx some of the application registers are renamed
from UWX_REG_MUMBLE to UWX_REG_AR_MUMBLE. Compatibility defines are
present in libuwx. Change the names here so that we don't depend on
compatibility defines.

Note that there's now an UWX_REG_PFS and an UWX_REG_AR_PFS and the
former is not a compatibility define for the latter AFAICT. Change
to UWX_REG_AR_PFS as that seems to be the one we need to handle.


120914 08-Oct-2003 marcel

Include <sys/smp.h> for the prototype of smp_rendezvous().


120831 06-Oct-2003 bms

Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by: truckman
Discussed with: peter


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120683 03-Oct-2003 marcel

Swap the syscall caller frame info (i.e. the return pointer and
frame marker) and the syscall stub frame info in the trap frame.
Previously we stored the stub frame info in (rp,pfs) and the
caller frame info in (iip,cfm). This ends up being suboptimal
for the following reasons:
1. When we create a new context, such as for an execve(2), we had
to set the (rp,pfs) pair for the entry point when using the
syscall path out of the kernel but we need to set the (iip,cfm)
pair when we take the interrupt way out. This is mostly just
an inconsistency from the kernel's point of view, but an ugly
irregularity from gdb(1)'s point of view.
2. The getcontext(2) and setcontext(2) syscalls had to swap the
(rp,pfs) and (iip,cfm) pairs to make the context compatible
with one created purely in userland.

Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal
handlers that actually peek at the mcontext_t and to gdb(1).
Since this change is made for gdb(1) and we don't care about
signal handlers that peek at the mcontext_t because we're still
a tier 2 platform, this ABI breakage is academic at this moment
in time.

Note that there was no real reason to save the caller frame info
in (iip,cfm) and the stub frame info in (rp,pfs).


120540 28-Sep-2003 marcel

Drop any and all support for varargs. There's no history to worry
about because we're still tier 2 and our current compiler, as well
as future compilers will not support varargs. This is mostly a
no-op in practice, because <sys/varargs.h> should already cause
compile failures.


120464 26-Sep-2003 phk

Set cn_name, not cn_dev


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120375 23-Sep-2003 nyan

Implement the bus_space_map() function to allocate resources and initialize
a bus_handle, but currently it does only initializing a bus_handle.


120296 20-Sep-2003 marcel

Fix the last remaining problem encountered by KSE: apparently it is
not guaranteed that the RSE writes the NaT collection immediately,
sort of atomically, to the backing store when it writes the register
immediately prior to the NaT collection point. This means that we
cannot assume that the low 9 bits of the backingstore pointer do not
point to the NaT collection. This is rather a surprise and I don't
know at this time if it's a bug in the Merced or that it's actually
a valid condition of the architecture. A quick scan over the sources
does not indicate that we depend on the false assumption elsewhere,
but it's something to keep in mind.

The fix is to write the saved contents of the ar.rnat register to
the backingstore prior to entering the loop that copies the dirty
registers from the kernel stack to the user stack.


120294 20-Sep-2003 marcel

Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These
functions reference UMA internals from <vm/uma_int.h>, which makes
them highly unwanted in non-UMA specific files.

While here, prune the includes in pmap.c and use __FBSDID(). Move
the includes above the descriptive comment.

The copyright of uma_machdep.c is assigned to the project and can
be reassigned to the foundation if and when when such is preferrable.


120252 19-Sep-2003 marcel

Fix the most significant KSE breakage caused by not restoring the
restart instruction bits in the PSR. As such, we were returning
from interrupt to the instruction in the bundle that caused us
to enter the kernel, only now we're returning to a completely
different bundle.

While close here: add two KASSERTs to make sure that we restore
sync contexts only when entered the kernel through a syscall and
restore an async context only when entered the kernel through an
interrupt, trap or fault.

While not exactly here, but close enough: use suword64() when we
copy the dirty registers from the kernel stack to the user stack.
The code was intended to be be replaced shortly after being added,
but that was a couple of weeks ago. I might as well avoid that it
is a source for panics until it's replaced.


120250 19-Sep-2003 marcel

Revamp trap(): make it more explicit which kinds of traps/faults we
can get (or not) and what we do with them. This fixes the behaviour
for NaT consumption and speculation faults in that we now don't panic
for user faults.

Remove the dopanic label and move the code to a function. This makes
it easier in the simulator to set a breakpoint.

While here, remove the special handling of the old break-based syscall
path and move it to where we handle the break vector. While here,
reserve a new break immediate for KSE. We currently use the old break-
based syscall to deal with restoring async contexts. However, it has
the side-effect of also setting the signal mask and callong ast() on
the way out. The new break immediate simply restores the context and
returns without calling ast().


120222 19-Sep-2003 marcel

Change TRAPF_USERMODE and CLOCKF_USERMODE to not test for CPL == 3,
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).


120212 19-Sep-2003 marcel

Include "opt_kstack_pages.h". We export KSTACK_PAGES to assembly and
better have the right value.


119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


119970 10-Sep-2003 marcel

Rewrite the SAPIC initialization to always program the RTEs with what
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.


119946 10-Sep-2003 jhb

Move the definitions for ACPI MADT table entries not present in the ACPICA
distribution to a MI header so it can be shared with other architectures.


119906 09-Sep-2003 marcel

Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.

The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.


119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


119868 08-Sep-2003 wpaul

Take the support for the 8139C+/8169/8169S/8110S chips out of the
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)

rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:

- Added support for jumbo frames. Currently, there seems to be
a limit of approximately 6200 bytes for jumbo frames on transmit.
(This was determined via experimentation.) The 8169S/8110S chips
apparently are limited to 7.5K frames on transmit. This may require
some more work, though the framework to handle jumbo frames on RX
is in place: the re_rxeof() routine will gather up frames than span
multiple 2K clusters into a single mbuf list.

- Fixed bug in re_txeof(): if we reap some of the TX buffers,
but there are still some pending, re-arm the timer before exiting
re_txeof() so that another timeout interrupt will be generated, just
in case re_start() doesn't do it for us.

- Handle the 'link state changed' interrupt

- Fix a detach bug. If re(4) is loaded as a module, and you do
tcpdump -i re0, then you do 'kldunload if_re,' the system will
panic after a few seconds. This happens because ether_ifdetach()
ends up calling the BPF detach code, which notices the interface
is in promiscuous mode and tries to switch promisc mode off while
detaching the BPF listner. This ultimately results in a call
to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
here turns the chip back on and restarts the 1-second timeout loop
that drives re_tick(). By the time the timeout fires, if_re.ko
has been unloaded, which results in a call to invalid code and
blows up the system.

To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
which stops the ioctl routine from trying to reset the chip.

- Modified comments in re_rxeof() relating to the difference in
RX descriptor status bit layout between the 8139C+ and the gigE
chips. The layout is different because the frame length field
was expanded from 12 bits to 13, and they got rid of one of the
status bits to make room.

- Add diagnostic code (re_diag()) to test for the case where a user
has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
NICs have the REQ64# and ACK64# lines connected even though the
board is 32-bit only (in this case, they should be pulled high).
This fools the chip into doing 64-bit DMA transfers even though
there is no 64-bit data path. To detect this, re_diag() puts the
chip into digital loopback mode and sets the receiver to promiscuous
mode, then initiates a single 64-byte packet transmission. The
frame is echoed back to the host, and if the frame contents are
intact, we know DMA is working correctly, otherwise we complain
loudly on the console and abort the device attach. (At the moment,
I don't know of any way to work around the problem other than
physically modifying the board, so until/unless I can think of a
software workaround, this will have do to.)

- Created re(4) man page

- Modified rlphy.c to allow re(4) to attach as well as rl(4).

Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).


119867 07-Sep-2003 marcel

Untangle the code in this file to improve understandability. Both
ia64_count_cpus() and ia64_probe_sapics() called a single function
to do the the actual work. The difference in behaviour was handled
in that function and was further complicated by adding bootverbose
related code. As such, even the simplest of changes was hard to
comprehend.

Untangling has been done by increasing code duplication and using
a more naive style of coding. FWIW, the object file is slightly
smaller than before, so things aren't as bad as it may seem.

Triggered by: a simple fix on the P4 branch that never got merged.


119861 07-Sep-2003 alc

MFamd64/i386
Add necessary page locking to pmap_mincore().


119830 07-Sep-2003 marcel

MFp4: Revamped GENERIC (and hints). This is some much more pleasant
to look at...


119828 07-Sep-2003 marcel

Replace sio(4) with uart(4). Remove the sio(4) hints and only add
those hints used by uart(4) for the determination of the serial
console in the absence of the HCDP table.


119787 05-Sep-2003 marcel

Fix a place where I forgot to change the code that checks whether
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.

To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.

Run into this: davidxu


119649 01-Sep-2003 marcel

Use pmap_steal_memory() for the msgbuf instead of trying to squeeze
it in the last chunk (phys_avail block). The last chunk very often is
not larger than one or two pages, resulting in a msgbuf that's too
small to hold a complete verbose boot.
Note that pmap_steal_memory() will bzero the memory it "allocates".
Consequently, ia64 will never preserve previous msgbufs. This is not
a noticable difference in practice. If the msgbuf could be reused,
it was invariably too small to have anything preserved anyway.


119624 01-Sep-2003 marcel

Use direct mapped KVA for the sf_buf allocator, as made possible
by the previous commit. While here, fix a typo, reformat comments
and fix a long line.

Tested with: ftpd


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space. On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119531 28-Aug-2003 njl

Minor style cleanups.


119469 25-Aug-2003 marcel

Change LOG2_PAGE_SIZE from 14 to 15 bits. This will cause the CTASSERT
in vm_page.h to be reached and thus slightly increases the overall
coverage of LINT on ia64.


119377 23-Aug-2003 marcel

Add the bits for a LINT kernel. It has been verified to compile. We
may need to polish this.


119347 23-Aug-2003 marcel

Remove PAGE_SIZE_4K, PAGE_SIZE_8K and PAGE_SIZE_16K and replace them
with LOG2_PAGE_SIZE. A single option is better to LINT than multiple
mutual exclusive ones.


119337 23-Aug-2003 marcel

Remove unused inclusion of opt_acpi.h


119200 21-Aug-2003 jhb

Regen.


119199 21-Aug-2003 jhb

Swap sigaction/sigreturn since they are in the wrong order.

Noticed indirectly by: peter


119159 20-Aug-2003 marcel

Undo the mistake made in revision 1.77 of trap.c and which was the
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.

It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.

So, we're back to looking at the privilege level like it should be.
There's no other way.

Pointy hat: marcel


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


118981 16-Aug-2003 marcel

Fix a range check bug. Don't left-shift the integer argument 'data'.
Sign extension happens after the shift, not before so that boundary
cases like 0x40000000 will not be caught properly.
Instead, right shift ndirty. It is guaranteed to be a multiple of 8.
While here, do some manual code motion and code commoning.

Range check bug pointed out by: iedowse


118935 15-Aug-2003 marcel

Fix the generation of coredumps. We did not take the dirty registers
that were on the kernel stack into account. For now we write them
out to the register stack of the process before creating the dump.
This however is not the final solution. The problem is that we may
invalidate the coredump by overwriting vital information due to an
invalid backing store pointer. Instead we need to write the dirty
registers to an unused region of VM which will result in a seperate
segment in the coredump. For now we can at least get to all the
registers from a coredump.


118934 15-Aug-2003 marcel

Add an instruction group break after the move to application register
and the move to control register to avoid dependency violations when
these functions are used. Note that explicit data and instruction
serialization also need to be in a subsequent instruction group.
This too requires that we have an igrp break here.


118933 15-Aug-2003 marcel

Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and
PT_SETKSTACK. These requests allow the tracing process to access the
dirty registers of the traced process that are on the kernel stack.

Note that there's currently no way to access the rnat register for
those dirty registers that are not (yet) covered by a nat collection
point. The interface for this is still being slept on.

Also note that implied by these requests is the division of work:
The tracing process has to keep track of where registers are spilled
and is responsible to figure out where the NaT bit of the stacked
registers are at any time during the execution of the traced process.
The kernel provides the interfaces but will not abstract the fact
that the register stack can be split. This model does not follow
the approach taken in Linux where PT_PEEK and PT_POKE deals with
this automagically.


118853 13-Aug-2003 marcel

Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is
in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the
gateway page, which means that improper memory accesses to the gateway
page while in user mode would panic the kernel. Use VM_MAX_ADDRESS
instead. It ends before the gateway page. The difference between
VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.


118851 13-Aug-2003 marcel

Put an instruction group break between the move to ar.rnat and the
move to ar.rsc. The RSE must be in enforced lazy mode when writing
to RSE modifyable registers. In this case we restore the RSE NaT
collection register ar.rnat. I have seen 2 general exception faults
on pluto1 now that indicate that the move to ar.rsc has already
happened prior to the move to ar.rnat, meaning that the RSE is not
in enforced lazy mode anymore. The ia64 dependency and instruction
ordering rules seem to allow having both registers written to in
the same instruction group, provided ar.rsc is written to later than
ar.rnat (based on the ordering semantics). It appears that we may
be pushing our luck. For now, put them in seperate cycles (by means
of the instruction group break). If we ever get a general exception
fault on the move to ar.rnat again, we have definite proof that
something else is fishy.


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118818 12-Aug-2003 marcel

Extend identifycpu():
o Differentiate between CPU family and CPU model. There are multiple
Itanium 2 models and it's nice to differentiate between them.
o Seperately export the CPU family and CPU model with sysctl.
o Merced is the only model in the Itanium family.
o Add Madison to the Itanium 2 family. We already knew about McKinley.
o Print the CPU family between parenthesis, like we do with the i386
CPU class.

My prototype now identifies itself as:
CPU: Merced (800.03-Mhz Itanium)

pluto1 and pluto2 will eventually identify themselves as:
CPU: McKinley (900.00-Mhz Itanium 2)


118811 12-Aug-2003 marcel

Cleanup prototypes in cpu.h, including fswintrberr and any references
to it. Sort the remaining prototypes in cpu.h.

No functional change.


118798 11-Aug-2003 marcel

Cleanup and style(9) fixes. No functional change.


118739 10-Aug-2003 marcel

o move cpu_reset() from vm_machdep.c to machdep.c.
o reorder cpu_boot(), cpu_halt() and identifycpu().

No functional change.


118717 10-Aug-2003 marcel

Now that we can ignore up to 8KB of dirty registers, remove the RSE
magic from exec_setregs(). In set_mcontext() we now also don't have
to worry that we entered the kernel with more that 512 bytes of
dirty registers on the kernel stack. Note that we cannot make any
assumptions anymore WRT to NaT collection points in exec_setregs(),
so we have to deal with them now.


118640 08-Aug-2003 marcel

MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().


118607 07-Aug-2003 jhb

Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort. In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by: bde (kern_ktrace.c)


118590 07-Aug-2003 marcel

Better define the flags in the mcontext_t and properly set the flags
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
_MC_FLAGS_ASYNC_CONTEXT
_MC_FLAGS_HIGHFP_VALID
_MC_FLAGS_KSE_SET_MBOX
_MC_FLAGS_RETURN_VALID
_MC_FLAGS_SCRATCH_VALID

Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)


118588 07-Aug-2003 marcel

o Fix cut-n-paste whitespace corruption in previous commit
o For trap-based upcalls the argument (the kse_mailbox) to
the UTS must be written onto the kernel stack, not the
user stack. While here, deal with the fact that we may
be at a NaT collection point.


118566 06-Aug-2003 marcel

In cpu_set_upcall_kse(), create the upcall according to the entry
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now even more looks like exec_setregs().

While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.


118563 06-Aug-2003 marcel

o In revision 1.45 of exception.S we changed exception_restore to
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
(kernel register stack). I don't know what I was smoking then,
but if you unconditionally restore ar.k6, you also want to
compute its value unconditionally. By having the computation
predicated and dependent on whether we return to user mode, we
would end up writing junk (= invalid value for ar.bspstore) if
we would return to kernel mode. But the whole point of the
unconditional restoration was that there is a grey area where
we still need to have ar.k6 restored. If we restore with a junk
value, we would end up wedging the machine on the next interrupt.
So, unconditionally calculate the value we unconditionally write
to ar.k6.

o The previous braino was found while making the following change:
We used to clear the lower 9 bits of the value we write to ar.k6.
The meaning being that we know that the kernel register stack is
at least 512 byte aligned and simply clearing the lower 9 bits
allows us to return to a context of which we don't have dirty
registers on the kernel stack, even though the context that
entered the kernel does have dirty registers on the kernel stack.
By masking-off the lower bits, we correctly obtain the base of
the register stack without having to worry that we didn't actually
reached the base while unwinding it.
The change is to mask off the lower 13 bits, knowing that the
kernel register stack is always 8KB aligned. The advantage is that
we don't have to worry anymore if there's more than 512 bytes of
dirty registers on the kernel stack. A situation that frequently
occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
deal with that situation by copying the active portion of the
register stack down in multiples of 512 bytes. Now that we mask off
the lower 13 bits we don't have to do that at all. Contemporary
IPF processors have a register file that can hold up to 96 stacked
registers (=784 bytes [incl. 2 NaT collections]). With no indication
that register files grow beyond a couple of hundred registers, we
should not have to worry about it anymore... and yes, 640KB is
enough for everybody :-)
This change helps setcontext(2) and cpu_set_upcall_kse() in that
they can return to completely different contexts without having to
mess with the kernel stack. Of course exec_setregs() doesn't need
to do that anymore as well.


118503 05-Aug-2003 marcel

o Put the syscall return registers in the context. Not only do we
need this for swapcontext(), KSE upcalls initiated from ast()
also need to save them so that we properly return the syscall
results after having had a context switch. Note that we don't
use r11 in the kernel. However, the runtime specification has
defined r8-r11 as return registers, so we put r11 in the context
as well. I think deischen@ was trying to tell me that we should
save the return registers before. I just wasn't ready for it :-)

o The EPC syscall code has 2 return registers and 2 frame markers
to save. The first (rp/pfs) belongs to the syscall stub itself.
The second (iip/cfm) belongs to the caller of the syscall stub.
We want to put the second in the context (note that iip and cfm
relate to interrupts. They are only being misused by the syscall
code, but are not part of a regular context).
This way, when the context is switched to again, we return to
the caller of setcontext(2) as one would expect.

o Deal with dirty registers on the kernel stack. The getcontext()
syscall will flush the RSE, so we don't expect any dirty registers
in that case. However, in thread_userret() we also need to save
the context in certain cases. When that happens, we are sure that
there are dirty registers on the kernel stack.
This implementation simply copies the registers, one at a time,
from the kernel stack to the user stack. NAT collections are not
dealt with. Hence we don't preserve NaT bits. A better solution
needs to be found at some later time.
We also don't deal with this in all cases in set_mcontext. No
temporay solution is implemented because it's not a showstopper.
The problem is that we need to ignore the dirty registers and we
automaticly do that for at most 62 registers. When there are more
than 62 dirty registers we have a memory "leak".

This commit is fundamental for KSE support.


118450 04-Aug-2003 marcel

Fix logic bug in the previous commit. Any region less than 5 is a
user space region. Hence, we need to test if 5 is greater than the
region; not greater equal.
This bug caused us to call ast() while interrupting kernel mode.


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118414 04-Aug-2003 marcel

Cleanup the clock code. This includes:
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consequtive interruprs. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
avarage duration of an individual adjustment -- should be ~2).

While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.


118402 04-Aug-2003 marcel

Fix handling of external interrupts: we weren't calling ast() when
interrupting user mode. The net effect of this bug is that a clock
interrupt does not cause rescheduling and processes are not
preempted. It only takes a "while (1);" to render the machine
useless.

This bug was introduced by the context changes and EPC syscall code.
Handling of ASTs was moved to C for clarity and ease of maintenance,
but was not added for the external interrupt case.

This needs to be revisited. We now have calls to do_ast() in trap(),
break_syscall() and ivt_External_Interrupt(). A single call in
exception_restore covers these 3 places without duplication. This
is where we handled ASTs prior to the overhaul, except that the
meat has been moved to do_ast(), a C function. This was the goal
to begin with.

Pointy hat: marcel


118382 03-Aug-2003 obrien

Style sync.


118333 02-Aug-2003 marcel

Don't use uint64_t. Use unsigned long instead. One is supposed to use
ucontext_t without having to include headers other than <ucontext.h>.


118296 01-Aug-2003 marcel

Write the preserved registers to (and read them from) struct reg and
struct fpreg.


118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


118239 31-Jul-2003 peter

Deal with 'options KSTACK_PAGES' being a global option.


118238 31-Jul-2003 peter

Cosmetic: fix some disorder of #include "opt_...." files


118237 31-Jul-2003 peter

Remove leftover relic of pmap_new_thread() etc.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


118050 26-Jul-2003 marcel

Remove prototype of ia64_pa_access(). The function has been moved to
mem.c where it's been made static.


118048 26-Jul-2003 marcel

Avoid using __aligned(16). Instead define the jmp_buf in terms of
long doubles. This gives us 16-byte alignment. Add a CTASSERT for
the size of the jmp_buf to detect ABI breakages.


118046 26-Jul-2003 marcel

Unbreak ia64 builds now -Werror is enabled again. Avoid obsolete
memory operand construct.


118034 25-Jul-2003 marcel

Revert previous commit. We don't use setjmp()/longjmp() for context
switching anymore, so there's no need to save and restore GP. This
change breaks threaded applications linked against libc_r. Pull the
tier 2 card again: relink. This will link against libthr instead.


118024 25-Jul-2003 alc

MFi386 revision 1.416
Add vm object locking to pmap_prefault().

Note: powerpc and sparc64 do not implement this function.


117999 25-Jul-2003 marcel

Remove __aligned(16) from the definition of struct _ia64_fpreg. It's
a non-standard construct. Instead, redefine struct _ia64_fpreg as a
union and put a long double in it. On ia64 and for LP64, this is
defined by the ABI to have 16-byte alignment. For ILP32 a long double
has 4-byte alignment, but we don't support ILP32.

Note that the in-memory image of a long double does not match the in-
memory image of spilled FP registers. This means that one cannot use
the fpr_flt field to interpet the bits. For this reason we continue
to use an aggregate type.


117998 25-Jul-2003 marcel

Remove INVARIANT* and WITNESS. This makes the simulator much more
pleasant to use.


117993 25-Jul-2003 marcel

Move ia64_pa_access() from machdep.c to mem.c and declare it static.
It's only used in mem.c and cannot accidentally be used elsewhere
this way.


117984 25-Jul-2003 marcel

Disable the single-step trap on a debug related trap, including of
course the single-step trap itself.


117908 23-Jul-2003 marcel

We sloppily created an array for the high FP registers (f32-f127),
but this just created a weird inconsistency when porting gdb(1).
Instead, we name each high FP register seperately, like we do for
all the other registers.


117608 15-Jul-2003 marcel

Rename thread_siginfo to cpu_thread_siginfo.


117499 13-Jul-2003 marcel

Enable the high FP registers when we call the FPSWA handler and disable
them again afterwards. This fixes a disabled FP fault while in the FPSWA
handler.
While here, merge the FP fault and FP trap handling code to reduce code
duplication. Where code was different, it was not sure it should be.

Trigger case: ports/math/atlas


117467 12-Jul-2003 marcel

Add logic to trace across/over a trapframe. We have ABI markers in
our unwind information for functions that are entry points into the
kernel. When stepping to the next frame, the unwinder will let us
know when sych a marker was encountered. We use this to stop the
current unwind session, query the trapframe and restart a new
unwind session based on the new trapframe.

The implementation is a bit sloppy, but at this time there are
bigger fish to fry.


117437 11-Jul-2003 marcel

Add a body directive before the first instruction in epc_syscall().
This results in a zero length prologue and a body that covers the
whole function. This is more correct.


117436 11-Jul-2003 marcel

Remove a gratuitous align directive after the endp directive for
IVT entries.


117267 05-Jul-2003 marcel

Don't call malloc() and free() while in the debugger and unwinding
to get a stacktrace. This does not work even with M_NOWAIT when we
have WITNESS and is generally a bad idea (pointed out by bde@). We
allocate an 8K heap for use by the unwinder when ddb is active. A
stack trace roughly takes up half of that in any case, so we have
some room for complex unwind situations. We don't want to waste too
much space though. Due to the nature of unwinding, we don't worry
too much about fragmentation or performance of unwinding while in
the debugger. For now we have our own heap management, but we may
be able to leverage from existing code at some later time.

While here:
o Make sure we actually free the unwind environment after unwinding.
This fixes a memory leak.
o Replace Doug's license with mine in unwind.c and unwind.h. Both
files don't have much, if any, of Doug's code left since the EPC
syscall overhaul and the import of the unwinder.
o Remove dead code.
o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117161 02-Jul-2003 ru

The .s files were repo-copied to .S files.

Approved by: marcel
Repocopied by: joe


117142 02-Jul-2003 marcel

The use of SYSINIT requires the inclusion of <sys/kernel.h>


117139 01-Jul-2003 mux

Make this even closer to other busdma backends.


117133 01-Jul-2003 mux

Sync bounce pages support with the alpha backend. More precisely:
o use a mutex to protect the bounce pages structure.
o use a SYSINIT function to initialize the bounce pages structures
and thus avoid a race condition in alloc_bounce_pages().
o add support for the BUS_DMA_NOWAIT flag in bus_dmamap_load().
o remove obsolete splhigh()/splx() calls.
o remove printf() about incorrect locking in busdma_swi() and sync
busdma_swi() with the one of the alpha backend.
o use __FBSDID.


117129 01-Jul-2003 mux

Honor the boundary of the busdma tag when allocating bounce pages.
This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and
was never fixed in other busdma backends using bounce pages.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


117022 29-Jun-2003 alc

- Remove the calls to pmap_install() from pmap_object_init_pt(); they are
redundant. Discussed with: marcel
- MFi386: Add vm object locking to pmap_object_init_pt().


116971 28-Jun-2003 marcel

Implement cpu_set_upcall_kse(). Elementary testing shows that this
function behaves correctly in principle, but is not expected to be
100% complete. In any case, with this commit we have KSE ported
enough to start runtime testing with threaded applications and fix
whatever bugs or omissions we encounter. Yay!


116958 28-Jun-2003 davidxu

Add a machine depended function thread_siginfo, SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian


116907 27-Jun-2003 scottl

Do the first and mostly mechanical step of adding mutex support to the
bus_dma async callback scheme. Note that sparc64 does not seem to do
async callbacks. Note that ia64 callbacks might not be MPSAFE at the
moment. Note that powerpc doesn't seem to do async callbacks due to
the implementation being incomplete.

Reviewed by: mostly silence on arch@


116570 19-Jun-2003 marcel

Add TLS related relocation.


116510 18-Jun-2003 alc

Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116324 14-Jun-2003 marcel

Remove kernel event tracing. The overhead is significant when running
under ski.


116227 12-Jun-2003 marcel

Make sure pcpu->pc_pcb is pointing to a 16-byte aligned address. The
PCB contains FP registers, whose alignment must be 16 bytes at least.
Since the PCB pointed to by pc_pcb is immediately after the PCPU
itself, round-up the size of thge PCPU to a multiple of 16 bytes. The
PCPU is page aligned.

This fixes a misalignment trap caused by stopping a CPU in a SMP
kernel, such as been done when entering the debugger.

Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115999 08-Jun-2003 jmallett

Note that scbus is required for SCSI, not just "required" in general.

Submitted by: Edward Kaplan (tmbg37 on IRC)
Reviewed by: rwatson (in principle)


115937 07-Jun-2003 marcel

pmap_find_vhpt() has been observed to return a NULL pointer when
the caller assumes this to not happen by means of performing an
indirection without checking the return value. Add KASSERTs to
force a kernel with INVARIANTS to panic. This is a short-term
measure. The pmap code is scheduled to be overhauled.


115936 07-Jun-2003 marcel

If we get a fault in the gateway page, which would happen if we try
to deliver a signal and the RSE backing store has been exhausted or
the backing store pointer has been clobbered, we need to make sure
we call userret() and do_ast() when we exit from trap(). Not adjusting
the local variable 'user' in this case will prevent the faulty process
from being terminated and we end up in an infinite fault repetition.

Faulty process provided by: bento


115916 06-Jun-2003 marcel

Use TRAPF_USERMODE() to replace an equivalent check in trap(). While
here, amend the related comment.


115913 06-Jun-2003 marcel

Have TRAPF_USERMODE() take into account that the gateway page is not
always kernel space. It should be treated as user space when run with
user privileges (which is the case for the signal trampolines). This
fixes its only use in a KASSERT in subr_trap.c.


115859 04-Jun-2003 marcel

Fix the dreaded double counting that was present on alpha as well and
got fixed two weeks after the ia64 version was copied from the alpha
version (see rev 1.32 of sys/alpha/alpha/mem.c). As such, we were
missing the same continue as on alpha.

While here, add a default case for the device minor switch and do
some general style(9) cleanups.

WARNING: this file still has bugs. When reading from region 6 or
region 7, we don't validate the physical address. One can trivially
cause a machine check by trying to read from address 0xFFFFFFFFFFFFFFF0
or something that uses the unimplemented physical address bits.

Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>


115858 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115652 01-Jun-2003 marcel

Improve set_mcontext:
o Don't copy psr verbatim from the user supplied context. Only allow
userland to change the processor settings that are part of the user
mask.


115651 01-Jun-2003 marcel

Improve on cpu_set_upcall:
o Use pcb and tf for the new pcb and the new trapframe and use pcb0
for the old (current) pcb. The mix of pcb, pcb2 and tf was slightly
confusing.
o Don't define td->td_frame here. It has already been set previously
by cpu_thread_setup. Add a KASSERT to make sure pcb and tf are both
non-NULL.
o Make sure the number of dirty registers is 0 for the new thread.
There are no user registers on the backing store because we heven't
enter userland yet.


115605 01-Jun-2003 marcel

Implement cpu_thread_setup(). This is mostly the same as on i386,
except for the fact that trapframes have a size recorded in it
that we set here too. We need this for proper thread setup.

Pointed out by: mtm


115573 31-May-2003 marcel

Now that we have the signal trampolines in the gateway page and the
gateway page is considered kernel space, we can panic when we should
only SIGSEGV. Hence, add the additional constraint that for page
faults we also require running with kernel privileges. The gateway
page is the only kernel code running with user privileges, iso this
is a correct way to exclude the gateway page from kernel land.

We do not currently exclude the gateway page for other faults as it
is not always the right way to do it. Further tuning will happen on
a case by case bases.


115570 31-May-2003 marcel

Implement cpu_set_upcall(). Required by libthr and used by
thr_create(2). This implementation is so far only compile tested.
But since this is also the last of the functions required to
support libthr, we're now functionally complete (for some weird
definition of functionally; and complete). Runtime testing can
commence.


115566 31-May-2003 marcel

Implement set_mcontext() and get_mcontext(). Just as for sendsig() and
sigreturn(), we cheat and assume the preserved registers are still
on-chip and unmodified. This is actually the case, but more by accident
than by design. We need to use unwinding eventually or explicitly
compile the kernel in a way that the compiler steers clear from using
the preserved registers completely.


115564 31-May-2003 marcel

Make the regset pointers const pointers for the context restore functions.
This works better with set_mcontext() and is more precise in general.


115563 31-May-2003 marcel

Some ia32 related finetuning for the EPC syscall path:
o The SDM states that flushing the RSE in the cycle prior to the
call to ia32 code yields the best performance. We don't really
care to much about performance here, but we do the same anyway.
I'm being paranoia and conservative here.
o Only initialize the ia32 state registers, not the registers used
as scratch by the ia32 engine. This saves a couple of loads from
the trapframe, but also helps debugging: we don't clobber useful
debugging data (engineering hints :-)
o Make sure all general registers constituting ia32 state have been
initialized. If there's no useful to be loaded from the trapframe,
clear the register. This avoids accidentally leaking NaT bits.
o Make sure we set ar.k6 prior to clobbering ar.bspstore and also
set ar.k7 prior to setting sp. This fixes a race seen for ia64
native code as well (and previously fixed too).


115558 31-May-2003 marcel

Make sure we have all the dirty registers in user frames on the
backing store before we discard them. It is possible that we
enter the kernel (due to an execve in this case) with a lot of
dirty user registers and that the RSE has only partially spilled
them (to make room for new frames). We cannot move the backing
store pointer down (to discard user registers) when not all of
the user registers are on the backing store.
So, we flush the register stack IFF this happens. Unconditionally
doing the flush is too costly, because the condition in which we
need to flush is very rare.

This change appears to fix the SIGSEGV that sometimes happen for
newly executed processes and so far also appears to fix the last
of the corruption. It is possible, although not likely, that this
change prevents some other bug from happening, even though it is
itself not a fix. Hence the uncertainty. We'll know in a couple
of months I guess :-)


115416 30-May-2003 hmp

Rename BUS_DMAMEM_NOSYNC to BUS_DMA_COHERENT.

The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.

The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.

This flag is the same as the one in NetBSD's Bus DMA.

Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)


115378 29-May-2003 marcel

Move the sysctls of the misalignment handler to where they belong
and use OID_AUTO instead of fixed IDs.

Approved by: re@ (blanket)


115376 29-May-2003 marcel

Fix what I think is a cut-n-paste bug: use OID_AUTO for the
print_usertrap sysctl instead of CPU_UNALIGNED_PRINT. The
latter is used already.

Approved by: re@ (blanket)


115344 27-May-2003 marcel

A flushrs must be the first in an instruction group.

Approved by: re@ (blanket)


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115342 27-May-2003 marcel

Have the unwinder allocate memory with M_NOWAIT. The unwinder is
used by DDB and we cannot know in advance whether it's save to
sleep. It often enough isn't. We may want to pre-allocate space
to cover the most common cases without having to use malloc at
all, but that requires some analysis. We leave that for later.

Approved by: re@ (blanket)


115341 27-May-2003 marcel

Fix fu{byte|word*} and su{byte|word*}:
o If the address was not within user space we jumped to fusufault
where we would clear pcb_onfault and return 0. There are two
bugs here:
1. We never got to the point where we assigned the address of
pcb_onfault to r15, which means that we would clobber some
random memory location, including I/O space or ROM.
2. We're supposed to return -1 on error.
o Make sure we have proper memory ordering for setting pcb_onfault,
doing the memory access to user space and clearing pcb_onfault.
For the fu* family of functions this means that we need a mf
instruction, because we don't have acquire semantics on stores
and release semantics on loads (hence st;ld cannot be ordered
without intermediate mf).

While here, implement casuptr() so that we are a (small) step
closer to supporting libthr and deobfuscate the non-implementation
of {f|s}uswintr.

Approved by: re@ (blanket)


115339 26-May-2003 marcel

Revision 1.99 of this file changed the allocation request from
VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of
this in commit log as it was considered harmless. Guess what:
it does harm. WITNESS showed that we can not safely grab the
page queue lock in vm_page_alloc() in all cases as we may have
to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to
circumvent this. We panic if vm_page_alloc returns 0. I'm not
entirely happy about this, but we have bigger fish to fry.

Approved by: re@ (blanket)


115298 25-May-2003 marcel

Now that we define user mode as any IP address that isn't in the
kernel's VA regions, we cannot limit the use of break-based
syscalls to user mode only. The signal trampolines are in the
gateway page, which is mapped into the process address space in
region 5 and thus is kernel space.

We don't special case the gateway page here. Allow break-based
syscalls from anywhere in the kernel VA space.

Approved by: re@ (blanket)


115296 24-May-2003 marcel

Fix a source of instability specific to an EPC userland. We return
to userland with interrupts disabled until we restore PSR. However,
it has been observed that interrupts do actually happen before they
are enabled again. This is a bit surprising and I don't know yet
what's going on exactly. Nevertheless, the code was not crafted
carefully enough to allow interrupts to happen and we could
clobber the kernel stack of another thread when interrupts did
happen.

This is what happens: we restore the (memory) stack pointer (sp)
and the register stack base prior to restoring ar.k6 and ar.k7.
This is not a problem if interrupts don't happen between setting
sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen.
Since sp/ar.bspstore already point to the userland stacks, we
need to switch to the kernel stack in interrupt. However, ar.k6
and ar.k7 have not been set, which means that we were switching
to some unrelated kstack and happily clobbered the trapframe
present there if the thread to which the kstack belonged was
in kernel mode or otherwise we could have our trapframe clobbered
if that other thread enters the kernel. Nasty either way.

We now carefully restore ar.k6 prior to restoring ar.bspstore and
likewise for ar.k7 and sp. All we need is the guarantee that an
interrupt does not clobber ar.k6 or ar.k7 before we're back in
userland. That has been achieved by restoring ar.k6/ar.k7
unconditionally (see exception.s)

While here, remove the disabling of interrupts on EPC entry. It
was added as a way to "resolve" the crashes until it was understood
what was going on. I think I achieved the latter, so we can remove
the patch. Note that setting up a trapframe with interrupts
enabled has it's own share of corner cases, but it's better to
properly fixed those than to keep a mostly wrong patch around
because we're afraid to remove it...

Approved by: re@ (blanket)


115295 24-May-2003 marcel

Be more careful how we restore interrupts. Don't rewrite most of the
PSR only to achieve setting PSR.i back to it's previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.

Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.

This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.

This change does not fix any bugs, other than that it bugged me...

Approved by: re@ (blanket)


115294 24-May-2003 marcel

Consistently us the same metric to differentiate between kernel mode
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.

We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies any any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)

For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.

Approved by: re@ (blanket)


115291 24-May-2003 marcel

Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack)
when returning from an interrupt. Both registers are used on interrupt
to switch to the right kernel stack, but other than that they are not
used. This means we only have to make sure they contain proper values
while in user mode. As such, we conditionally restored these registers
based on whether we returned to userland or not. A nice property of
conditionally restoring ar.k6 and ar.k7 is that it introduces two
invariants: ar.k6 always points to the bottom of the kernel stack and
ar.k7 always points to the top of the kernel stack (immediately below
the PCB we have there).

However, the EPC syscall path introduces an irregularity: there's no
"thin red line" between user and kernel. There's a grey area that's a
couple of instructions wide. Any interruption in that grey area is
bound to see an inconsistent state. One such state is that we're in
kernel space for all practical purposes, but we still need to have
ar.k6 and ar.k7 restored as if we're in userland.

Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing
a valuable invariant. Both registers now hold the extend of the
usable portion of the kernel stack at any interrupt nesting, which
when in userland mean the bottom and the top of the kstack.


115276 24-May-2003 marcel

Fix an alpha inheritance bug:

On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.

The cleanup constitutes:
o Remove the unused arguments from ia64_init().
o Don't return from ia64_init(), but instead call mi_startup()
directly. This reduces the amount of muckery in assembly and
also allows for the next bullet:
o Save our currect context prior to calling mi_startup(). The
reason for this is that many threads are created from thread0
by cloning the PCB. By saving our context in the PCB, we have
something sane to clone. It also ensures that a cloned thread
that does not alter the context in any way will return to
the saved context, where we're ready for the eventuality with
a nice, user unfriendly panic().

The cleanup fixes at least the following bugs:
o Entering mi_startup() with the RSE in enforced lazy mode.
o Re-execution of ia64_init() in certain "lab" conditions.

While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.

Approved by: re@ (blanket)


115274 23-May-2003 marcel

Fix a (new) source of instability:

When interrupting a kernel context, we don't need to switch stacks
(memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in 1 situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.

So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.

While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.

Approved by: re@ (blanket)


115270 23-May-2003 marcel

In swapctx(), put the RSE in enforced lazy mode before we flush the
register stack. There's nothing really wrong with flushing before
putting the RSE in enforced lazy mode, provided you don't depend on
ar.bspstore being equal to ar.bsp when the RSE has been put in
enforced lazy more. The small window between the flush and setting
the RSE may be sufficient to have the RSE eagerly increase the dirty
region (and hence cause ar.bspstore != ar.bsp) or have an interrupt
that may even get the laziest RSE to do something.

Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so
nothing was and is broken. But the code was non-intuitive and
easily confuses. This is a source of future bugs.

Note: the advantage of not depending on ar.bspstore is that there's
some recilience against an interrupted flushrs. Clobbering is limited
to stacked register contents only, not to RSE address clobbering.

Approved: re@ (blanket)


115179 20-May-2003 marcel

o Fix a definite bogon: the dirty bity fault, instruction access
failt and data access fault install the PTE in question into
the VHPT table. However, a post-increment was missing and we
wrote the raw PTE data into the pagesize/access key field.
This leaves a corrupt VHPT entry.
o While here, remove the explicit cache purge. Insertion into
the translation implicitly purges any overlapping entries.
o Make sure there's a cycle break between the itc and the rfi.
o Whitespace fixes.


115178 20-May-2003 marcel

Rename the "IA64 ITC" counter to "ITC" counter. We don't call the
"TSC" counter on i386 "I386 TSC".

Approved by: re@ (blanket)


115176 20-May-2003 marcel

Prevent corruption of the VHPT collision chain by protecting it with
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion careful enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.

We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...

Approved by: re@ (blanket)


115164 19-May-2003 kan

sys/sys/limits.h:

- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).

sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:

- Style fixes.

Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)


115152 19-May-2003 marcel

Turn pmap_install_pte() into a critical section. We better not get
interrupted while writing into the VHPT table. While here, make sure
memory accesses a properly ordered. Tag invalidation must happen
first so that the hardware VHPT walker will not be able to match
this entry while we're updating it and we have to make sure the new
new tag gets written only after the PTE is completely updated.

Approved by: re (blanket)


115149 19-May-2003 marcel

Unconditionally set pcb_current_pmap. WIP versions of the code
previously committed cleared pcb_current_pmap prior to changing
the region registers, but that was removed before committing.
Since we don't normally (at all?) pass a NULL pointer, the bug
was mostly harmless. Fix it while I'm here...

I'm here because we need to have data serialization after writing
to the region registers. Not doing so was likely the cause of the
hangs we were experiencing. General exceptions in cpu_switch may
also be caused by the lack of serialization.

Approved by: re (blanket)


115148 19-May-2003 marcel

pmap_install() needs to be atomic WRT to context switching. Protect
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h

Approved by: re (blanket)


115094 17-May-2003 marcel

Remove unused files. cpu_switch() and cpu_throw(), normally in swtch.s,
can be found in machdep.c.

Approved: re@


115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)


115063 16-May-2003 marcel

o In pmap_install, don't prevent switching the pmap if we're
switching to kernel_pmap. The pmap is not special enough.
o Clear the active bit on the pmap we're switching out.
o Fix some nearby style(9) bugs.

Approved by: re@


115059 16-May-2003 marcel

Indent a comment. This makes 1.100.

Still approved by: re@ (blanket)


115058 16-May-2003 marcel

Turn pmap_growkernel() into a critical section. While here, initialize
kernel_vm_end in pmap_bootstrap. Don't delay the initialization until
we need to grow the kernel VM space. This BTW happens twice before
we enter either single- or multi-user mode. Don't adjust kernel_vm_end
while growing based on whether the KPT contains a non-NULL entry. We
trust kernel_vm_end to be correct and we make sure it's still correct
after growing.
Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS
and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.


115057 16-May-2003 marcel

Revamp the RID allocation code:
o Limit the size of the region ID map to 64KB. This gives a bitmap
that is large enough to keep track of 2^19 numbers. The minimal map
size is 32KB. The reason we limit the map size is that processor
models may have implemented a 24-bit region ID, which would give
a 2MB bitmap while the maximum number of allocations is always
less than PID_MAX*5, which is less than 2^19.
o Allocate all region IDs up-front. The slight downside of reserving
more RIDs then a process needs (3 for ia64 native and 1 for ia32)
is preferable over the call to pmap_ensure_rid() where RIDs are
allocated on demand. On SMP systems this may lead to a race
condition.
o When allocating a region ID, don't use arc4random(). We're not
interested in randomness or uniform distribution across the
spectrum. We only need uniqueness. Random numbers may easily
collide when the number of allocated RIDs is high, creating a
possibly unbounded retry rate.


115056 16-May-2003 marcel

Move the conditional definition of KSTACK_MAX_PAGES up ahead where
it's more visible.

Approved by: re@ (blanket)


115019 15-May-2003 marcel

This file creates register sets based on the runtime specification.
The advantage of using register sets is that you don't focus on each
register seperately, but instead instroduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sers form the basis of everything register.
The sets in this file are:

struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads. The second group
contains registers needed in addition to the first group needed to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for execption
handling and are used on top of the first two groups.

struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
seperated from the FP registers for ptrace(2).

struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1, This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep seperate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.

struct _high_fp
The upper 96 FP registers that can be enabled/disabled seperately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.

CVS ----------------------------------------------------------------------


115018 15-May-2003 marcel

This file contains elementary context related functions used to
save and restore "sets" of registers in various places.
The restorectx and swapctx functions are used by cpu_switch()
and deal with the special registers, as well as the preserved
registers.
The *callee_saved* functions are used to save and restore the
preserved registers (integer and floating-point). They are
useful for signal delivery and ptrace support.
The save_high_fp and restore_high_fp functions are used to
"load" and "unload" to and from the CPU as part of lazy context
switching.
The ia32 specific context functions have been kept with the ia32
code.

Approved by: re@ (blanket)


115017 15-May-2003 marcel

This file contains the code that implements the syscall path based
on the epc instruction. The epc instruction, given the permissions
of the page in which the epc is located, allows the privilege level
to be increased with little or no overhead. The previous privilege
level is recorded in the current frame marker and is restored by
a regular (function) return.
Since the epc instruction has to live in a page with non-standard
properties, we hardwire a "gateway" page in the address space. The
address of the gateway page is exported to userland in ar.k7. This
allows us to rewire the page without breaking the ABI.
The syscall stubs in libc are regular function calls that slightly
differ from the normal runtime. The difference is mostly to simplify
the stubs themselves by by moving some of the logic to the kernel.
The libc stubs call into the gateway page (offset 0), from where the
kernel trampolines to the code that sets up a minimal trapframe and
arranges to execute from the kernel stack.
The way back is basicly the same. The kernel returns to the gateway
page, whereby privilege is dropped, and jumps back to the syscall
stub.
Only the special registers are saved in the trapframe. None of the
scratch registers are preserved and since the kernel follows the
same runtime model, none of the preserved registers are saved.
Future enhancements can include the implementation of lightweight
syscalls, where kernel functions are performed without setting up
a trapframe. Good candidates are the *context syscalls for example.

Now that there's a gateway page from which code can be executed in
a non-privileged context, we also have the ideal place to put the
signal trampolines. By moving the signal trampolines from the user
stack to the gateway page, we open up the doors to unexecutable
stacks. The gateway page contains signal trampolines for both the
"legacy" break-based syscall code and the new and improved epc-
based syscall code.

Approved: re@ (blanket)


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114678 04-May-2003 kan

Style fixes.
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.


114616 03-May-2003 marcel

Fix c99 victim: the accepted character '0 most now be types as '0'.


114553 02-May-2003 marcel

Option KADB does not exist. It came from alpha, where it still exists.


114347 30-Apr-2003 marcel

Kill MID_MACHINE, its a.out specific, the only platform that supports
it is i386. All of the other platforms should remove it too.
-- peter@


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


114208 29-Apr-2003 marcel

Revamp the newbus functions:
o do not use the in* and out* functions. These functions are used by
legacy drivers and thus must have ia32 compatible behaviour. Hence,
they need to have fences. Using these functions for newbus would
then pessimize performance.
o remove the conditional compilation of PIO and/or MEMIO support. It's
a PITA without having any significant benefit. We always support them
both. Since there are no I/O ports on ia64 (they are simulated by the
chipset by translating memory mapped I/O to predefined uncacheable
memory regions) the only difference between PIO and MEMIO is in the
address calculation. There should be enough ILP that can be exploited
here that making these computations compile-time conditional is not
worth it. We now also don't use the read* and write* functions.
o Add the missing *_8 variants. They were missing, although not missed.
It's for completeness.
o Do not add the fences that were present in the low-level support
functions here. We're using uncacheable memory, which means that
accesses are in program order. Change the barrier implementation
to not only do a memory fence, but also an acceptance fence. This
should more reliably synchronize drivers with the hardware. The
memory fence enforces ordering, but does not imply visibility (ie
the access does not necessarily have happened). This is what the
acceptance deals with.

cpufunc.h cleanup:
o Remove the low-level memory mapped I/O support functions. They are
not used. Keep the low-level I/O port access functions for legacy
drivers and add fences to ensure ia32 compatibility.
o Remove the syscons specific functions now that we have moved the
proper definitions where they belong.
o Replace the ia64_port_address() and ia64_memory_address() functions
with macros. There's a bigger change inline functions get inlined
when there aren't function callsi and the calculations are simply
enough to do it with macros.

Replace the one reference to ia64_memory address in mp_machdep.c to
use the macro.


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


114017 25-Apr-2003 jhb

Regen.


114016 25-Apr-2003 jhb

Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather
than STD.


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by: jake
Reviewed by: bde (months ago)


113989 24-Apr-2003 jhb

Regen.


113987 24-Apr-2003 jhb

Fix the thr_create() entry by adding a trailing \. Also, sync up the
MP safe flag for thr_* with the main table.


113941 23-Apr-2003 kan

Add a new sys/limits.h file which in turn depends on machine/_limits.h
to get actual constant values. This is in preparation for machine/limits.h
retirement.

Discussed on: standards@
Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*)
Modified by: kan


113859 22-Apr-2003 jhb

- Replace inline implementations of sigprocmask() with calls to
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.


113833 22-Apr-2003 davidxu

Remove single threading detecting code, these code really should be
replaced by thread_user_enter(), but current we don't want to enable
this in trap.


113831 22-Apr-2003 marcel

Don't use the tpa instruction to implement pmap_kextract. The tpa
instruction requires that a translation is present in the TC. This
may trigger a TLB miss and a subsequent call to vm_fault().
This implementation is deliberately non-inline for debugging and
profiling purposes. Partial or full inlining should eventually be
done.

Valuable insights by: jake


113803 21-Apr-2003 simokawa

Add FireWire drivers to GENERIC.


113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


113540 16-Apr-2003 marcel

Add the EHCI host controller.


113350 10-Apr-2003 mux

I deserve a big pointy hat for having missed all those references
to bus_dmasync_op_t in my last commit.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.


113275 09-Apr-2003 mike

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


113255 08-Apr-2003 des

Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


113247 08-Apr-2003 marcel

Remove COMPAT_FREEBSD4. It's impossible because FreeBSD 4 does not
run on ia64 at all.


113181 06-Apr-2003 marcel

Remove the 32KB VHPT section from the kernel image. We don't really
use it because we allocate a VHPT based on the size of the physical
memory and even if the allocated VHPT is 32KB, we don't use the in-
image section for it. Since the VHPT must be naturally aligned, we
save 48K on average (due to alignment).
Consequently, we start off with the VHPT disabled (it is assumed
the VHPT is disabled because the EFI loader runs without memory
address translation and thus has no need to setup the VHPT). It's
probably a good idea to explicitly disable the VHPT if we make the
use of the VHPT optional.


113160 06-Apr-2003 marcel

Also set the access bit in the PTE when we get a data dirty bit fault.
This avoids an immediate access bit fault when we serviced the dirty
bit fault in case the access bit is unset. This typically happens for
newly allocated memory that's being zeroed and thus very common.


113140 05-Apr-2003 marcel

Include <geom/geom_disk.h> and stop including <sys/disk.h>. The
former gives us 'struct disk'.


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


112946 01-Apr-2003 phk

Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.


112908 01-Apr-2003 jeff

- Add thr and umtx system calls.


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112896 31-Mar-2003 jeff

- Add a placeholder for sigwait


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112882 31-Mar-2003 jeff

- Use sigexit() instead of twiddling the signal mask, catch, ignore, and
action bits to allow SIGILL to work as expected. This brings this file in
line with other architectures.


112721 27-Mar-2003 das

Correct LDBL_* constants based on values from i386.


112569 25-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


112498 22-Mar-2003 ru

Remove bitrot associated with `maxusers'.

Submitted by: bde


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared accross
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.


112312 16-Mar-2003 jake

Made the prototypes for pmap_kenter and pmap_kremove MD. These functions
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.


112235 14-Mar-2003 mux

Bah, get it right this time and add sys/lock.h before sys/mutex.h.


112215 14-Mar-2003 mux

Oops, add missing includes. Pass me the pointy hat.

Reported by: jake


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


112195 13-Mar-2003 mux

Memory allocated with contigmalloc() should be freed with
contigfree(), not with free().


112051 10-Mar-2003 marcel

Fix two rounds of breakages and cleanup. Remove the sccdebug sysctl
while I'm here and garbage collect dead code (ssc_clone). Define
d_maxsize as DFLTPHYS for now because that's what it will be if we
don't define it.


111979 08-Mar-2003 phk

Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.


111897 05-Mar-2003 marcel

Fix threaded applications on ia64 that are linked dynamicly. We did
not save (restore) the global pointer (GP) in the jmpbuf in setjmp
(longjmp) because it's not needed in general. GP is considered a
scratch register at callsites and hence is always restored after a
call (when it's possible that the call resolves to a symbol in a
different loadmodule; otherwise GP does not have to be saved and
restored at all), including calls to setjmp/longjmp. There's just
one problem with this now that we use setjmp/longjmp for context
switching: A new context must have GP defined properly for the
thread's entry point. This means that we need to put GP in the
jmpbuf and consequently that we have to restore is in longjmp.
This automaticly requires us to save it as well.

When setjmp/longjmp isn't used for context switching, this can be
reverted again.


111894 05-Mar-2003 marcel

ABI breaker: Move the J_SIGMASK field in the jmpbuf before
the J_SIG0 field. While here, rename J_SIG0 to J_SIGSET and
remove J_SIG1. The main reason for this change is that the
128-bit sigset_t is now aligned on a 16-byte boundary, which
allows us to use 16-byte atomic loads and stores on CPUs that
support it. The removal of J_SIG1 is done to avoid confusion:
it is never accessed and should not be. Renaming J_SIG0 to
J_SIGSET is the icing on the cake that's better done now than
later.


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


111697 01-Mar-2003 alc

MFi386 revision 1.88
Remove some long unused declarations.


111587 27-Feb-2003 davidxu

Needn't kse.h


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111524 26-Feb-2003 mux

Correctly set BUS_SPACE_MAXSIZE in all the busdma backends.
It was bogusly set to 64 * 1024 or 128 * 1024 because it was
bogusly reused in the BUS_DMAMAP_NSEGS definition.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


111194 20-Feb-2003 phk

Change the console interface to pass a "struct consdev *" instead of a
dev_t to the method functions.

The dev_t can still be found at struct consdev *->cn_dev.

Add a void *cn_arg element to struct consdev which the drivers can use
for retrieving their softc.


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111036 17-Feb-2003 julian

Fix missed patch in last commit


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111031 17-Feb-2003 marcel

Define _ALIGNBYTES to be 15. This should have been done right away.


111030 17-Feb-2003 marcel

Print two new processor features:
o Spontaneous deferral (A feature required by dutch railways :-)
o 16-byte atomic operations (ld, st, cmpxchg)


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110959 15-Feb-2003 marcel

Fix misuse of Maxmem in the calculation of the VHPT size. Maxmem
is already in pages, so we should not convert from bytes to pages.
The result of this bug was bad scaling of the VHPT relative to the
available memory.

Submitted by: Arun Sharma <arun@sharma-home.net>


110831 13-Feb-2003 obrien

Fix the style of the SCHED_4BSD commit.


110784 13-Feb-2003 alc

MFi386
Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


110566 08-Feb-2003 mike

Implement fpclassify():
o Add a MD header private to libc called _fpmath.h; this header
contains bitfield layouts of MD floating-point types.
o Add a MI header private to libc called fpmath.h; this header
contains bitfield layouts of MI floating-point types.
o Add private libc variables to lib/libc/$arch/gen/infinity.c for
storing NaN values.
o Add __double_t and __float_t to <machine/_types.h>, and provide
double_t and float_t typedefs in <math.h>.
o Add some C99 manifest constants (FP_ILOGB0, FP_ILOGBNAN, HUGE_VALF,
HUGE_VALL, INFINITY, NAN, and return values for fpclassify()) to
<math.h> and others (FLT_EVAL_METHOD, DECIMAL_DIG) to <float.h> via
<machine/float.h>.
o Add C99 macro fpclassify() which calls __fpclassify{d,f,l}() based
on the size of its argument. __fpclassifyl() is never called on
alpha because (sizeof(long double) == sizeof(double)), which is good
since __fpclassifyl() can't deal with such a small `long double'.

This was developed by David Schultz and myself with input from bde and
fenner.

PR: 23103
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
(significant portions)
Reviewed by: bde, fenner (earlier versions)


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is create by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110255 03-Feb-2003 marcel

Don't use the 'c' partition for mounting root. A disklabel is very
likely not present under the simulator. If multiple partitions are
present on the virtual disk, then the 'a' partition would be the
most logical choice. Nowadays partitions are GPT based, which would
make the assumption of a disklabel even more questionable. Given
all the possible scenarios, assuming a raw "device" seems best.


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


110229 02-Feb-2003 phk

We don't need sscopen() and sscclose().

Register sscstrategy directly, instead of using a cdevsw{} for the purpose.

Tested by: marcel


110227 02-Feb-2003 marcel

Export IA32 from opt_ia32.h to assembly so that we can eliminate
saving and restoring ia32 specific registers when switching
context and ia32 support has not been compiled-in. The primary
reason for this change is that one of the ia32 registers (ar.fcr)
is wrongly marked as invalid by the simulator. Now that we avoid
using the register when possible, usability is improved. The
secundary reason is that it saves us 7 loads and stores.

Note that the PCB will continue to have room for these registers,
irrespective of the IA32 option. There are no benefits that make
it worthwhile.


110211 01-Feb-2003 marcel

Remove special casing for running in the simulator from the kernel
and instead add platform, firmware and EFI stubs to the loader.
The net effect of this change is that besides a special console and
disk driver, the kernel has no knowledge of the simulator. This has
the following advantages:
o Simulator support is much harder to break,
o It's easier to make use of more feature complete simulators.
This would only need a change in the simulator specific loader,
o Running SMP kernels within the simulator. Note that ski at this
time does not simulate IPIs, so there's no way to start APs.

The platform, firmware and EFI stubs describe the following hardware:
o 4 CPU Itanium,
o 128 MB RAM within the 4GB address space,
o 64 MB RAM above the 4GB address space.

NOTE: The stubs in the skiloader describe a machine that should in
parts be defined by the simulator. Things like processor interrupt
block and AP wakeup vector cannot be choosen at random because they
require interpretation by the simulator. Currently the simulator is
ignorant of this.

This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS
which is ignored by the simulator.

Tested with: ski (version 0.943 for linux)


110202 01-Feb-2003 joe

Put replace spaces with tabs in keeping with the rest of the file.


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110084 30-Jan-2003 phk

Remove D_CANFREE from sscdisk.c.

I belive it got here by copy&paste and I see no signs in the source
code that BIO_DELETE was dealt with correctly and can only wonder
what kind of trouble this may have caused.


109911 27-Jan-2003 julian

Unbreak SMP cases for these architectures.
statclock_process() changed arguments.
note: it may be worth checking if curkse is needed on these architectures..
(and if so, why?)


109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


109865 26-Jan-2003 jeff

- Introduce the SCHED_ULE and SCHED_4BSD options for compile time selection
of the scheduler.
- Add SCHED_4BSD as the scheduler for all kernel config files in cvs.


109799 24-Jan-2003 dfr

Fix pmap_extract so that it doesn't panic if the user types
'cat /proc/pid/map'

Submitted by: Arun Sharma <arun.sharma@intel.com>


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109615 21-Jan-2003 jeff

- Add a VM_WAIT in the appropriate cases where vm_page_alloc() fails and flags
indicate that uma_small_alloc should not. This code should be refactored so
that there is not so much cross arch duplication.

Reviewed by: jake
Spotted by: tmm
Tested on: alpha, sparc64
Pointy hat to: jeff and everyone who cut and pasted the bad code. :-)


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109558 20-Jan-2003 phk

We need neither <sys/diskslice.h> nor <sys/disklabel.h> here.


109490 18-Jan-2003 mux

Don't try to free() map in bus_dmamap_destroy() when it's
set to &nobounce_dmamap. A similar bug was fixed by wpaul
in revision 1.19 of sys/alpha/alpha/busdma_machdep.c.


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


108759 06-Jan-2003 marcel

Move ia64_sapics and ia64_sapic_count from interrupt.c to sapic.c
and declare them extern in interrupt.c. This eliminates the need
for ia64_add_sapic(), which is called from sapic.c.
While here, reformat ia64_enable() in interrupt.c to improve
indentation and add a sysctl (machdep.apic) to dump the I/O APIC
entries currently programmed into all I/O APICs. The latter can
help analyze interrupt problems.
Note that the sysctl is not intended as a userland (software)
interface. It may be changed in the future to include counters
so that vmstat -i can make use of it. It may also be removed...


108757 06-Jan-2003 peter

Move the itm reload to a single place rather than having two identical
copies of the reload. Note that we use the precomputed itm_reload value
so that we can avoid a division in the kernel. The ia64 cpu does not
have integer divide, so this would have been done by a floating point
operation.


108756 06-Jan-2003 marcel

Replace the hardcoding of 255 as the clock interrupt vector with
CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already
in use as the AP wakeup vector on the HP rx2600.

This needs to be made more dynamic. The likelyhood of vector 254
being in use is pretty small, but we already have code to assign
vectors to IPIs (see sal.c) and it's preobably better to have a
centralized "vector manager" that hands out vectors based on
some imput (like priority).


108751 06-Jan-2003 marcel

Manually inline handleclock(). There's only a single caller and
handleclock itself is trivial.

While here, replace (itc_frequency+hz/2)/hz with itm_reload for
consistency. There's now a single place where we determine the
ITM reload value.


108749 06-Jan-2003 marcel

Count interrupts as soon as possible. This makes sure interrupts are
counted even when there are no handlers.


108737 05-Jan-2003 marcel

Don't hardcode the address of the local (S)APIC (aka processor
interrupt block). We use the previously hardcoded address as a
default only, but will otherwise use whatever ACPI tells us.
The address can be found in the MADT table header or in the
LAPIC override table entry.


108734 05-Jan-2003 marcel

Bump the number of interrupts from 65 to 257. This is a waste of
space most of the time, but handles machines with lots of I/O
(S)APICs. We cannot make this more dynamic without breaking the
interface with vmstat. Hence, we need to fix the interface first.


108733 05-Jan-2003 marcel

Handle 3-digit interrupt numbers (vectors). While here, change the
name of unused entries from "intr XXX" to "#XXX". This makes it
easier to debug interrupt problems, because vmstat can be hacked
more easily to dump all interrupt entries that are in use and not
those that have had interrupts.


108730 05-Jan-2003 marcel

Make all memory I/O addresses (explicitly) 64-bit. Memory mapped
devices aren't necessarily mapped within 4GB. I/O port addresses
are offsets into the memory mapped I/O port space, which is not
larger than 16MB. No need to convert those to 64 bit types.


108728 05-Jan-2003 marcel

Provide a null-implementation for bus_space_unmap, like i386.
bus_space_unmap is required for puc(4).


108690 05-Jan-2003 marcel

Adopt, adapt and improve:
o Make the URL of the handbook match reality
o Improve some comments (either wording or formatting)
o Sync with i386: comment-out DDB, INVARIANTS, INVARIANT_SUPPORT
o Add some more SCSI/RAID controllers:
ahd, mpt, asr, ciss, dpt, iir, mly, ida
o Remove support for the parallel port
o Add NICs: em, bge
o Remove NICs: ste, tl, tx, vr, wb
o Enable USB support again, except of the UHCI host controller.
UHCI still hangs the BigSur (=HP i2000) machines, and makes
them useless. The OHCI controller works fine. Note that newer
ia64 boxes based on the Intel host controllers (UHCI or EHCI)
still won't have USB support. We really need to import the
EHCI host controller from NetBSD...


108643 04-Jan-2003 alc

Hold the page queues lock around pmap_remove_pte() in pmap_enter().

Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>


108619 03-Jan-2003 marcel

Make this build and sync-up:
o Add COMPAT_FREEBSD4
o Remove NO_GEOM
o Remove commented out options.


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108409 29-Dec-2002 rwatson

Synchronize to kern/syscalls.master:1.139.

Obtained from: TrustedBSD Project


108175 22-Dec-2002 tjr

MB_LEN_MAX is not MD, move it to the MI limits.h.


108049 18-Dec-2002 marcel

More MFp4: DIG64 structures.


108026 18-Dec-2002 marcel

Export the physical address of the RSDP to userland by means
of the `machdep.acpi_root' sysctl. This is required on ia64
because the root pointer hardly ever, if at all, lives in the
first MB of memory and also because scanning the first MB of
memory can cause machine checks.
This provides a save and reliable way for ACPI tools to work
with the tables if ACPI support is present in the kernel. On
ia64 ACPI is non-optional.


107963 17-Dec-2002 marcel

Check that the dump device is large enough. Otherwise we could
end up with a dump offset that's smaller than the start of the
dump device and either clobber data in preceding partitions or
try to write beyond the end of the medium (unsigned wrap).

Implement legacy behaviour to never write to the first 64KB as
that is where metadata (ie disklabels) may reside.


107923 16-Dec-2002 marcel

Regen: swapoff


107922 16-Dec-2002 marcel

Change swapoff from MNOPROTO to UNIMPL. The former doesn't work.


107913 15-Dec-2002 dillon

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


107849 14-Dec-2002 alfred

SCARGS removal take II.


107839 13-Dec-2002 alfred

Backout removal SCARGS, the code freeze is only "selectively" over.


107838 13-Dec-2002 alfred

Remove SCARGS.

Reviewed by: md5


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107685 08-Dec-2002 marcel

Use one of the bi_spare entries for the DIG64 HCDP table address.
The HCDP table is one (non-proprietary) way for the platform to
inform the OS about headless operation. This field would normally
hold the address as can be found by scanning the EFI system table,
which we also pass to the kernel. The apparent duplication allows
us to synthesize a HCDP table in the loader by whatever means we
can think of, including relocating the platform table into pre-
mapped address space. In short: it gives us more freedom.

Approved by: re (blanket)


107684 08-Dec-2002 marcel

Disable SMP. It reduces the chance that the kernel boots. On top
of that, there's some nasty process corruption when running with
SMP.

Note that this was already in effect for the 5.0-RC1 kernels in
the form of a local patch.

Approved by: re (blanket)


107481 02-Dec-2002 alc

MFi386
Hold the page queues lock around vm_page_unhold() in vunmapbuf().

Approved by: re (blanket)


107395 29-Nov-2002 marcel

Implement bus_space_subregion(). Identical to i386.

Approved by: re (carte blanc)


107394 29-Nov-2002 marcel

Better handle sparse physical memory: Don't use the address range
as a measure for available memory to scale the VHPT. Instead, use
the previously determined Maxmem.

Approved by: re (carte blanc)


107206 24-Nov-2002 marcel

MFp4:
Add function map_port_space() to map the memory mapped I/O port
range as uncacheable virtual memory and call it prior to probing
for a console. This removes the dependency on the loader to have
done this for us. Note that this change does not include doing
the same for APs.

Approved by: re (blanket)


107205 24-Nov-2002 marcel

Fix comparison that caused a 1-off bug. This appeared harmless for
the kernel itself, but SAL on Itanium2 machines spontaneously
rebooted the machine.

Approved by: re (blanket)
Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


107028 17-Nov-2002 alc

MFi386 r1.369
- Clear the PG_WRITEABLE flag in pmap_page_protect() if write access is
being removed. Return immediately if write access is being removed and
PG_WRITEABLE is already clear.


106993 16-Nov-2002 deischen

Regenerate after adding syscalls.


106989 16-Nov-2002 deischen

Add *context() syscalls to ia64 32-bit compatability table as requested
in kern/syscalls.master.


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106965 15-Nov-2002 peter

Do not assume that time_t is an int.

Approved by: re (jhb)


106964 15-Nov-2002 peter

Test the water. Make time_t long (64 bit) on ia64 since we do not have
to worry about ABI vs released systems yet. This is mostly transparent
since there is no significant exposure in the syscall interface. The
things that go wrong are mostly userland stuff - time(&intvariable).

Reviewed by: dfr, marcel
Approved by: re (jhb)


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106755 11-Nov-2002 marcel

ia64 ABI breaker:
Don't force 16-byte alignment at run-time. Do it at compile-time.
This saves us the pointer fiddling by the setjmp functions and
reduces complexity. While here, increase the jmp_buf by 16 bytes
to an even 512 bytes. Coincidentally, due to the way alignment
was handled prior to this change, the jmp_buf has not changed in
size, but only in how the space is used. Prior to this change
the 16 bytes were reserved for enforcing alignment; now they are
reserved by us for future extensions.
Therefore, this ABI breaker is relatively save: the failure is
always an alignment trap.


106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


106751 11-Nov-2002 marcel

Comment-out USB support. A kernel doesn't boot with it. Deal with it
later.


106697 09-Nov-2002 des

Print real / avail memory in megabytes rather than kilobytes.


106605 07-Nov-2002 tmm

Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


106486 06-Nov-2002 marcel

Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region
7 addresses for use by page tables and kernel stacks.

Obtained from: peter


106447 05-Nov-2002 marcel

o Remove devices that are commented out.
o Enable sc
o Remove NO_GEOM. We need GEOM for GPT.
o Remove NO_CPU_COPTFLAGS.


106446 05-Nov-2002 marcel

Remove mcclock. It's an Alpha left-over.


106364 02-Nov-2002 rwatson

Sync to src/sys/kern/syscalls.master


106197 30-Oct-2002 marcel

Don't pass the return address to exception_save in register b0. Use
a true scratch register. This change and future re-allocations will
eventually result in code that we can unwind to to get the preserved
registers of the process. This of course means that we cannot trash
them while saving the process context.

While re-allocating, remove the register aliases. Abstraction is in
this case disadvanteous.


106189 30-Oct-2002 marcel

Rewrite cpu_switch(). The most notable change is the fact that we now
have f16-f31 as part of the context. The PCB has been reorganized to
better match how we save and restore the (preserved) registers. This
commit also moves the context restoriation to its own function (named
pcb_restore), as we did with pcb_save.

Only minimal effort has been put in writing optimal assembly. The
expectation is that there will be more rounds of changes.


106069 28-Oct-2002 marcel

Remove mf.a from sapic_read() and sapic_write(). We only care
about ordering and not acceptance. The removal of mf.a leaves
behind the mf that accompanied it.


106067 28-Oct-2002 marcel

Remove mf.a (the acceptance form of the memory fence instruction)
from all low-level bus space support functions. There's no need
to actually force the read/write to be accepted by the platform
before we can do anything else. We still have the mf instruction
there, which forces ordering. This too is not required given the
semantices of the bus space I/O functions, but it's not at all
clear to me if there are any poorly written device drivers that
depend on the strict ordering by the processor. The motto here is
to take small steps...


106066 28-Oct-2002 marcel

Make vmstat -i work:
o Properly set the pointer to the counter for each interrupt and
update the intrnames table.
o Remove Alpha cruft from intrcnt.h.
o Create INTRNAME_LEN as the single entity that defines the width
of the names in the intrnames table (incl. terminatinf '\0').


106063 27-Oct-2002 marcel

In ipi_send(), perform a mf instruction prior to initiating the IPI.
This guarantees that loads and stores emitted before the fence are
made visible before the IPI becomes pended.
Remove the mf.a instruction after initiating the IPI. There's no
guarantee that the IPI becomes pended prior to subsequent reads or
writes. Even if there was a guarantee, it would mostly be without
any benefit.


105977 26-Oct-2002 peter

Add COMPAT_FREEBSD4 here too. It has COMPAT_43 as well.


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105900 24-Oct-2002 julian

Extract out KSE specific code from machine specific code
so that there is ony one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitmatly happen).

Submitted by: (parts) davidxu


105891 24-Oct-2002 jhb

Oops, I missed a few changes in 'device acpica' -> 'device acpi' change.

Submitted by: Hiten Pandya <hiten@angelica.unixdaemons.com>


105890 24-Oct-2002 jhb

Rename 'device acpica' to 'device acpi'.

Approved by: msmith, iwasaki


105591 20-Oct-2002 marcel

In cb_dumphdr() we were calling buf_write() with di->priv as the
pointer to a dumperinfo instead of di. A brainfart, surely. This
bug went unnoticed for all this time because the pointer is only
used by buf_write() when it can write a completely filled buffer
to the dump device. This depends on the number of memory chunks
that needs to be dumped. This has apparently been low enough that
it has never happened up until this point.


105500 20-Oct-2002 marcel

Remove the special casing for IP addresses that are within the IVT
or the do_syscall() function. We have unwind directives to stop the
unwinder.


105499 20-Oct-2002 marcel

Define IVT_ENTRY and IVT_END as special versions of ENTRY and END
for defining vectors. As a result, each vector will be a global
function with unwind directives to notify the unwinder that we're
in an interrupt handler. In the debugger this will show up something
like:

Debugger(0xe000000000a211d8, 0xe000000000748960) at Debugger+0x31
panic(0xe000000000a36858, 0xe0000000021d32d0, 0xe000000000ae42e8, ...
trap(0x14, 0x100000, 0xe0000000021d32d0, 0x0, 0xa0000000002095f0, ...
ivt_Data_TLB(0x14, 0x100000, 0xe0000000021d32d0) at ivt_Data_TLB+0x1f0


105490 19-Oct-2002 peter

Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)


105486 19-Oct-2002 peter

Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.


105476 19-Oct-2002 rwatson

Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105470 19-Oct-2002 marcel

Update the unwind information when modules are loaded and unloaded
by using the linker hooks. Since these hooks are called for the
kernel as well, we don't need to deal with that with a special
SYSINIT. The initialization implicitly performed on the first
update of the unwind information is made explicit with a SYSINIT.
We now don't need the _ia64_unwind_{start|end} symbols.


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


105463 19-Oct-2002 rwatson

Permits UFS ACLs to be used with the GENERIC kernel. Due to recent
ACL configuration changes, this shouldn't result in different code paths
for file systems not explicitly configured for ACLs by the system
administrator. For UFS1, administrators must still recompile their
kernel to add support for extended attributes; for UFS2, it's sufficient
to enable ACLs using tunefs or at mount-time (tunefs preferred for
reliability reasons). UFS2, for a variety of reasons, including
performance and reliability, is the preferred file system for use with
ACLs.

Approved by: re


105432 19-Oct-2002 marcel

Make this compile when DDB is not defined by conditionally compiling
all references to ksym_start and ksym_end.


105147 15-Oct-2002 marcel

Fix kernel module loading on ia64. Cross-module function calls
were improperly relocated due to faulty logic in lookup_fdesc()
in elf_machdep.c. The symbol index (symidx) was bogusly used for
load modules other than the one the relocation applied to. This
resulted in bogus bindings and consequently runtime failures.

The fix is to use the symbol index only for the module being
relocated and to use the symbol name for look-ups in the
modules in the dependent list. As such, we need a function to
return the symbol name given the linker file and symbol index.


105138 15-Oct-2002 peter

The a.out md_coredump stuff isn't referenced anywhere anymore, and
hasn't been filled in for ages.. Nuked.


105079 14-Oct-2002 marcel

Allow kernel dumps to be aborted with ctrl-C.


105054 13-Oct-2002 mike

Remove the P1003_1B kernel option; it is no longer used.


105049 13-Oct-2002 mike

struct ia64_fpreg needs to be available outside of the kernel for
struct sigcontext.

Pointy hat to: mike


105014 13-Oct-2002 mike

Add standards visibility conditionals. Change any uses of sigset_t to
struct __sigset to avoid depending on objects from <sys/signal.h>.


105011 12-Oct-2002 marcel

o Fix typo in previous commit: s/sc-nsect/sc->nsect/
o Fix printf format error for %d format with long argument.


105010 12-Oct-2002 marcel

Plug two holes where we returned to userland without restoring
the predicate registers. Even though the ITLB and DTLB interrupts
happen often enough, this bug didn't do much harm. The reason
is that the interrupt handlers only modify p1 and since this is
a preserved (callee-saved) register it is hardly used in code
generated by the compiler. Compilers use scratch registers by
default. Changing the interrupt handlers to use p6 (ie a scratch
register) proved that the bug was in fact fatal.


105001 12-Oct-2002 marcel

Polish previous commit:
o Replace KSTACK_PAGES with pages on panic() in pmap_new_thread(),
o Fix style bugs in adjacent code,
o Use NULL instead of 0 for pointers,
o Save the virtual kstack address if we create an alternate
kstack because 1) we can derive the physical (RR7) address
from it and 2) we need the virtual address for contigfree()
in pmap_dispose_thread(). Thus td_altkstack saves
td_md.md_kstackvirt.


105000 12-Oct-2002 marcel

MFp4: Include machine/vmparam.h to pull in definition of IA64_RR_BASE.

Obtained from: peter


104998 12-Oct-2002 marcel

Remove the dependency on ia64_cpu.h by not defining pmap_kextract()
as a trivial function that only calls ia64_tpa() and hence requires
the prototype of ia64_tpa(), but by defining pmap_kextract as
ia64_tpa. This solves the inclusion ordering issue in ddb/db_watch.c


104939 11-Oct-2002 peter

cut/paste the pmap_new_altkstack stuff from the other platforms.
It's no different here. Update the rest of the kstack API's for scottl's
changes.


104938 11-Oct-2002 peter

Call uma_zalloc on pvzone with M_NOWAIT, just like i386 and alpha.
Otherwise we get hundreds of 'could sleep' during boot.


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


104741 09-Oct-2002 peter

re-regen. Sigh.


104740 09-Oct-2002 peter

Sigh. Fix fat-fingering of diff. I knew this was going to happen.


104739 09-Oct-2002 peter

regenerate. sendfile stuff and other recently picked up stubs.


104738 09-Oct-2002 peter

Try and deal with the #ifdef COMPAT_FREEBSD4 sendfile stuff. This would
have been a lot easier if do_sendfile() was usable externally.


104736 09-Oct-2002 peter

Try and patch up some tab-to-space spammage.


104735 09-Oct-2002 peter

Add placeholder stubs for nsendfile, mac_syscall, ksem_close, ksem_post,
ksem_wait, ksem_trywait, ksem_init, ksem_open, ksem_unlink, ksem_getvalue,
ksem_destroy, __mac_get_pid, __mac_get_link, __mac_set_link,
extattr_set_link, extattr_get_link, extattr_delete_link.


104584 06-Oct-2002 mike

Add conditionals to allow va_list to be defined in other headers.


104583 06-Oct-2002 mike

o Add conditionals to allow va_list to be defined in other headers.
o Standardize on _MACHINE_STDARG_H_ to allow multiple header includes.
o Restrict the definition of va_copy() to C99 environments.


104551 06-Oct-2002 obrien

It appears CPU_MAXID should be 1 more than the number of CPU_* defines.


104519 05-Oct-2002 phk

NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.

SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.

PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)

These issues are all being worked.

Sponsored by: DARPA & NAI Labs.


104505 05-Oct-2002 mike

Fix namespace issues by using visibility conditionals from
<sys/cdefs.h>.


104493 04-Oct-2002 mike

style(9) <machine/setjmp.h> headers so they look mostly the same.


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Test on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104439 04-Oct-2002 peter

Gah, spell extern correctly. Do not trust cut/paste via old mozilla
builds.


104438 04-Oct-2002 peter

List the IO SAPIC delivery mode definitions.


104436 04-Oct-2002 peter

Declare itc_frequency and itm_reload.


104433 04-Oct-2002 peter

Do a bit of rude hackery to get clock interrupts on all CPUs. This
is partly based on the Alpha system which duplicates the clock to
each cpu, instead of doing a clock roundrobin like on i386. This means
we get hz * ncpu clocks per second and so we have to seperate clock
sampling from actual 'do the work' clock processing. The BSP runs the
complete processing, the rest just sample state etc.

Using the on-cpu interval timer is not ideal as it will drift. There
is more to be done here, we should use an external clock source.


104426 04-Oct-2002 peter

Update stubs for post-kseIII.


104425 04-Oct-2002 peter

Update for post-kseIII


104379 02-Oct-2002 archie

Let kse_wakeup() take a KSE mailbox pointer argument.

Reviewed by: julian


104320 01-Oct-2002 phk

Fix the same misinitialization of pmap_prefault_pageorder as on i386.

Suggeste by: jake


103972 25-Sep-2002 archie

Make the following name changes to KSE related functions, etc., to better
represent their purpose and minimize namespace conflicts:

kse_fn_t -> kse_func_t
struct thread_mailbox -> struct kse_thr_mailbox
thread_interrupt() -> kse_thr_interrupt()
kse_yield() -> kse_release()
kse_new() -> kse_create()

Add missing declaration of kse_thr_interrupt() to <sys/kse.h>.
Regenerate the various generated syscall files. Minor style fixes.

Reviewed by: julian


103870 23-Sep-2002 alfred

use __packed.


103834 23-Sep-2002 peter

At great personal risk, add a __packed and __aligned(x) define that
expand to __attribute__((packed)) and __attribute__((aligned(x)))
respectively. Replace the handful of gcc-ism's that use
__attribute__((aligned(16))) etc around the kernel with __aligned(16).

There are over 400 __attribute__((packed)) to deal with, that can come
later. I just want to use __packed in new code rather than add more
gcc-ism's.


103814 23-Sep-2002 mike

Be careful not to define GCC-specific optimizations in the non-GCC
case.


103714 20-Sep-2002 phk

(This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by: DARPA & NAI Labs.


103703 20-Sep-2002 phk

For reasons now lost in historical fog, the bounds_check_with_label()
function were put in i386/i386/machdep.c from where it has been
cut and pasted to other architectures with only minor corruption.

Disklabel is really a MI format in many ways, at least it certainly
is when you operate on struct disklabel.

Put bounds_check_with_label() back in subr_disklabel.c where it belongs.

Sponsored by: DARPA & NAI Labs.


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103526 18-Sep-2002 mike

Implement C99's va_copy() macro.


103436 17-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103109 09-Sep-2002 kuriyama

Use "options " rather than "options<tab>".


103081 07-Sep-2002 jmallett

Fill out two fields (si_pid, si_uid) in the siginfo structure handed back
to userland in the signal handler that were not being iflled out before, but
should and can be.

This part of sendsig could be slightly refactored to use an MI interface, or
ideally, *sendsig*() would have an API change to accept a siginfo_t, which
would be filled out by an MI function in the level above sendsig, and said MI
function would make a small call into MD code to fill out the MD parts (some
of which may be bogus, such as the si_addr stuff in some places). This would
eventually make it possible for parts of the kernel sending signals to set up
a siginfo with meaningful information.

Reviewed by: mux
MFC after: 2 weeks


103049 07-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


102883 03-Sep-2002 peter

Make this compile


102874 03-Sep-2002 mike

Now that _BSD_CLK_TCK_ and _BSD_CLOCKS_PER_SEC_ are the same on all
architectures, move the definition directly into <time.h> and finish
the removal of <machine/ansi.h>.


102869 02-Sep-2002 mike

Align _BSD_CLK_TCK_ and _BSD_CLOCKS_PER_SEC_ with most other
platforms. This introduces some binary incompatibilities for
dynamically linked programs which make use of clock(3) and times(3).


102837 02-Sep-2002 alc

o Remove an initialized but unused variable from pmap_remove_all().


102815 01-Sep-2002 marcel

Sync up: remove device counts.


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102666 31-Aug-2002 peter

Take a shot at fixing up a whole stack of style and other embarresing
unforced errors that Bruce identified. I have not yet addressed all of
his concerns.


102663 31-Aug-2002 peter

Do not use an object for the pte and pv zones on ia64 because it overrides
the pmap_allocf() function that we provide above. We still use the limits
via other means.

Submitted by: jeff


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102560 29-Aug-2002 jake

Fixed printf format errors.


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


102330 23-Aug-2002 marcel

s/_BSD_VA_LIST_/__va_list/. The former type doesn't exist anymore.


102315 23-Aug-2002 mike

Move several MI types from <machine/_types.h> to <sys/_types.h>.
These types are unlikely to ever become very MD. They include:
clockid_t, ct_rune_t, fflags_t, intrmask_t, mbstate_t, off_t, pid_t,
rune_t, socklen_t, timer_t, wchar_t, and wint_t.

While moving them, make a few adjustments (submitted by bde):
o __ct_rune_t needs to be precisely `int', not necessarily __int32_t,
since the arg type of the ctype functions is int.
o __rune_t, __wchar_t and __wint_t inherit this via a typedef of
__ct_rune_t.
o Some minor wording changes in the comment blocks for ct_rune_t and
mbstate_t.

Submitted by: bde (partially)


102278 22-Aug-2002 mux

Convert NEXUS_ACCESSOR to use the __BUS_ACCESSOR
macro instead of reimplementing it.

Approved by: peter


102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


102161 20-Aug-2002 rwatson

Correct one more errant whitespace nit that crept in during changes
in the arguments to vn_rdwr(). Hopefully the last.


102153 20-Aug-2002 peter

remove unit counts from atkbdc, pckbd, sc


101945 15-Aug-2002 rwatson

Correct a minor whitespace nit that sneaked in with my previous commit.


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101645 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag from the alpha and
ia64 pmap.
o Remove the PG_MAPPED flag's declaration.


101625 10-Aug-2002 peter

My quad cpu itanium2 box has its cpu's numbered with a lid starting
at 192. Masking off bottom 4 bits is not very good here.


101588 09-Aug-2002 brooks

Make ppp(4) devices clonable and unloadable.


101479 07-Aug-2002 alc

o Introduce pmap_page_is_mapped(). Its purpose is to obsolete
the PG_MAPPED flag.


101251 03-Aug-2002 peter

Ignore memory above 4GB for now due to unpleasant pci issues.


101199 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


100969 30-Jul-2002 iwasaki

Resolve conflicts arising from the ACPI CA 20020725 import.


100882 29-Jul-2002 mike

Create a new header <machine/_stdint.h> for storing MD parts of
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of that
header only defining types.
(2) Defining C99 limits in <machine/limits.h> adds pollution to
<limits.h>.


100551 23-Jul-2002 peter

de-count pci


100543 23-Jul-2002 arr

- Pass the VM_ALLOC_WIRED flag to vm_page_alloc() in pmap_growkernel() so
that we can avoid a call to vm_page_lock_queues().

Approved by: peter


100464 21-Jul-2002 peter

Add explicit unit count on 'device pci' for ahc/ahd


100398 20-Jul-2002 peter

Change the max IRQ from 63 to 255. I realize we have to block some out
still for the IPI vectors, but 63 isn't enough. There is an fxp at IRQ 86
on the Itanium2 box I have.


100385 20-Jul-2002 peter

Regenerate


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100274 17-Jul-2002 peter

Fix a transcription typo. s/ACPI_PTR/ACPI_POINTER/


100269 17-Jul-2002 peter

Fix some typos in 1.68 from over a week ago.


100268 17-Jul-2002 peter

Cap the initial PV and PTE table preallocations. Otherwise we explode
on the Itanium2 system I have when we use up *all* of the initial 256MB
direct mapped region before we are ready to dynamically expand it.

The machine that I have has 4 cpus and a very big hole in the middle.
This makes the bogus '(last_address - first_address) / PAGE_SIZE'
calculations especially dangerous and caused many millions of initial
PV/PTE's to be preallocated.


100267 17-Jul-2002 peter

Be sure to use a logical address for the SAL table. For some reason the
phsysical address is still mapped at this stage of boot on the Itanium1
SDV boxes we have. But Itanium2 does *not* let us get away with this.


100266 17-Jul-2002 peter

Update for new ACPICA import. Gah.


100189 16-Jul-2002 jhb

Various comment and minor style fixes. No actual content changes.

Inspired by: bde


100000 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99733 10-Jul-2002 mike

Remove label_t and physadr, which seem to have never been used in
FreeBSD.

Submitted by: bde


99630 09-Jul-2002 mike

Remove an unused type.


99594 08-Jul-2002 mike

Move __offsetof() macro from <machine/ansi.h> to <sys/cdefs.h>. It's
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99422 05-Jul-2002 peter

Back out proc part of last commit. UMA manages the thread cache only, and
we just have to deal with the kstack when told to. We do not have a
UMA-managed cache for the proc struct and its associated upage yet. So,
go back to the old lazy mechanism. Note that if UMA destroys pages that
used to contain proc structures, we'll lose the corresponding upage
forever. (zones never did this - once a page was allocated, it stayed
attached to the proc zone forever)


99421 05-Jul-2002 peter

Copy from sparc64/pmap.c rev 1.64 (Retrofit changes from i386/pmap.c
rev 1.328-1.331.) but for uarea only. We still have our own broken
kstack code here.


99149 30-Jun-2002 iwasaki

Resolve conflicts arising from the ACPI CA 20020404 import.


99117 30-Jun-2002 mike

Since printf(3) now supports the `j' conversion specifier, use that
when printing intmax_t and uintmax_t.

Forgotten by: mike
Noticed by: bde


99095 29-Jun-2002 julian

Fix reverse ordering of locks. add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


99081 29-Jun-2002 julian

Add KSE stubs to MD parts of ia64 code.
Dfr will fill these out when we decide to enable KSEs on ia64
(probably not immediatly)


99077 29-Jun-2002 julian

Add a copy of the sparc64 machine/kse.h to satisfy depencies..
dfr will fill in the correct contents at a later time.


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


98773 24-Jun-2002 dfr

Add UMA_ZONE_VM flag to the zones which are used for pmap_enter().


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98484 20-Jun-2002 peter

Update an 'XXX what is this?' type comment about suswintr and fuswintr.
These are 16 bit short values used only by the profiling code.


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98474 20-Jun-2002 peter

ia32 %edx return comes from td_retval[1], not td_retval[0]

Obtained from: dfr


98473 20-Jun-2002 peter

Use suword32/64 and fuword32/64 like elsewhere instead of inventing
suhword/fuhword.


98471 20-Jun-2002 peter

panic rather than fault and explode if we fail to contigmalloc a kernel
stack. This is still bad(TM), but at least we have a clue when we get
hit when contigmalloc fails.


98470 20-Jun-2002 peter

Use the canonical pmap_{new,dispose,swapin,swapout}_proc() functions,
in this case cut/pasted from sparc64 instead of messing with
contigmalloc where it is not needed.


98469 20-Jun-2002 peter

Move the "- 1" into the RQB_FFS(mask) macro itself so that
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().

Reviewed by: jake (in principle)


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha were we
skip the actual syscall invocation if error != 0. This fixes a bug
where if we the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


97969 06-Jun-2002 marcel

Work around a bug in the Linux version of ski, that's specific to
SSC_GET_RTC. This fixes the panic seen shortly after mounting the
root file system.

Thanks to: "K.Sumitani" <ksumitani@mui.biglobe.ne.jp>


97748 02-Jun-2002 schweikh

Fix typo in the BSD copyright: s/withough/without/

Spotted and suggested by: des
MFC after: 3 weeks


97564 30-May-2002 dfr

Move the definition of ElfN_Hashelt to common headers. The only platform
which has a different definition for this is alpha.


97443 29-May-2002 marcel

Remove the definition of struct mca_guid and use the generic
struct uuid defined in <sys/uuid.h>.

Use uuid/UUID instead of guid/GUID to emphasize that the
identifiers are DCE version 1 identifiers and also to avoid
inconsistencies as much a possible.


97261 25-May-2002 jake

Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by: peter


97090 22-May-2002 marcel

o Add records for PCI bus and PCI device errors.
o Rename mem_platform_id to mem_oem_id.
o Minor style fixes.


96973 20-May-2002 marcel

Flesh-out ptrace support. This obviously needs more work.


96961 19-May-2002 marcel

Fix a kernel page fault when accessing user memory. We were
combining too much conditions and as such ended up with the
kernel map instead of the corresponding process map. While
here, remove code to allow access to the stackgap and restyle
slightly to improve readability.

This fix specifically fixes the procfs failure we're having
when reading the process map (cat /proc/curproc/map)


96957 19-May-2002 marcel

It's time to build modules by default.


96956 19-May-2002 marcel

Simplify IA64_CMPXCHG to avoid having braced-groups in expressions.
As a minor positive side-effect, code at -O0 is more optimal. As a
minor negative side-effect, certain boundary cases yield no better
code than non-boundary cases. For example, atomic_set_acq_32(p, 0)
does a useless logical OR with value 0. This was previously elimina-
ted as part of if/while optimizations. Non-boundary cases yield
identical code at -O1 and -O2.


96921 19-May-2002 marcel

Add record definition for memory checks.


96916 19-May-2002 peter

Catch another C++ comment


96912 19-May-2002 marcel

o Remove namespace pollution from param.h:
- Don't include ia64_cpu.h and cpu.h
- Guard definitions by _NO_NAMESPACE_POLLUTION
- Move definition of KERNBASE to vmparam.h

o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h

o While here, remove some left-over Alpha references.


96907 19-May-2002 marcel

o Move prototypes for restorectx and savectx from cpu.h to pcb.h,
o Remove Alpha specific contents of struct md_coredump.


96900 19-May-2002 marcel

Remove option ACPI_DEBUG. It causes compile failures in the
function tracing bits due to __func__ being declared as const.


96899 19-May-2002 marcel

Cast dumpsize to long long to match printf format.


96755 16-May-2002 trhodes

More s/file system/filesystem/g


96606 14-May-2002 phk

Move MI stuff out of MD param.h files.

It can all still be overridden in the MD files should need suddenly arise.


96604 14-May-2002 phk

Remove the unused definitions of ctod() and dotc().


96495 13-May-2002 marcel

s/_ALPHA_/_MACHINE_/


96494 13-May-2002 marcel

Remove reference to the "Alpha Calling Standard".


96487 13-May-2002 jake

These were repo-copied to dump_machdep.c.


96442 12-May-2002 marcel

o Rename ia64_count_aps to ia64_count_cpus and reimplement the
function to return the total number of CPUs and not the highest
CPU id.
o Define mp_maxid based on the minimum of the actual number of
CPUs in the system and MAXCPU.
o In cpu_mp_add, when the CPU id of the CPU we're trying to add
is larger than mp_maxid, don't add the CPU. Formerly this was
based on MAXCPU. Don't count CPUs when we add them. We already
know how many CPUs exist.
o Replace MAXCPU with mp_maxid when used in loops that iterate
over the id space. This avoids a couple of useless iterations.
o In cpu_mp_unleash, use the number of CPUs to determine if we
need to launch the CPUs.
o Remove mp_hardware as it's not used anymore.
o Move the IPI vector array from mp_machdep.c to sal.c. We use
the array as a centralized place to collect vector assignments.
Note that we still assign vectors to SMP specific IPIs in
non-SMP configurations. Rename the array from mp_ipi_vector to
ipi_vector.
o Add IPI_MCA_RENDEZ and IPI_MCA_CMCV. These are used by MCA.
Note that IPI_MCA_CMCV is not SMP specific.
o Initialize the ipi_vector array so that we place the IPIs in
sensible priority classes. The classes are relative to where
the AP wake-up vector is located to guarantee that it's the
highest priority (external) interrupt. Class assignment is
as follows:
class IPI notes
x AP wake-up (normally x=15)
x-1 MCA rendezvous
x-2 AST, Rendezvous, stop
x-3 CMCV, test


96336 10-May-2002 marcel

Add missing #endif


96318 10-May-2002 obrien

Gcc 3.1 varargs support.


96146 07-May-2002 marcel

o Add ar.lc to the pcb.
o Create pcb_save as the backend for savectx and cpu_switch.
o While here, use explicit bundling for pcb_save and optimize
for compactness (~87% density).

o Not part of the commit is a backend pcb_restore. restorectx()
still jumps halfway into cpu_switch().


96062 05-May-2002 marcel

o Add struct mca_guid
o Add currently known GUIDs
o Slight restyling


96061 05-May-2002 marcel

o Include md_var.h
o Remove definition of struct ia64_fdesc
o Remove prototype of os_boot_rendez
o Use the FDESC_FUNC and FDESC_GP abstractions


96059 05-May-2002 marcel

Remove definition of struct ia64_fdesc. It's been moved to md_var.h


96058 05-May-2002 marcel

o Move definition of struct ia64_fdesc here to remove duplication.
o Add prototype of os_boot_rendez.


96029 04-May-2002 dfr

Use region 7 addresses for the slabs in the PV and PT zones so that we
don't confuse the zone allocater by translating region 5 addresses to
region 7 addresses (which is unavoidable for PTEs).


96019 04-May-2002 marcel

Make sure we don't index the pm_rid array out of bounds in
pmap_ensure_rid(). This can happen because the function is
called for both user and kernel addresses, while the rid array
only has room for user addresses. This bug got exposed by rev
1.58 of ia64/ia64/pmap.c and rev 1.8 of ia64/include/pmap.h.


95929 02-May-2002 dfr

The width of segsz_t should be 64, not 32 on ia64.


95920 02-May-2002 marcel

In pmap_pinit0, remove duplicate initialization.


95919 02-May-2002 marcel

PCPU(current_pmap) is initialized in pmap_bootstrap. No need to
do it again.


95893 01-May-2002 marcel

Save the MCA info specific to the AP as part of the AP launch.


95892 01-May-2002 marcel

Make ia64_mca_save_state MP safe. Protect access to the info block,
updating the sysctl tree and clearing the SAL state by a spin lock.


95863 01-May-2002 peter

Connect up kern_envp *before* we use it for getenv() and console probing.
It is a bit late after that when we have no consoles. :-]

Also, fix a comment nit and print a warning about missing metadata.


95814 30-Apr-2002 phk

Don't export timecounter structures under debug. with sysctl, they
contain no truly interesting data anymore.


95768 30-Apr-2002 marcel

Add ar.lc and ar.ec to the trapframe. These are not saved for syscalls,
only for exceptions.

While adding this to exception_save and exception_restore, it was hard
to find a good place to put the instructions. The code sequence was
sufficiently arbitrarily ordered that the density was low (roughly 67%).
No explicit bundling was used.
Thus, I rewrote the functions to optimize for density (close to 80% now),
and added explicit bundles and nop instructions. The immediate operand
on the nop instruction has been incremented with each instance, to make
debugging a bit easier when looking at recurring patterns. Redundant
stops have been removed as much as possible. Future optimizations can
focus more on performance. A well-placed lfetch can make all the
difference here!

Also, the FRAME_Fxx defines in frame.h were mostly bogus. FRAME_F10 to
FRAME_F15 were copied from FRAME_F9 and still had the same index. We
don't use them yet, so nothing was broken.


95762 30-Apr-2002 marcel

Make this work for ski again. Don't call ia64_mca_init() when we're
in the simulator.


95761 30-Apr-2002 marcel

Include md_var.h. It has the prototype of ia64_running_in_simulator().


95760 30-Apr-2002 marcel

Remove KTR_EXTEND.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95519 26-Apr-2002 marcel

Initialize MCA in cpu_startup() so that it's ready before we wake-up
the application processors. This allows us to collect unconsumed AP
specific error records as part of the wake-up.


95518 26-Apr-2002 marcel

MCA specific code has been moved to a seperate file. It is expected
to grow enough to be in the way here.


95517 26-Apr-2002 marcel

Machine Check Architecture (MCA) support code. Error records are
collected at boot and made available through sysctl(8). At the
moment, the following MIB names are created:

hw.mca.count - The number of error records collected.
hw.mca.first - The lowest sequence number present.
hw.mca.last - The highest sequence number present.
hw.mca.<X> - The error record with sequence number <X>.

Using sysctl(8) allows us to easily detect and analyze the records,
which is very helpful during development of MCA but can also be used
in production as a way to collect machine health statistics.


95515 26-Apr-2002 marcel

Machine Check Architecture (MCA) structures and constants.


95458 25-Apr-2002 marcel

The official name for McKinley is: Itanium 2


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95245 22-Apr-2002 marcel

Add ia64_sal_init_state(). This function will initialize the machine
check handling. In its current form, it only determines the largest
amount of state information it can get from SAL and allocates a region
7 memory block for it.

The next steps involve:
o get and log any unconsumed (NVM stored) error records across
reboots,
o register an OS_MCA handler and enable machine checks.


95244 22-Apr-2002 marcel

Add state information types.


95231 21-Apr-2002 marcel

Fix WAW dependency violation on r17 (line 198) that only exists for
the SMP case. While on the subject, remove unnecessary stops. I don't
know if this resolves the memory corruption I'm seeing, but it does
have the potential. We'll see...


95229 21-Apr-2002 marcel

Implement elf_reloc(). The RT specification says that we can expect
both Elf_Rel and Elf_Rela types of relocation, so handle them both
even though we only have Rel_Rela ATM. We don't handle 32-bit and
big-endian variants yet. Support for that is not trivial enough to
implement it without any evidence that we ever need it in the near
future.

For the FPTR relocations, we currently use the fptr_storage used by
_reloc() is locore.s. This is in no way a real solution, but for now
provides the service we need to get the basics going.

A static recursive function lookup_fdesc() is used to find the address
of a function in a way that keeps track of the load module so that
we can get the correct GP value if we need to construct an OPD (ie
there's no OPD yet for the function.

For simplicity, we create an OPD for the IPLT relocations as well and
simply fill the user provided function descriptor from the OPD. Since
the the official descriptors are unique, this has no bad side effects.
Note that we ignore the addend for FPTR relocations, but use the
addend for IPLT relocations as an offset to the function address.

This commit allows us to load and relocate modules and modules appear
to work correctly, although we probably need to make sure that we set
GP correctly in all cases when we have inter-module calls. This
especially applies to assembly coded functions that have cross module
calls.


95202 21-Apr-2002 dfr

Setup the child's return values correctly when forking an IA-32 process.


95191 21-Apr-2002 marcel

Improve self-relocation and fix ABI misinterpretation. The changes
here mostly mirror the changes made in
boot/efi/libefi/arch/ia64/start.S rev 1.5

Significant difference: We don't handle the IPLT relocation here.
For barebones KLD support, we make the fptr_storage global.


95025 19-Apr-2002 marcel

Remove the bootinfo kludge. We get the address of the bootinfo
block from the loader.


95019 19-Apr-2002 alc

o Remove vm_map_growstack() from ia64's trap_pfault().
o Remove the acquisition and release of Giant from ia64's trap_pfault().
(vm_fault() still acquires it.)


94980 18-Apr-2002 rwatson

Since WITNESS doesn't just do mutexes, remove "mutex" from the WITNESS
comment in GENERIC config files of appropriate platforms. For whatever
reason, powerpc didn't use WITNESS in GENERIC.


94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


94819 16-Apr-2002 alc

Remove code that updates vm->vm_ssize. This duplicates work already performed
by vm_map_growstack().


94779 15-Apr-2002 peter

Fix an "oops!" that turned out to be mostly harmless (but gave a warning).
I did this right on the sparc64. Store the direct mapped addresses in
the correct variables.

Submitted by: jake


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94642 14-Apr-2002 marcel

Dotting the i-s:
o Use chunk instead of region when we talk about a memory range.
Region can be confused with region register and we already
call it chunk in machdep.c
o Update the twiddle every 16MB


94639 14-Apr-2002 peter

Allow a kernel to be compiled with both SKI and acpica and still
work on real hardware. (SKI used to break the sapic probes)


94628 13-Apr-2002 alc

Add comment that sigreturn() is MPSAFE.


94496 12-Apr-2002 dfr

Initialise ar.cflg, which contains the IA-32 registers cr0 and cr4. Since
all IA-32 processes use the same values for cr0 and cr4, we initialise
them at system startup.


94495 12-Apr-2002 dfr

Print extra information in printtrap() if the interrupted state was for
an IA-32 process. Don't sign extend arguments in ia32_syscall - its not
normally going to be useful (e.g. pointers need to be zero extended).


94494 12-Apr-2002 marcel

Fix definition of va_start: We don't need to take the address of
va_list. It's a builtin type. gcc 3.1 doesn't care either way,
but gcc 3.2 is more picky and doesn't like the former.


94481 12-Apr-2002 peter

Really fix uniprocessor on IA64. Note to self: do not use variables before
they are initialized. I had correctly figured out that the UP problem was
the pcpu current_pmap thing, but didn't fix it right last time.


94380 10-Apr-2002 dfr

Initial support for executing IA-32 binaries. This will not compile
without a few patches for the rest of the kernel to allow the image
activator to override exec_copyout_strings and setregs.

None of the syscall argument translation has been done. Possibly, this
translation layer can be shared with any platform that wants to support
running ILP32 binaries on an LP64 host (e.g. sparc32 binaries?)


94378 10-Apr-2002 dfr

Save and restore the IA-32 state in cpu_switch(). Probably should only do
this if the thread has been executing IA-32 code.


94377 10-Apr-2002 dfr

Add suhword() and fuhword() for accessing 32-bit values ("half words") in
userland. All these functions should be renamed to be explicit about the
size of value being read or written.


94376 10-Apr-2002 dfr

Add exception and syscall support for executing IA-32 binaries.


94375 10-Apr-2002 dfr

Add ucode values for SIGFPE etc. Copied from i386/include/signal.h.


94374 10-Apr-2002 dfr

Add fields for saving/restoring the IA-32 state.


94373 10-Apr-2002 dfr

Add definitions for IA-32 exceptions, interrupts and intercepts.


94365 10-Apr-2002 dfr

Call ast() from the syscall exit path as well as for full exception
restores.


94364 10-Apr-2002 dfr

Initialise PCPU_GET(current_pmap) in pmap_bootstrap - cpu_switch needs
to be sure that it is always correct and this was not true for the first
call to cpu_switch. When thread0 resumed later, it ended up calling
pmap_install with a null pmap, which is bad.


94363 10-Apr-2002 mike

Remove the hack for segsz_t from <sys/types.h>; use the normal
_BSD_FOO_T_ method for defining segsz_t.


94362 10-Apr-2002 mike

Add manifest constants: _LITTLE_ENDIAN, _BIG_ENDIAN, _PDP_ENDIAN, and
_BYTE_ORDER. These are far more useful than their non-underscored
equivalents as these can be used in restricted namespace environments.
Mark the non-underscored variants as deprecated.


94275 09-Apr-2002 phk

GC various bits and pieces of USERCONFIG from all over the place.


94271 09-Apr-2002 dfr

Define a complete set of accessors for application and control registers.


94270 09-Apr-2002 dfr

Don't call make_dev from ssccnattach - its far too early to work properly.


94026 07-Apr-2002 peter

ia64 depends on ACPICA on actual hardware. It might be worth having a
seperate SKI config (like we had SIMOS for alpha).


93997 06-Apr-2002 marcel

Add prototype for bootpc_init when BOOTP is defined.


93961 06-Apr-2002 dfr

Merge fixes for dbtob() and btodb() from alpha/include/param.h. This stops
ffs_snapshot() from using negative numbers for byte offsets in large file
systems.


93933 06-Apr-2002 marcel

Fix a braino in the alignment of the segment contents after dumping
the program headers. As a result of this, dumplo was advanced too
much causing the end of the dump and most notably the trailing
dump header to be written beyond the end of the the dump medium.


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93794 04-Apr-2002 brian

Back out the previous commit.

In the i386 case, options BOOTP requires options NFS_ROOT as well as
options NFSCLIENT. With *both* the NFS options, a bootpc_init()
prototype is brought in by nfsclient/nfsdiskless.h.

In the ia64 case, it just doesn't work and my change just pushes it
further away from working.

Suggested to be wrong by: bde


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93785 04-Apr-2002 brian

Pre-declare bootpc_init() so that options BOOTP doesn't break the
build in ia64 and i386 due to -Werror.


93761 04-Apr-2002 alc

o Kill the MD grow_stack(). Call the MI vm_map_growstack()
in its place.
o Eliminate the use of useracc() and grow_stack() from sendsig().

Reviewed by: peter


93757 04-Apr-2002 marcel

o Add architecture specific segment types.
o Add architecture specific segment attributes.


93719 03-Apr-2002 ru

Dike out a highly insecure UCONSOLE option.
TIOCCONS must be able to VOP_ACCESS() /dev/console to succeed.

Obtained from: OpenBSD


93717 03-Apr-2002 marcel

Make the kernel dump header endianness invariant by always dumping
in dump byte order (=network byte order). Swap blocksize and dumptime
to avoid extraneous padding on 64-bit architectures. Use CTASSERT
instead of runtime checks to make sure the header is 512 bytes large.
Various style(9) fixes.

Reviewed by: phk, bde, mike


93713 03-Apr-2002 marcel

o GC dumplo
o Replace the string lit. "ia64" with MACHINE


93712 03-Apr-2002 marcel

Use a twiddle to show that we're busy dumping. The initial code
emitted the total number of pages it still had to dump prior to
dumping a block of up to 16 pages. For a 128MB region this would
result in 8M number of printf()s. Barf!

The problem in general is that memory typically has one really
big region and a number of "scattered" smaller regions. Some may
even be just a few pages. The twiddle works best for now, but
it doesn't really give a good progress indication for the large
regions. Those are the cases where you definitely want good PI
to avoid having the user turn into a twiddle :-)


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93647 02-Apr-2002 marcel

Initial implementation of the ia64 kernel dumper. The dumper
constructs an ELF image, consisting of the ELF header, for
each memory region a program header, followed by the memory
contents for each region. It does blocked I/O for the headers
as they are typically smaller than DEV_BSIZE.


93627 02-Apr-2002 marcel

o GC totalphysmem and resvmem.
o Rephrase comment describing that the memory region can contain
the kernel.


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93458 30-Mar-2002 marcel

Transition to a model where the loader passes the address of the
bootinfo block in register r8. In locore.s we save the address
in the global variable 'pa_bootinfo'. In machdep.c we compare
this value against the hardwired address, but don't depend on its
validity yet (ie: we still expect the bootinfo block to be at the
hardwired address). After a small amount of time, we'll flip the
switch and depend on the loader to pass us the address. From that
moment on the loader is free to put it anywhere it likes, provided
the machine itself likes it as well.

Add some verbosity to aid in the transition. We emit a message if
the loader didn't pass the address and we also emit a message if
there's no bootinfo block at the hardwired address.

While in locore.s, reduce the number of redundant serialization
instructions. A srlz.i is a proper superset of a srlz.d and thus
is a valid replacement. Also slightly reorder the movl instructions
to improve bundle density.


93389 29-Mar-2002 jake

Remove abuse of intr_disable/restore in MI code by moving the loop in ast()
back into the calling MD code. The MD code must ensure no races between
checking the astpening flag and returning to usermode.

Submitted by: peter (ia64 bits)
Tested on: alpha (peter, jeff), i386, ia64 (peter), sparc64


93312 28-Mar-2002 obrien

style(9)

Approved by: jake


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93256 27-Mar-2002 marcel

o Revert previous commit in asm.h. There's no need to undefine
__FBSDID first, because it should not be defined at all,
o Remove inclusion of cdefs.h in locore.s.

Pointed out by: peter


93192 26-Mar-2002 obrien

Get the guarding right. The IA-64 has a different organization for this
than our other platforms.


93092 24-Mar-2002 obrien

Guard against redefining __gnuc_va_list.


93088 24-Mar-2002 marcel

Undefine __FBSDID before defining it as it's already defined at
that point.


92998 23-Mar-2002 obrien

ASM versions of __FBSDID.


92870 21-Mar-2002 dfr

Change critical_t to register_t for intr_disable/restore.


92869 21-Mar-2002 dfr

Change cpu_critical_enter/exit to intr_disable/restore.


92865 21-Mar-2002 peter

In UP mode, the primary cpu's per-cpu current_pmap was not initialized -
this was only done as a side effect of calling cpu_mp_start(). I haven't
actually tested that this fixes UP kernels, but it feels about right.


92851 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.

Approved by: peter


92843 20-Mar-2002 alfred

Remove __P.

Reviewd by: peter


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removse td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92805 20-Mar-2002 dfr

Change intr_enable to intr_restore for consistency with sparc64.


92782 20-Mar-2002 dfr

Replace calls to cpu_critical_enter/exit with appropriate calls to
either explicitly disable interrupts or use a real critical section,
as appropriate.


92781 20-Mar-2002 dfr

Recreate intr_disable/intr_enable and implement cpu_critical_enter/exit
in terms of that (for now).


92696 19-Mar-2002 peter

#if 0 some unused variables (only in #if 0 code)


92679 19-Mar-2002 peter

Enabling the SKI option is a guaranteed breakage for me. Interrupts no
longer work.
I can only get a box to boot with 'options SMP'.


92677 19-Mar-2002 peter

My ia64 box for some reason likes to fragment the beginning/end of memory
a bit before handing it over to the OS. I occasionally have 11
segments with several 8K or so fragments depending on nvram settings and
what I have done under loader(8) before booting. This needs to be
revisited.


92676 19-Mar-2002 peter

Fix some unused variables.


92675 19-Mar-2002 peter

Move a couple of prototypes together instead of being incompletely
scattered around.


92674 19-Mar-2002 peter

__func__ is a const char *, not a "string" that can be concatenated.


92673 19-Mar-2002 peter

Fix a pointer/int warning


92672 19-Mar-2002 peter

#ifdef SMP some variables that are only used elsewhere under #ifdef SMP
also.


92671 19-Mar-2002 peter

Work around an apparent compiler bug with gcc-3.1, although it might be
a language feature that I do not know about. gcc is complaining about
a left shift >= sizeof type, even when shifting a (cast) 64 bit type left
by 43 bits.


92670 19-Mar-2002 peter

Believe it or not, I ran into the 32MB stack size limit using a natively
hosted gcc.


92669 19-Mar-2002 peter

#if 0 out some unused code.


92668 19-Mar-2002 peter

Add some #includes after things got broken with the last round of
MI include file (<sys/smp.h> I think) tweaks.


92667 19-Mar-2002 peter

Turn off the ia64 ITC timecounter when SMP is present since it has the
same problem as the TSC on the x86 - ie: it is not synchronized.
#if 0 out some unused functions, ia64 doesn't calibrate clocks yet.


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92552 18-Mar-2002 dfr

Fix spelling.


92383 16-Mar-2002 des

Move the definition of PT_[GS]ET{,DB,FP}REGS from the MD ptrace.h to the
MI ptrace.h, since all platforms define them. Keep the MD ptrace.h around
for FIX_SSTEP (which is currently only needed on Alpha).


92321 15-Mar-2002 dfr

* Stop other cpus when one cpu enters DDB and restart them after it
leaves.
* Add a sync.i instruction to the code which writes out breakpoints to
ensure that the breakpoint is seem by all cpus in the coherence domain.


92318 15-Mar-2002 dfr

* Remove a breakpoint() I accidentally left in for debugging :-(.
* Make cpu_mp_probe() work before the VM system is available and
initialise mp_maxid accordingly.


92287 14-Mar-2002 dfr

Tweak the AP startup code somewhat. With all the other recent changes,
this now works pretty well for two processors at least.

Submitted by: marcel, mostly.


92286 14-Mar-2002 dfr

* Initialise pcb_pmap for new threads.
* Add support for forking new threads from &thread0 as well as curthread.


92285 14-Mar-2002 dfr

* Save and restore PCPU_GET(current_pmap) in pcb_pmap so that we don't
lose if a process is preempted while pmap is temporarily switched to
another pmap.
* For SMP, drop the high-fp state when a thread is switched away from
so that if another cpu resumes that thread, it doesn't have to play
games with IPI to get ahold of the correct register values.


92281 14-Mar-2002 dfr

Add pcpu.pc_current_pmap and pcb.pcb_pmap.


92280 14-Mar-2002 dfr

Add a field to hold the current pmap of a thread.


92271 14-Mar-2002 dfr

Add ia64_sync_i(), ia64_get_tpr() and ia64_set_tpr().


92268 14-Mar-2002 dfr

* Add some KTR messages for IPIs.
* Don't call ast() from interrupt() - if we switch, then we will miss
writing cr.eoi which will prevent the current cpu from receiving
interrupts until the current thread is resumed. The call to ast()
happens magically in exception_restore where it is safe.
* Add DDB 'show irq' command to examine interrupt hardware state.


92267 14-Mar-2002 dfr

Add debug code to print SAPIC registers.


92263 14-Mar-2002 dfr

* Use a mutex to protect the RID allocator.
* Use ptc.g instead of ptc.l so that TLB shootdowns are broadcast to the
coherence domain.
* Use smp_rendezvous for pmap_invalidate_all to ensure it happens on all
cpus.
* Dike out a DIAGNOSTIC printf which didn't compile.
* Protect the internals of pmap_install with cpu_critical_enter/exit.


92262 14-Mar-2002 dfr

Move the call to pmap_bootstrap to after the initialisation of thread0.
This allows us to use mutexes in pmap safely. Also initialise fpcurthread
for cpu0 so that ia64_fpstate_check doesn't barf during boot.


92249 14-Mar-2002 dfr

Don't restore r13 when returning to kernel mode. We may have migrated to
a different cpu since the exception_save and r13 needs to point at the
current cpu's pcpu structure.


92124 12-Mar-2002 peter

Fix some -Wunused warnings by "using" a macro argument


92123 12-Mar-2002 peter

Fix a warning (make ucontext_t *ucp a const)


92122 12-Mar-2002 peter

Stop concatenating __func__ with strings


92121 12-Mar-2002 peter

Deal with a structure member rename in a recent acpica import


92105 11-Mar-2002 jhb

Fix a misspelling of mine: s/optomization/optimization/.

Noticed by: bmilekic


92020 10-Mar-2002 dfr

Add an implementation of cpu_throw() and make restorectx() simply branch
to the tail of cpu_switch.


92019 10-Mar-2002 dfr

Don't try to print the arguments if the value of bsp is outside the
kernel - its asking for trouble.


92010 10-Mar-2002 dfr

Use the right value for the region length in parse_spill_mask.


91959 09-Mar-2002 mike

o Don't require long long support in bswap64() functions.
o In i386's <machine/endian.h>, macros have some advantages over
inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
(previously __uint16_swap_uint32), it has some uses on i386's with
PDP endianness.

Submitted by: bde

o Move a comment up in <machine/endian.h> that was accidentially moved
down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
functions, so that the non-GCC (libc asm) case has proper
prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
_BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
for platforms in which asm versions don't exist. This significantly
reduces the complexity of some things at the cost of duplicate code.

Reviewed by: bde


91779 07-Mar-2002 jake

Include machine/smp.h.


91669 05-Mar-2002 marcel

Call ast() only when we're handling a user trap.


91639 04-Mar-2002 dfr

Add PSEUDOFS.


91635 04-Mar-2002 dfr

Add emulation support for PAL_VM_SUMMARY.


91598 03-Mar-2002 dfr

* Include <sys/ucontext.h> so that this compiles again.
* Move the section which manipulates ia64_pal_base to after cninit() so
that we don't risk printing anything before we have a console.
* Don't call ia64_probe_sapics() for a SKI build. This should really
be dependant on ACPICA being present or something.


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but
misviewed Bruce's patch.

Requested by: bde


91475 28-Feb-2002 arr

- Fix panic() message and a couple style nits that snuck in from the
recent diagnostics commit (rev. 1.84).


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91394 27-Feb-2002 tmm

Add the following functions/macros to support byte order conversions and
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):

- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).

htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.

Make use of the new functions in a few places where local implementations
of the same functionality existed.

Reviewed by: mike, bde
Tested on alpha by: mike


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


90892 19-Feb-2002 julian

Duplicate the changes to i386 to keep creds over the user boundary.


90885 19-Feb-2002 mike

Add C++ support.


90868 18-Feb-2002 mike

o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on: alpha, i386
Reviewed by: bde, jake, tmm


90711 15-Feb-2002 wollman

Resurrect one of the easiest changes from my big include files roll-up
patch from a year ago: give file flags their own type. This does not
(yet) change the type used by system calls or library functions.
The underlying type was chosen to match what is returned by stat().


90598 13-Feb-2002 rwatson

Remove WITNESS from GENERIC by default: as we grow more locks, this gets
slower, and may be impeding adoption of -CURRENT by developers. We
recommend turning on WITNESS by default on crash boxes, and when doing
locking development. It will probably get turned on by default for a week
or two following any major locking commits, also.

Approved by: all and sundry (jhb, phk, ...)


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up the ia64 stub, since there was no stub before.


89493 18-Jan-2002 marcel

Add a definition of ddb_regs. ddb_regs is declared as extern in
db_machdep.h to fix the link failure (multiple definitions)
caused by disabling the emission of common symbols. As a result,
there were no definitions at all. While here, remove useless
declarations.


89492 18-Jan-2002 marcel

Remove the definition of bootverbose. This fixes the link failure
caused by disabling the emission of common symbols.


89491 18-Jan-2002 marcel

Declare ddb_regs as extern to avoid creating a tentative definition.
This fixes the link failure caused by disabling the emission of
common symbols.


88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


88727 30-Dec-2001 marcel

Revert previous definition of cpu_throw(). Non-MP configurations
were broken as well.


88695 30-Dec-2001 marcel

Better implement SMP support:
o Do not use a special struct to keep track of CPUs we found;
instead, use struct pcpu. This handles all the magic WRT
thread creation (yay!).
o Respect MAXCPU.
o Use the vhpt_base and vhpt_size values to initialize the AP.
o Style fixes.

Note that this commit temporarily breaks SMP configurations.
Previously APs didn't do anything, but they now enter the
scheduler. They hold sched_lock for more than 5 secs though
and cause a panic. That's what I call progress :-)


88693 30-Dec-2001 marcel

o Reimplement map_pal_code to work with a global variable
ia64_pal_base instead of scanning the EFI tables. This way
AP startup code can more easily use the function.
o Initialize ia64_pal_base in ia64_init(). When the PAL code
doesn't need explicit mapping or no PAL code has been found,
ia64_pal_base will be 0.
o Remove some unused global variables.
o Also in ia64_init(), allocate only 1 page for struct pcpu
and remove some Alpha leftovers.
o Initialize pc_pcb in cpu_pcpu_init().


88692 30-Dec-2001 marcel

Make vhpt_base and vhpt_size globals so that they can be used by
the AP startup code.


88691 30-Dec-2001 marcel

Cleanup the IPIs.


88690 30-Dec-2001 marcel

Remove unused MD fields (pc_pending_ipis, pc_next_asn and
pc_current_asngen) and add SMP specific fields (pc_pcb,
pc_lid and pc_awake).


88689 30-Dec-2001 marcel

o Remove temporary implementation of cpu_throw in vm_machdep.c
and instead make it an alternate entry-point of cpu_switch()
in swtch.s
o Add SMP support to cpu_switch().


88687 30-Dec-2001 marcel

Draft implementation of IPI handling.


88686 30-Dec-2001 marcel

Add PC_IDLETHREAD. We need it in cpu_switch.


88685 30-Dec-2001 marcel

Add missing predicate in interruption_Data_TLB. Without this
predicate we never used the VHPT entry we found.

While here, normalize the compares.


88441 23-Dec-2001 dfr

Fix CRITICAL_FORK so that it compiles.


88376 21-Dec-2001 tmm

Use the new resource_list_print_type() function.
Pass the bus device to isa_init() (this is needed for the sparc64
version).


88245 20-Dec-2001 peter

Replace a bunch of:
for (pv = TAILQ_FIRST(&m->md.pv_list);
pv;
pv = TAILQ_NEXT(pv, pv_list)) {
with:
TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


87894 14-Dec-2001 iedowse

Enable UFS_DIRHASH in the GENERIC kernel.

Suggested by: silby
Reviewed by: dillon
MFC after: 5 days


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87572 09-Dec-2001 obrien

style(9)


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


87454 06-Dec-2001 jhb

Add multiple inclusion protection.


87341 04-Dec-2001 des

PROCFS requires PSEUDOFS.


87158 01-Dec-2001 mike

o Stop abusing MD headers with non-MD types.
o Hide nonstandard functions and types in <netinet/in.h> when
_POSIX_SOURCE is defined.
o Add some missing types (required by POSIX.1-200x) to <netinet/in.h>.
o Restore vendor ID from Rev 1.1 in <netinet/in.h> and make use of new
__FBSDID() macro.
o Fix some miscellaneous issues in <arpa/inet.h>.
o Correct final argument for the inet_ntop() function (POSIX.1-200x).
o Get rid of the namespace pollution from <sys/types.h> in
<arpa/inet.h>.

Reviewed by: fenner
Partially submitted by: bde


87119 30-Nov-2001 dfr

* Don't use critical_enter/critical_exit when accessing the VHPT - its
pointless and would be inadequate for SMP systems. We will rely on the
VM system's locks to serialise this for now.
* Change pmap_remove() so that if the range being removed is larger than
the number of pages mapped by the pmap, we iterate over the currently
mapped pages instead of over the virtual address range. This should
make a difference when removing large virtual address ranges from an
address space.


86951 27-Nov-2001 dfr

Minor tweaks to the TLB handling code - avoid movl instructions and add
itc.x instructions to attempt to avoid the little flurry of TLB exceptions
for handling access, dirty etc.


86593 19-Nov-2001 peter

s/code/ucode/ (last minute typo)


86592 19-Nov-2001 peter

Initial cut at calling the EFI-provided FPSWA (Floating Point Software
Assist) driver to handle the "messy" floating point cases which
cause traps to the kernel for handling.


86587 19-Nov-2001 peter

Use some (now) spare space for passing through a pointer to the FPSWA
Interface provided by EFI (Floating Point SoftWare Assist).


86586 19-Nov-2001 peter

Remove bootinfo.bi_kernel. It isn't used by the kernel. struct bootinfo
should go away on ia64, we should be loader metadata based since that is
the only way we can boot (loader, skiload).


86443 16-Nov-2001 peter

Oops, I accidently merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


86442 16-Nov-2001 peter

Merge rev 1.264 from i386/pmap.c (tegge via alfred):
Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.


86441 16-Nov-2001 peter

Merge rev 1.202 from i386/pmap.c (back in 1998 by John Dyson):
Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


86440 16-Nov-2001 peter

Merge rev 1.293 of i386/pmap.c - skip PG_UNMANAGED in pmap_collect()


86438 16-Nov-2001 peter

Converge with i386/pmap.c - dont refer to curproc, use curthread.


86435 16-Nov-2001 peter

As part of a general cleanup and reconvergence of related pmap code,
start tidying up some loose ends. The DEBUG_VA stuff has long since
passed its use-by date. It wasn't used on ia64 but got cut/pasted there.


86294 12-Nov-2001 peter

Implement eficlock_set() to set hardware clock.


86291 12-Nov-2001 marcel

o os_boot_rendez is responsible for clearing the IRR bit by
reading cr.ivr, as well as writing to cr.eoi.
o use global variables to pass information to os_boot_rendez
so that it doesn't have to jump through hoops to find it
out. This avoids traps on the AP without it even being
initialized. This fixes SMP configurations.
o Move the probing of the MADT to the end of cpu_startup,
instead of at the start of cpu_mp_probe. We need to probe
the MADT for non-SMP configurations as well. This fixes
uniprocessor configurations.
o Serialize AP wake-up by waiting for the AP. We need to do
this since we use global variables to for the AP to use.
As a side-effect, we can use printf() more easily to see
what's going on.


86290 12-Nov-2001 marcel

Invoke trap() for the alt. ITLB and alt. DTLB interrrupts when
the region is not 6 or 7. This changes the behaviour from
inserting a bogus region 6 mapping to a kernel panic.


86286 12-Nov-2001 peter

Remove #if 0'ed code that was replaced by vm_ksubmap_init() and GC'ed
on other platforms.


86239 10-Nov-2001 marcel

Avoid using the .align directive to skip to the next vector offset.
It doesn't help us catch overflowing vector entries at compile time.
Instead use the .org directive. The last entry in the IVT doesn't
strictly need to be limited to 256 bytes, but doing so allows the
the VHPT to be placed immediately following the IVT without wasting
any space due to alignment.


86213 09-Nov-2001 dfr

* Make sure we increment pm_stats.resident_count in pmap_enter_quick
* Re-organise RID allocation so that we don't accidentally give a RID
to two different processes. Also randomise the order to try to reduce
collisions in VHPT and TLB. Don't allocate RIDs for regions which are
unused.
* Allocate space for VHPT based on the size of physical memory. More
tuning is needed here.
* Add sysctl instrumentation for VHPT - see sysctl vm.stats.vhpt
* Fix a bug in pmap_prefault() which prevented it from actually adding
pages to the pmap.
* Remove ancient dead debugging code.
* Add DDB commands for examining translation registers and region
registers.

The first change fixes the 'free/cache page %p was dirty' panic which I
have been seeing when the system is put under moderate load. It also
fixes the negative RSS values in ps which have been confusing me for a
while.

With this set of changes the ia64 port is reliable enough to build its
own kernels, even with a 20-way parallel build. Next stop buildworld.


86212 09-Nov-2001 dfr

Raise SIGILL for General Exceptions - its closer to the correct meaning.


86211 09-Nov-2001 dfr

Reserve more space for phys_avail. Really need to be more careful about
overflowing phys_avail.


86210 09-Nov-2001 dfr

Teach DDB about branch registers.


86209 09-Nov-2001 dfr

Define PS and VE fields of region register correctly.


86204 09-Nov-2001 marcel

Implement os_boot_rendez. Application processors are initialized
and brought to a point where kernel specific initializations can
be done. That will be the next step...


86069 05-Nov-2001 marcel

Don't pass os_boot_rendez directly to SAL_SET_VECTORS, because it's
actually the address of the function descriptor. The fdesc has both
the address of the function and it's corresponding gp value. Now
that we have a gp value, use it instead of passing 0.


85973 03-Nov-2001 dfr

Implement <machine/ieeefp.h>


85930 03-Nov-2001 dillon

Implement i386/i386/pmap.c 1.292 for alpha, ia64 (avoid free
page exhaustion / kernel panic for certain madvise() scenarios)


85892 02-Nov-2001 mike

o Add new header <sys/stdint.h>.
o Make <stdint.h> a symbolic link to <sys/stdint.h>.
o Move most of <sys/inttypes.h> into <sys/stdint.h>, as per C99.
o Remove <sys/inttypes.h>.
o Adjust includes in sys/types.h and boot/efi/include/ia64/efibind.h
to reflect new location of integer types in <sys/stdint.h>.
o Remove previously symbolicly linked <inttypes.h>, instead create a
new file.
o Add MD headers <machine/_inttypes.h> from NetBSD.
o Include <sys/stdint.h> in <inttypes.h>, as required by C99; and
include <machine/_inttypes.h> in <inttypes.h>, to fill in the
remaining requirements for <inttypes.h>.
o Add additional integer types in <machine/ansi.h> and
<machine/limits.h> which are included via <sys/stdint.h>.

Partially obtain from: NetBSD
Tested on: alpha, i386
Discussed on: freebsd-standards@bostonradio.org
Reviewed by: bde, fenner, obrien, wollman


85866 02-Nov-2001 dfr

Call ast() from exception_restore when we are restoring to user mode.


85865 02-Nov-2001 dfr

Use static storage for the unwind state so that we can still get backtraces
when the VM system is hosed.


85856 02-Nov-2001 dfr

Remember to actually free the pv_entry in pmap_remove_entry().


85852 02-Nov-2001 peter

argh! cut/paste typo. :-(
(committed on a different machine to what I was testing it on)


85850 02-Nov-2001 peter

"Fix" a problem that got copied from alpha to ia64 and broke there.
When we truncate the msgbuf size because the last chunk is too small,
correctly terminate the phys_avail[] array - the VM system tests
the *end* for zero, not the start. This leads the VM startup to
attempt to recreate a duplicate set of pages for all physical memory.

XXX the msgbuf handling is suspiciously different on i386 vs
alpha/ia64...


85785 31-Oct-2001 dfr

Experiment with rewriting the syscall() wrapper using explicit bundling
and trying to reduce stalls from reading certain high latency registers.
This should be faster than the old syscall code. Its certainly a lot
smaller.


85777 31-Oct-2001 dfr

Add TF_AR_FPSR, the offset of ar.fpsr in a trapframe.


85766 31-Oct-2001 dfr

Print the bundle template name on the first slot of the bundle.


85685 29-Oct-2001 dfr

* Factor out common code for manipulating the RSE backing store.
* Implement a fairly simplistic parser for unwinding stack frames.
* Use unwind records for DDB's 'trace' command. Also add support for
tracing past exceptions to the context which generated the exception.

The stack unwind code requires a toolchain based on binutils-2.11.2 or
later and gcc-3.0.1 or later.


85684 29-Oct-2001 dfr

Make the various bits of SMP code conditional on SMP so that I can still
build non-SMP kernels.


85682 29-Oct-2001 dfr

Various fixes to make stack traces using the unwind tables work properly.


85681 29-Oct-2001 dfr

Fix disassembly of 'add a=b,c,1' and make the disassembly of the various
break and nops consistent.


85680 29-Oct-2001 peter

The size of the ELF hash table changed from 64 bits in the prototype
toolchains to 32 bits in 2.11.2.

Obtained from: dfr


85674 29-Oct-2001 marcel

o Send a test IPI from the BSP to itself at the same time APs
are woken up.
o Make IPIs synchronuous by default. If we want asynchronuous
IPIs, we may want to make the memory fence controllable.


85673 29-Oct-2001 marcel

Add an IPI used for testing proper operation of delivering IPIs.


85670 29-Oct-2001 marcel

Make the clock vector 255 instead of 240. On Lion boxes, 240 is
the AP wake-up vector. We probably want a more dynamic approach
to assigning vectors in the future...


85667 29-Oct-2001 marcel

Small correction in the LOCAL_SAPIC structure. The Flags field
starts at offset 8; not 6. Hence the structure is 12 bytes and
not 10 bytes. Adjust the definition so that the ProcessorEnabled
flag is moved from bit 15 to bit 31 in the Flags field.

The definition now matches ACPI 2.0 Errata 1.5.


85656 29-Oct-2001 marcel

o Do not parse the MADT as a side-effect in AcpiOsGetRootPointer,
do it as a side-effect of probing for MP hardware. This allows
us to scan for local SAPICs early (especially before MBUF
initialization).
o Fix the Local SAPIC structure so that matches the Local SAPIC
table entry. Now that the Local SAPIC info is the same as the
Local APIC info, stop dumping the Local APIC entries.
o For every Local SAPIC entry in the MADT that's not disabled,
let the SMP code know about it. They represent actual CPUs.
o Register the OS_BOOT_RENDEZ entry point and provide a (bogus)
implementation for the entry point.
o Provide a mapping for internal IPI numbers to ExtINT vectors.
o In a MP system, announce the CPUs and start them by sending
IPI_AP_WAKEUP to each of them. Not that it makes a difference
at this time :-)
o Miscellaneous style fixes and other adjustments.


85556 26-Oct-2001 iwasaki

Add APM compatibility feature to ACPI.
This emulates APM device node interface APIs (mainly ioctl) and
provides APM services for the applications. The goal is to support
most of APM applications without any changes.
Implemented ioctls in this commit are:
- APMIO_SUSPEND (mapped ACPI S3 as default but changable by sysctl)
- APMIO_STANDBY (mapped ACPI S1 as default but changable by sysctl)
- APMIO_GETINFO and APMIO_GETINFO_OLD
- APMIO_GETPWSTATUS

With above, many APM applications which get batteries, ac-line
info. and transition the system into suspend/standby mode (such as
wmapm, xbatt) should work with ACPI enabled kernel (if ACPI works well :-)

Reviewed by: arch@, audit@ and some guys


85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


85439 24-Oct-2001 dfr

* Clear the TLB on boot.
* If a pte for a location given to pmap_enter_quick is valid, just give
up - don't panic, even if the mapping is different.


85438 24-Oct-2001 dfr

If we get an unhandled page fault in kernel mode, either panic (if
pcb_onfault is not set) or arrange to restart at the location in
pcb_onfault.

This ought to help the stability of a system under moderate load. It
certainly stops DDB from hanging the kernel when it tries to access a
non-present page.


85401 24-Oct-2001 marcel

Remove call to cninit_finish. This is part of the multiple
low-level console support.


85399 24-Oct-2001 marcel

Add parse functions for local APIC and I/O APIC entries.
Also, show when a local APIC or SAPIC is disabled.


85383 23-Oct-2001 peter

Fix RAW dependency violation when compiled with gcc-3
Warning: Use of 'br.ret.sptk.many' violates RAW dependency 'PSR.tb' (data)


85360 23-Oct-2001 peter

Turn off the single-user override. We've been running multi-user
for some time. Having a machine boot unattended is useful. :-)


85356 23-Oct-2001 dfr

Add data serialisations after ptc and mov to rr[] instructions.


85335 23-Oct-2001 mike

Remove funky right justification.

Pointed out by: bde


85330 22-Oct-2001 dfr

In the signal trampoline, flush the register stack before calling
sigreturn. This appears to fix the last set of problems with csh.


85304 22-Oct-2001 obrien

Setup for a 200MB FS -- 209715200/512= 409600 sectors.

(DFR's latest ia64-root-*.tar.gz leaves only 7.7M avail when created by
dd if=/dev/zero of=ia64-root.fs bs=1024k count=200)


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85294 21-Oct-2001 des

[partially forced commit due to pilot error in earlier commit attempt]

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85293 21-Oct-2001 des

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85286 21-Oct-2001 dfr

Add some more names for bits of trapframe.


85285 21-Oct-2001 dfr

We need to save a bit more information in the partial syscall trapframe
in case we need to take a signal.


85284 21-Oct-2001 dfr

Set ar.fpsr to something sane before trying to handle a trap - the user
might have trashed it.


85283 21-Oct-2001 dfr

Use ia64_set_fpsr() instead of __asm to set ar.fpsr.


85282 21-Oct-2001 dfr

Add ia64_set_fpsr().


85276 21-Oct-2001 marcel

Implement the IPI send functions. No mapping between IPI message
Id and interrupt vector has been made yet.


85269 21-Oct-2001 marcel

Add define for the PIB default address and include a reference to
the SDM.


85255 20-Oct-2001 mjacob

Remove wx.


85230 20-Oct-2001 dfr

Reserve space for signal state.


85211 20-Oct-2001 marcel

Save the AP wake-up vector from the SAL descriptor under SMP.
Note that the descriptor is optional. Add a comment to indicate
that we want to register the OS_BOOT_RENDEZ here as well.


85210 20-Oct-2001 marcel

Make this compile under option SMP.


85199 19-Oct-2001 dfr

Make a start at an unaligned trap handler. Only integer loads and stores
are handled so far.


85195 19-Oct-2001 dfr

Translate various userland traps into SIGBUS (instead of just panicing).


85189 19-Oct-2001 ru

Try two on the preprocessing logic.

Reviewed by: obrien


85185 19-Oct-2001 jhb

Remove unneeded sys/mutex.h includes.


85183 19-Oct-2001 obrien

Blah, fix braino where ru had to remind me of proper preprocessor syntax.
Bad fingers, no cookie.


85142 19-Oct-2001 dfr

Rework pmap so that it separates the PTE structure from the pv_entry
structure. This makes it possible to pre-allocate PTEs for the kernel,
which is necessary for a reliable implementation of pmap_kenter(). This
also avoids wasting space (about 48 bytes per page) for kernel mappings
and user mappings of memory-mapped devices.

This also fixes a bug with the previous version where the implementation
required the pv_entry structure to be physically contiguous but did not
enforce this (the structure size was not a power of two). This meant
that the pv_entry free list was quickly corrupted as soon as the system
was even mildly loaded.


85109 18-Oct-2001 dfr

Shift the code which packs and unpacks instruction bundles out of DDB
since it is useful for various emulations duties (e.g. unaligned trap
handling).


85088 18-Oct-2001 marcel

Fix typos in previous commit:

o s/sys_narg/sy_narg/
o s/SYS_MPSAFE/SYF_MPSAFE/


85085 18-Oct-2001 obrien

Add support for "__gnuc_va_list". Some overly "smart" libraries assume
the existence of the __gnuc_va_list type[*] because our compiler is GCC.

[*] __gnuc_va_list is defined in the GCC ginclude/stdarg.h replacement
headerwhich we don't use.


85082 17-Oct-2001 jhb

- Small cleanups to the Giant handling in trap().
- Only release Giant in trap() if we locked it, otherwise we could release
Giant in a kernel trap if we didn't get it for a page fault and the
previous frame had grabbed the lock.
- Only get Giant for !MP safe syscalls.


85035 16-Oct-2001 mjacob

Make SCSI changer and SES devices standard in generic kernels.

Reviewed by: ken@kdm.org


85024 16-Oct-2001 dfr

Size the number of pv_entries we use to bootstrap the pv_entry allocator
based on the size of physical memory. This should eliminate the tweaking
needed for larger memory configurations.


84966 15-Oct-2001 marcel

When compiling with SKI support, create the fake memory regions
when either the memory descriptor in the bootinfo is NULL or
the descriptor count is 0.


84876 13-Oct-2001 dfr

Only the first eight arguments can possibly be in stacked registers.


84840 12-Oct-2001 dfr

Pass the correct trapframe pointer to fork_exit - sp is trapframe-16.


84839 12-Oct-2001 dfr

If the faulting instruction is a cmpxchg, then isr.w and isr.r will both
be set. We need to check isr.w before isr.r so that we can correctly
handle a cmpxchg to a copy-on-write page.

This fixes the hang-after-fork problem for dynamically linked programs.


84801 11-Oct-2001 dfr

Implement MCOUNT hook for assembler. Probably doesn't work right.


84800 11-Oct-2001 dfr

Implement mcount trampoline (untested).


84798 11-Oct-2001 dfr

* Change the calling convention for execve so that it conforms to normal
C calling conventions. This allows crt1.c to be written nearly without
any inline assembler.
* Initialise cpu_model[] so that the hw.model sysctl works properly.


84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


84751 10-Oct-2001 dfr

Add a definition for the ia64's special PLT_RESERVE entry in the _DYNAMIC
section.


84732 09-Oct-2001 dfr

Clarify a comment.

Requested by: jhb


84714 09-Oct-2001 dfr

Don't include isavar.h - we don't need it.


84713 09-Oct-2001 dfr

Add a minimalist kernel config which can run inside SKI.


84677 08-Oct-2001 dfr

Make printtrap() more informative.


84638 07-Oct-2001 dfr

Implement inline versions of ntohl etc.


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84632 07-Oct-2001 dfr

* Use srlz.i to serialise changes to psr.ic
* Don't enable psr.i at the same time as psr.dt and psr.ic

These changes improve stability considerably.


84621 07-Oct-2001 dfr

Remove bogus include.


84592 06-Oct-2001 dfr

Move console probes until after we set boothowto so that 'boot -h' works.


84590 06-Oct-2001 dfr

Assume round-to-nearest mode for floating point.


84583 06-Oct-2001 dfr

Delete legacy pcib code - we can't possibly work without acpi on ia64.


84581 06-Oct-2001 marcel

o Change ia64_memory_address to explicitly take a u_int64_t
o Add memcpy_fromio, memcpy_io, memcpy_toio, memset_io,
memsetw and memsetw_io. I'm not sure this is the right
place for it, though.


84557 05-Oct-2001 dfr

Add BOOTP support.


84556 05-Oct-2001 dfr

Fix some dependency violations (don't know why gas didn't catch this).


84555 05-Oct-2001 dfr

Use physical addresses, not virtual addresses when calling PHYS_TO_VM_PAGE.


84554 05-Oct-2001 dfr

Eliminate some alpha craziness.


84553 05-Oct-2001 dfr

In in_cksumdata, len must be a signed type.


84543 05-Oct-2001 dfr

Low-level code for programming the I/O SAPIC.


84541 05-Oct-2001 dfr

Wire up most of the interrupt handling infrastructure. Not sure it works
right yet but its enough for the ATA probe to work. The SCSI probes which
follow are broken though.


84540 05-Oct-2001 dfr

Fix typo which meant that we never actually found the ACPI 2.0 table.


84535 05-Oct-2001 dfr

Disable interrupts when we are in DDB.


84534 05-Oct-2001 dfr

Add ia64_get_lid().


84477 04-Oct-2001 dfr

Don't pretend the argument to clockattach is a device - it isn't.


84476 04-Oct-2001 dfr

* Don't pretend the object passed to clockattach is a device - it isn't.
* Declare itc_frequency properly.


84475 04-Oct-2001 dfr

Use EFI (or some reasonable simulation) to read the RTC.


84474 04-Oct-2001 dfr

Fake the EFI runtime call GetTime.


84447 04-Oct-2001 dfr

Add low-level ACPI support code and make a start on parsing the ACPI
interrupt information.


84412 03-Oct-2001 dfr

The encoding for the bus being passed to SAL was completely wrong.


84381 02-Oct-2001 mjacob

Fix problem where a user buffer outside of the area being tested
will be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


84349 02-Oct-2001 marcel

Remove redundant and misplaced "options DDB" line.


84131 29-Sep-2001 dfr

Support for SKI is now an option.


84130 29-Sep-2001 dfr

Make sio0 a console device.


84129 29-Sep-2001 dfr

Add a couple of arguments to ia64_init. I'll use them later to improve
the method of passing bootinfo from the loader.


84128 29-Sep-2001 dfr

Various changes to use the firmware on a real machine.


84127 29-Sep-2001 dfr

* Read parameters for ptc.e instruction from PAL Code.
* Add pmap_unmapdev().


84126 29-Sep-2001 dfr

Fake PAL Code for SKI.


84124 29-Sep-2001 dfr

Start hooking up devices.


84123 29-Sep-2001 dfr

Add pmap_unmapdev().


84122 29-Sep-2001 dfr

Fill out the firmware interfaces somewhat.


84121 29-Sep-2001 dfr

Add code to initialise firmware resources (and to fake them if we are
running in simulation).


84120 29-Sep-2001 dfr

Add shims for calling PAL Code in physical mode.


84118 29-Sep-2001 dfr

Add some move definitions.


84117 29-Sep-2001 dfr

Call cpu_boot from cpu_reset.


84116 29-Sep-2001 dfr

Give up on the backtrace if the calculated pc isn't in region 7.


84115 29-Sep-2001 dfr

Use PAGE_SHIFT instead of a hardcoded constant for log2(PAGE_SIZE).


84114 29-Sep-2001 dfr

* Preserve ar.rsc in ia64_change_mode.
* Convert sp to/from physical in ia64_change_mode.
* Add a shim for calling EFI procedures in virtual mode.


84113 29-Sep-2001 dfr

Change END(locorestart) to END(__start).


83983 26-Sep-2001 rwatson

o Modify the access control checks for the ia64 /dev/mem (and friends)
to use securelevel_gt() instead of direct variable checks.

Obtained from: TrustedBSD Project


83964 26-Sep-2001 dfr

Tidy up and fix a runtime warning.


83936 25-Sep-2001 brooks

The faith(4) device is no longer a count device so don't specify a count.


83913 24-Sep-2001 dfr

Use b6 instead of b1 - b1 is supposed to be preserved and b6 is scratch.


83912 24-Sep-2001 dfr

Make the Alternate {I,D} TLB vector code actually work for virtual
addresses greater than 256M (the page size for region 6 and 7).


83908 24-Sep-2001 dfr

Don't try to access external files from SKI unless we are actually running
in SKI.


83907 24-Sep-2001 dfr

Increase the number of bootstrap PVs.


83906 24-Sep-2001 dfr

Include <machine/pte.h> instead of <machine/pmap.h>


83905 24-Sep-2001 dfr

We need different call stubs for static and stacked calling conventions.


83900 24-Sep-2001 dfr

Factor out PTE and related definitions from pmap.h - they are useful in
the loader.


83895 24-Sep-2001 dfr

Fix a few comment typos from the last commit.


83893 24-Sep-2001 dfr

Add some code which can be used to change to/from physical mode when
calling various firmware functions.


83872 24-Sep-2001 obrien

+ Fix misplacement of `txp'
+ Document our -CURRENT debugging bits


83856 23-Sep-2001 dfr

Add definitions of SAL System Table.


83837 22-Sep-2001 dfr

Don't activate the ssc console unless we are running in SKI.


83836 22-Sep-2001 dfr

Add implementations of readx() and writex().


83835 22-Sep-2001 dfr

Add declaration of ia64_running_in_simulator().


83834 22-Sep-2001 dfr

* Turn off memory descriptor debugging - its served its purpose.
* Don't get confused when memory regions don't lie on page boundaries -
remember our page size is typically larger than the firmware's page size.
* Add a function ia64_running_in_simulator() which is intended to detect
whether the kernel is running in SKI or on real hardware.


83833 22-Sep-2001 dfr

Remove a redundant stop.


83767 21-Sep-2001 dfr

Fix a warning and make sure we flush the cache after writing an
instruction bundle otherwise the CPU won't see the changed bundle.


83766 21-Sep-2001 dfr

Add ia64_fc().


83734 20-Sep-2001 dfr

If two @fptr relocations refer to the same symbol, use the same fptr
structure to resolve them. This is necessary to allow code to compare
function pointers.


83733 20-Sep-2001 dfr

Don't clear the single-step bit after a trap - leave it up to the
debugger. The code was broken anyway - it clear every bit *except* the
single-step bit (oops).


83732 20-Sep-2001 dfr

The second instruction in an MLX bundle is slot one, not slot two, even
though the actual opcode is stored in the value in slot two.


83727 20-Sep-2001 dfr

Tidy.


83718 20-Sep-2001 dfr

Don't include NFS headers. I have no idea why they were here in the first
place - NFS has no assembler in it.


83691 20-Sep-2001 peter

Replicate a change from alpha/genassym.c to other arches. This should
fix nfs-related build breakage.


83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


83644 18-Sep-2001 jhb

Whitespace fixes.


83643 18-Sep-2001 jhb

- If we ever do the per-cpu KTR stuff, the index won't be volatile as it
will be private to each CPU.
- Re-style(9) the globaldata structures. There really needs to be a MI
struct pcpu that has a MD struct mdpcpu member at some point.


83622 18-Sep-2001 dfr

Add ia64_get_cpuid().


83611 18-Sep-2001 dfr

Flesh out identifycpu().


83522 15-Sep-2001 dfr

Rearrange so we search for I/O port space as early as possible (i.e.
before console probing). Also fix a confusion between EFI's page size
which is fixed at 4096 and our own page size which is variable at compile
time.


83520 15-Sep-2001 dfr

Avoid the region used for thread0's trapframe when setting up the stack
for ia64_init. If we use this area for ia64_init's stack, it ends up
containing garbage which causes cpu_fork to die horribly later.


83512 15-Sep-2001 dfr

Use the MI console code to initialise the console.


83511 15-Sep-2001 dfr

Implement inx() and outx() functions for accessing I/O ports.


83510 15-Sep-2001 dfr

Add ia64_mf_a() which executes an mf.a instruction.


83509 15-Sep-2001 dfr

* Use Intel's EFI headers instead of home-grown ones.
* Use the bootinfo's memory map if present instead of hard-coding SKI's
memory map.
* Record the location of the I/O Port Space if present in the memory map.


83506 15-Sep-2001 dfr

Fill out some gaps in ia64 DDB support. This involves generalising DDB's
breakpoint handling slightly to cope with the fact that ia64 instructions
are not located on byte boundaries.


83499 15-Sep-2001 dfr

Sync the PCI NIC sections with i386.


83407 13-Sep-2001 dfr

* Enable dynamically linked kernel. This involves adding a self-relocator
to locore to process the @fptr relocations in the dynamic executable.
* Don't initialise the timer until *after* we install the timecounter to
avoid a race between timecounter initialisation and hardclock.
* Tidy up bootinfo somewhat including adding sanity checks for when the
kernel is loaded without a recognisable bootinfo.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83359 12-Sep-2001 marcel

o Fix struct ssc_time and enable the SSC call to get the RTC.
o Print a message that the TODR is not set in sscclock_set.


83301 10-Sep-2001 dfr

* Make a start on a realistic definition for bootinfo.
* Switch to proc0's stack and backing store before calling ia64_init
so that we don't rely on the loader's stack at all.
* Change kernel entry point name from locorestart to __start.


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


83198 07-Sep-2001 dfr

Add options to select between 4k, 8k and 16k page sizes on ia64. The
default is now 8k.


83197 07-Sep-2001 dfr

Typo in comment.


83196 07-Sep-2001 dfr

* Track ref/mod information properly when a mapping changes.
* Fix a panic in pmap_remove() for a non-current pmap.


83195 07-Sep-2001 dfr

Remove old setjmp/longjmp stubs.


83163 06-Sep-2001 jhb

Call sendsig() with the proc lock held and return with it held.


83156 06-Sep-2001 dfr

Add struct tags to avoid warnings in kernel code.


83047 05-Sep-2001 obrien

style(9) the structure definitions.


82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


82867 03-Sep-2001 dfr

Add a working version of setjmp/longjmp.

Obtained from: Intel's EFI toolkit.


82857 03-Sep-2001 peter

Since we're cross compiling from x86, ignore the x86 CPUTYPE by default.


82849 03-Sep-2001 peter

Dont conflict with sysctl debug.mddebug


82788 02-Sep-2001 peter

Sync with i386 / alpha. Whitespace unindent / style prep for kse.


82785 02-Sep-2001 peter

Merge from i386: various cleanups including moving the map calculations
to MI code. This gets ia64 to compile again.


82632 31-Aug-2001 peter

Same as i386/i386/pmap.c: clean up some style. This is irrelevant since
it is inside #if 0'ed code, but it would be a shame if this stuff got
cut/pasted elsewhere.


82530 30-Aug-2001 mike

o Remove some GCCisms in src/powerpc/include/endian.h.
o Unify <machine/endian.h>'s across all architectures.
o Make bswapXX() functions use a different spelling of u_int16_t and
friends to reduce namespace pollution. The bswapXX() functions
don't actually exist, but we'll probably import these at some
point. Atleast one driver (if_de) depends on bswapXX() for big
endian cases.
o Deprecate byteorder(3) prototypes from <sys/types.h>, these are
now prototyped indirectly in <arpa/inet.h>.
o Deprecate in_addr_t and in_port_t typedefs in <sys/types.h>, these
are now typedef'd in <arpa/inet.h>.
o Change byteorder(3) prototypes to use standards compliant uint32_t
(spelled __uint32_t to reduce namespace pollution).
o Document new preferred headers and standards compliance.

Discussed with: bde
PR: 29946
Reviewed by: bmilekic


82393 27-Aug-2001 peter

Enable hardwiring of things like tunables from embedded enironments
that do not start from loader(8).


82313 25-Aug-2001 peter

vm_page_zero_idle() is no longer MD.


82107 21-Aug-2001 peter

Strip out some #if's for old implementations of global data pointers.


82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


81763 16-Aug-2001 obrien

style(9) and make consistent across platforms


81727 15-Aug-2001 ache

OFF_T -> OFF (more standard style)


81720 15-Aug-2001 ache

Add OFF_T_MAX/OFF_T_MIN


81675 15-Aug-2001 obrien

Style changes to commonize the various platforms.


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


81255 07-Aug-2001 jhb

Grab Giant arond page faults. ia64 boots again in the simulator now.


81198 06-Aug-2001 dfr

Make this compile again.


81197 06-Aug-2001 dfr

Remove usage of nonexistent vm_mtx.


80729 31-Jul-2001 jhb

GC some obsolete alpha code.


80700 31-Jul-2001 jake

Use a machine dependent type, Elf_Hashelt, for the elements of the elf
dynamic symbol table buckets and chains. The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.

Discussed with: dfr, jdp


80431 27-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


80421 26-Jul-2001 peter

Call the early tunable setup functions as soon as kern_envp is available.
Some things depend on hz being set not long after this.


80399 26-Jul-2001 bmilekic

- Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.

This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob


79573 11-Jul-2001 bsd

Add 'hwatch' and 'dhwatch' ddb commands analogous to 'watch' and
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.

No objection by: audit@, hackers@

MFC after: 2 weeks


79418 08-Jul-2001 julian

A set of changes to reduce the number of include files the kernel
takes from /usr/include. I cannot check them on alpha.. (will try beast)

Briefly looked at by: Warner Losh <imp@harmony.village.org>


79265 05-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependant sections to a
machine-independant source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


79106 02-Jul-2001 brooks

gif(4) and stf(4) modernization:

- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.

PR: kern/27983
Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


79053 01-Jul-2001 imp

Don't need the .keep_me files. Obrien and I committed past each other.

Add 0-9 to the list of possible kernel names at matsushita-san's
suggestion.

Submitted by: Makoto MATSUSHITA-san <matusita@jp.FreeBSD.org>


79030 30-Jun-2001 obrien

Ensure sys/${MACHINE}/compile/FOO exists

Reviewed by: arch, imp, peter, and the USENIX terminal room secret kernel cabal


79026 30-Jun-2001 imp

Really do proper keepme files in the compile directories. Use
.cvsignore file for [A-Za-z]* to keep these directories around rather
than waste a file on .keepme. This should also make people's built
trees place nice with CVS.

Idea for .cvsignore: peter (although I suggested the regexp)
Pointed out by: Makoto MATSUSHITA-san <matusita@jp.FreeBSD.org>
Llama's costuming by: Fernamdo Llamas


79019 30-Jun-2001 obrien

Ensure sys/${MACHINE}/compile/FOO exists

Reviewed by: arch, imp, peter and
the USENIX terminal room secret kernel cabal


79008 30-Jun-2001 imp

Repo copy i8237.h to dev/ic so we can get rid of some of the final vestiges
of includes of i386 files from non-i386 ports.


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


78888 27-Jun-2001 jhb

Catch up to mbuf allocator changes from last September so this compiles
again.


78887 27-Jun-2001 jhb

Make this compile again. Broken since June 1.


78762 25-Jun-2001 jhb

Fix cut-n-paste brain-o.

Pointy-hat to: me


78636 22-Jun-2001 jhb

- Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by: tegge (signal race), bde (addupc_task a while back)


78269 15-Jun-2001 peter

oops. prepare_usermode() died in August 2000 in the MI and x86 code.

Issue raised by: scottl


77931 09-Jun-2001 obrien

Fix style of defines.


77800 06-Jun-2001 joerg

Nuke the various poorly maintained copies of ioctl_fd.h. The file is
not machine-dependant, thus it has been moved out (repo-copied) into
<sys/fdcio.h>.


77796 06-Jun-2001 jhb

Don't hold sched_lock across addupc_task().

Reported by: David Taylor <davidt@yadt.co.uk>
Submitted by: bde


77626 02-Jun-2001 phk

Properly wrap mtx_intr_enable() macro in "do $bla while (0)"


77582 01-Jun-2001 tmm

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by: bde
Reviewed by: bde (some time ago)


77507 30-May-2001 jhb

Catch up to the axeing of MFS and fix the ia64 build.

Forgotten by: a Danish axe-wielder


77448 30-May-2001 jhb

- Catch up to the VM mutex changes.
- Sort includes in a few places.


77414 29-May-2001 phk

Remove MFS options from all example kernel configs.


77031 23-May-2001 ru

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


76784 18-May-2001 obrien

Style changes -- revert ordering to mostly two revs ago.
Embellish some comments, fix tab'ing.

Requested by: bde


76770 17-May-2001 jhb

- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT.
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
afer the display of the copyright message instead of doing it by hand in
three MD places.


76701 16-May-2001 obrien

Consistently define the rune types.
Follow NetBSD's lead and add a _BSD_MBSTATE_T_ type.


76700 16-May-2001 obrien

Move the int typedefs to the top so they can be used in defining other types.
Ensure every platform has __offsetof.
Make multiple inclusion detection consistent with other
<platform>/include/*.h files.


76659 16-May-2001 jhb

Lock the procfs functions for doing a single step and reading/writing
registers better. Hold sched_lock not only for checking the flag but
also while performing the actual operation to ensure the process doesn't
get swapped out by another CPU while we the operation is being performed.


76651 15-May-2001 jhb

"Sir, the deorbit burn completed succesfully."

RIP {sys/machine}/ipl.h.


76650 15-May-2001 jhb

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


76554 13-May-2001 phk

Convert DEVFS from an "opt-in" to an "opt-out" option.

If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on july 1st so that we can start reaping the full
benefits of having it.


76494 11-May-2001 jhb

Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.


76440 10-May-2001 jhb

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake


76411 09-May-2001 jhb

Add include of sys/mutex.h and resort include of sys/lock.h.


76410 09-May-2001 jhb

Add needed sys/lock.h include.


76322 06-May-2001 phk

Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


75914 24-Apr-2001 dfr

When switching backing store during signal delivery, do the switch before
creating the register frame for calling the handler. Also discard that
frame before switching back to the old backing store after the handler
returns.


75913 24-Apr-2001 dfr

Align stack pointer and backing store pointer to 16 byte boundary when
delivering signals.


75912 24-Apr-2001 dfr

Don't trash the user's pr on syscalls.


75701 19-Apr-2001 dfr

Don't unwrap the function descriptor used as the callout argument to
fork_exit(). The MI version of fork_exit() needs a real function
descriptor, not a simple function pointer.


75700 19-Apr-2001 dfr

Don't take the Giant mutex for clock interrupts.


75668 18-Apr-2001 dfr

Don't panic when we try to modify the kernel pmap.


75667 18-Apr-2001 dfr

Print an approximation of the function arguments in the stack trace.


75666 18-Apr-2001 dfr

Implement a simple stack trace for DDB. This will have to be redone
if/when we change to a more modern toolchain.


75665 18-Apr-2001 dfr

Record the right value for tf_ndirty for kernel interruptions so that
we can examine the interrupted register stack frame in DDB.


75528 15-Apr-2001 obrien

Turn on kernel debugging support (DDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS)
by default while SMPng is still being developed.

Submitted by: jhb


75421 11-Apr-2001 jhb

Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by: bde


75002 29-Mar-2001 obrien

Reduce the emasculation of bounds_check_with_label() by one line, so we
propagate a bio error condition to the caller and above.


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.


74903 28-Mar-2001 jhb

Switch from save/disable/restore_intr() to critical_enter/exit().


74902 28-Mar-2001 jhb

Catch up to the mtx_saveintr -> mtx_savecrit change.


74900 28-Mar-2001 jhb

- Switch from using save/disable/restore_intr to using critical_enter/exit
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.


74897 28-Mar-2001 jhb

- Add the new critical_t type used to save state inside of critical
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74746 24-Mar-2001 ume

Unbreak build on alpha.
- Move in_port_t to sys/types.h.
- Nuke in_addr_t from each endian.h.

Reported by: jhb


74733 24-Mar-2001 jhb

- Define and use MAXCPU like the alpha and i386 instead of NCPUS.
- Sort the sys/mutex.h include in mp_machdep.c into a closer to correct
location.


74732 24-Mar-2001 jhb

Stick a prototype for handleclock() in machine/clock.h and include it
interrupt.c to quiet a warning.


74670 23-Mar-2001 tmm

Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and
hw.intrcnt).

Approved by: rwatson


74384 17-Mar-2001 peter

Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t. This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is
around 45% faster with -march=pentiumpro on a p6 cpu.


74031 09-Mar-2001 dfr

Allow the config file to specify a root filesystem filename.


74030 09-Mar-2001 dfr

Adjust a comment slightly.


74016 09-Mar-2001 jhb

Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks. Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock. At
least on the i386. To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks. Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes. Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.

Consulting from: cp


73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


73931 07-Mar-2001 jhb

- Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


73903 07-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


73867 06-Mar-2001 jhb

Don't psignal() a process from forward_hardclock() but set the appropriate
pending flag in p_sflag instead.


73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


73555 04-Mar-2001 dfr

Fix a couple of typos which became obvious when I started to actually use
this on real hardware.


72991 24-Feb-2001 jhb

sched_swi -> swi_sched


72990 24-Feb-2001 jhb

Don't include machine/mutex.h and relocate sys/mutex.h's include to be
closer to alphabetical order and identical to that of the alpha.


72988 24-Feb-2001 jhb

Clockframes have a trapframe stored in a cf_tf member, not ct_tf.


72985 24-Feb-2001 jhb

Whitespace nits.


72984 24-Feb-2001 jhb

Pass in process to mark ast on to aston().


72907 22-Feb-2001 jhb

Axe pcb_schednest as it is no longer used.


72906 22-Feb-2001 jhb

Rename switch_trampoline() to fork_trampoline() on the alpha and ia64.

Suggested by: dfr


72905 22-Feb-2001 jhb

Don't set the sched_lock lesting level for new processes as it is no
longer used.


72904 22-Feb-2001 jhb

Catch comments up to child_return() -> fork_return() as well.


72903 22-Feb-2001 jhb

Synch up with the other architectures:
- Remove unneeded spl()'s around mi_switch() in userret().
- Don't hold sched_lock across addupc_task().
- Remove the MD function child_return() now that the MI function
fork_return() is used instead.
- Use TRAPF_USERMODE() instead of dinking with the trapframe directly to
check for ast's in kernel mode.
- Check astpending(curproc) and resched_wanted() in ast() and return if
neither is true.
- Use astoff() rather than setting the non-existent per-cpu variable
astpending to 0 to clear an ast.


72899 22-Feb-2001 jhb

Use the MI fork_return() fork trampoline callout function for child
processes instead of the MD child_return().


72898 22-Feb-2001 jhb

- Don't dink with sched_lock in cpu_switch() since mi_switch() does this
for us.
- Change the switch_trampoline() to call fork_exit() passing in the
required arguments instead of calling the fork trampoline callout
function directly.
Warning: this hasn't been tested.

Looked over by: dfr


72896 22-Feb-2001 jhb

- Axe the now unused ASS_* assertions for interrupt status.
- Use ia64_get_psr() instead of save_intr() in mtx_legal2block().


72895 22-Feb-2001 jhb

Add a inline function to read the psr.


72894 22-Feb-2001 jhb

Add a mtx_intr_enable() macro.


72893 22-Feb-2001 jhb

Axe the astpending per-cpu variable.


72892 22-Feb-2001 jhb

Add TRAPF_PC() and TRAPF_USERMODE() macros and redefine CLKF_PC() and
CLKF_USERMODE() in terms of them.


72884 22-Feb-2001 jhb

Catch up to new MI astpending and need_resched handling.


72569 17-Feb-2001 ume

Correct disordering which is corresponding to bde's fix to
i386/include/ansi.h.


72510 15-Feb-2001 ume

Correct 2nd argument of getnameinfo(3) to socklen_t.

Reviewed by: itojun


72376 12-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72226 09-Feb-2001 jhb

Move the initailization of the proc lock for proc0 very early into the MD
startup code.


72221 09-Feb-2001 jhb

Remove bogus #if 0'd code that dinked with the saved interrupt state in
sched_lock.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72011 04-Feb-2001 peter

Clean up some leftovers from the root mount cleanup that was done some
time ago. FFS_ROOT and CD9660_ROOT are obsolete.


71984 04-Feb-2001 peter

All the world is not an i386. Merge rev 1.438 of i386/i386/machdep.c.
Make buffer_map a system map.


71880 31-Jan-2001 peter

Remove count for NSIO. The only places it was used it were incorrect.
(alpha-gdbstub.c got sync'ed up a bit with the i386 version)


71803 29-Jan-2001 dfr

Flesh out EFI support somewhat.


71785 29-Jan-2001 peter

Send "#if NISA > 0" to the bit-bucket and replace it with an option.
These were compile-time "is the isa code present?" tests and not
'how many isa busses' tests.


71731 28-Jan-2001 marcel

Add gd_witness_spin_check.


71730 28-Jan-2001 marcel

Fix typo.


71729 28-Jan-2001 marcel

Improve kernel bootstrapping:
o Use objdump instead of gensetdefs(1) to build the linker sets.
o Allow overriding of nm and objdump in resp. genassym.sh and
gensetdefs.pl for non-native toolchains.

Reviewed by: arch
Perl improvements: Jos Backus <josb@cncdsl.com>, benno


71684 26-Jan-2001 dfr

Initialise proc0.p_heldmtx and proc0.p_contested and call
mtx_enter(&Giant, MTX_DEF) after Giant is initialised.

Reviewed by: jhb


71596 24-Jan-2001 dfr

Change cpuno to cpuid.


71595 24-Jan-2001 dfr

Fix typo.


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71554 24-Jan-2001 jhb

- Proc locking.
- P_OWEUPC -> PS_OWEUPC.


71553 24-Jan-2001 jhb

- Proc locking.
- Update userret() to take a struct trapframe * as a second argument.
- Axe have_giant and use mtx_owned(&Giant) where appropriate.


71552 24-Jan-2001 jhb

- Proc locking.
- P_FOO -> PS_FOO.


71551 24-Jan-2001 jhb

- Proc locking.
- Bring across forwarded_statclock() fixes from i386 and alpha.


71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


71320 21-Jan-2001 jasone

Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.


71241 19-Jan-2001 peter

Remove the now-empty ipl_funcs.c file on all platforms.


71240 19-Jan-2001 peter

Remove the static splXXX functions and replace them by static __inline
stubs. Remove the xxx_imask variables which have been all but gone for
a while.


71228 19-Jan-2001 bmilekic

Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb


71103 16-Jan-2001 phk

These files have been on deathrow for a couple of months, no appeal.


71037 14-Jan-2001 markm

Remove NOBLOCKRANDOM as a compile-time option. Instead, provide
exactly the same functionality via a sysctl, making this feature
a run-time option.

The default is 1(ON), which means that /dev/random device will
NOT block at startup.

setting kern.random.sys.seeded to 0(OFF) will cause /dev/random
to block until the next reseed, at which stage the sysctl
will be changed back to 1(ON).

While I'm here, clean up the sysctls, and make them dynamic.
Reviewed by: des
Tested on Alpha by: obrien


70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


70928 11-Jan-2001 jake

- Remove compatibility macros for accessing per-cpu variables.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.


70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


70789 08-Jan-2001 obrien

Put VCS ids in a consistent place and form.


70786 08-Jan-2001 obrien

Remove seconds types we don't use that came in thru the NetBSD heiratage.


70723 06-Jan-2001 jake

Implement accessors for per-cpu variables which don't depend on the
symbols in globals.s.

PCPU_GET(name) returns the value of the per-cpu variable
PCPU_PTR(name) returns a pointer to the per-cpu variable
PCPU_SET(name, val) sets the value of the per-cpu variable

In general these are not yet used, compatibility macros remain.

Unifdef SMP struct globaldata, this makes variables such as cpuid
available for UP as well.

Rebuilding modules is probably a good idea, but I believe old
modules will still work, as most of the old infrastructure
remains.


70509 30-Dec-2000 dfr

Don't include <stddef.h> for offsetof() - its also defined in <sys/types.h>


70508 30-Dec-2000 dfr

Merge ALIGN changes from alpha code.


70507 30-Dec-2000 dfr

Fix typo.


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70210 20-Dec-2000 marcel

Resolve RAW dependency violation between tbit and adds.


70034 14-Dec-2000 jhb

Remove the "machine dependent" KTR trace buffer ddb commands. The code was
exactly the same on all platforms.


69966 13-Dec-2000 obrien

Sync with i386/GENERIC rev 1.294 removing "COMPAT_OLDPCI".

This fixed the broken kernel build on the Alpha.


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69881 12-Dec-2000 jake

- Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.


69812 10-Dec-2000 marcel

o Remove mcclock
o s/alpha/ia64/g


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69399 30-Nov-2000 alfred

remove unneded sys/ucred.h includes


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


69207 26-Nov-2000 jlemon

Add 'mpsafe' parameter to callout_init() in MD bits.

Reminded by: jake


69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


69003 21-Nov-2000 markm

Add a consistent API to a feature that most modern CPUs have; a fast
counter register in-CPU.

This is to be used as a fast "timer", where linearity is more important
than time, and multiple lines in the linearity caused by multiple CPUs
in an SMP machine is not a problem.

This adds no code whatsoever to the FreeBSD kernel until it is actually
used, and then as a single-instruction inline routine (except for the
80386 and 80486 where it is some more inline code around nanotime(9).

Reviewed by: bde, kris, jhb


68974 20-Nov-2000 phk

Make programs which still #include <machine/{mouse,console}.h> fail
at compiletime, with an explanatory error message. Previously they
would only get a warning.

These files will be finally removed 2001-01-15


68889 19-Nov-2000 jake

- Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.

Reviewed by: -smp@, jlemon


68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


68762 15-Nov-2000 jhb

Don't perform an mi_switch() when we release Giant during cpu_exit(). We
are about to call cpu_switch() anyways.

Found by: witness


68520 09-Nov-2000 marcel

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288

Reviewed by: mjacob
Suggested by: bde


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


67689 27-Oct-2000 markm

As the blocking model has seems to be troublesome for many, disable
it for now with an option.

This option is already deprecated, and will be removed when the
entropy-harvesting code is fast enough to warrant it.


67636 26-Oct-2000 dfr

Minor build fixes.


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67540 25-Oct-2000 jhb

Implement atomic_{set,clear,add,subtract}_{acq_,rel_,}_ptr()


67522 24-Oct-2000 dfr

* Various fixes to breakage introduced by the atomic and mutex reorgs.
* Fixes to the signal delivery code. Not quite right yet.

I would have preferred to wait until I have signal delivery actually
working but the current kernel in CVS doesn't build.


67488 24-Oct-2000 obrien

Adjust comments
Submitted by: bde

Add ISO C99's long long type limits.
Reviewed by: bde


67474 23-Oct-2000 mjacob

CURTHD now defines in globals.h


67403 20-Oct-2000 jhb

Define the mtx_legal2block() macro used in the witness code that managed
to get lost during the MI mutex conversion.

Reported by: Steve Kargl <sgk@troutmask.apl.washington.edu>


67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


67357 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.


67352 20-Oct-2000 jhb

- Make the mutex code almost completely machine independent. This greatly
reducues the maintenance load for the mutex code. The only MD portions
of the mutex code are in machine/mutex.h now, which include the assembly
macros for handling mutexes as well as optionally overriding the mutex
micro-operations. For example, we use optimized micro-ops on the x86
platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option. In the new code,
mtx_assert() only depends on INVARIANTS, allowing other kernel developers
to have working mutex assertiions without having to include all of the
mutex debugging code. The SMP_DEBUG kernel option has been renamed to
MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack. Instead, we dynamically allocate
seperate mtx_debug structures on the fly in mtx_init, except for mutexes
that are initiated very early in the boot process. These mutexes
are declared using a special MUTEX_DECLARE() macro, and use a new
flag MTX_COLD when calling mtx_init. This is still somewhat hackish,
but it is less evil than the mtx_f filler struct, and the mtx struct is
now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
operations on the mutex mtx_lock field to make it easier for other archs
to override/optimize mutex ops if needed. These new tiny ops also clean
up the code in some places by replacing long atomic operation function
calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
to obtain a sleep mutex. Calling mi_switch() would bogusly release
Giant before switching to the next process. Instead, inline most of the
code from mi_switch() in the mtx_enter_hard() function. Note that when
we finally kill Giant we can back this out and go back to calling
mi_switch().


67351 20-Oct-2000 jhb

- Expand the set of atomic operations to optionally include memory barriers
in most of the atomic operations. Now for these operations, you can
use the normal atomic operation, you can use the operation with a read
barrier, or you can use the operation with a write barrier. The function
names follow the same semantics used in the ia64 instruction set. An
atomic operation with a read barrier has the extra suffix 'acq', due to
it having "acquire" semantics. An atomic operation with a write barrier
has the extra suffix 'rel'. These suffixes are inserted between the
name of the operation to perform and the typename. For example, the
atomic_add_int() function now has 3 variants:
- atomic_add_int() - this is the same as the previous function
- atomic_add_acq_int() - this function combines the add operation with a
read memory barrier
- atomic_add_rel_int() - this function combines the add operation with a
write memory barrier
- Add 'ptr' to the list of types that we can perform atomic operations
on. This allows one to do atomic operations on uintptr_t's. This is
useful in the mutex code, for example, because the actual mutex lock is
a pointer.
- Add two new operations for doing loads and stores with memory barriers.
The new load operations use a read barrier before the load, and the
new store operations use a write barrier after the load. For example,
atomic_load_acq_int() will atomically load an integer as well as
enforcing a read barrier.


67350 20-Oct-2000 jhb

Axe the barrier_{read,write,rw}() helper functions as this method of
doing memory barriers doesn't really scale well for the ia64. Also,
memory barriers are more a property of the CPU than bus space.

Requested by: dfr


67325 19-Oct-2000 dfr

Don't force bootverbose anymore.


67324 19-Oct-2000 dfr

Decrease the number of ticks between clock interrupts by a factor of ten
to place more pressure on the exception handling code.


67323 19-Oct-2000 dfr

* Disable interrupts when restoring a trapframe.
* Make sure we reset ar.k6 (used to hold the kernel stack pointer when
we are returning to user mode after a syscall.


67285 18-Oct-2000 jhb

Add in a simple API for memory barriers to machine/bus.h:
- barrier_read() enforces a memory read barrier
- barrier_write() enforces a memory write barrier
- barrier_rw() enforces a memory read/write barrier


67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


67213 16-Oct-2000 dfr

In pmap_remove_pv(), only manipulate the page's list if the pv is
managed.


67212 16-Oct-2000 dfr

Do a full exception_restore after an execve syscall to ensure that the
new program gets the right values for its arguments etc.


67211 16-Oct-2000 dfr

Clear the register stack frame before using loadrs to invalidate the
stacked registers.


67210 16-Oct-2000 dfr

Clear ar.pfs for the child process in cpu_fork - switch_trampoline
doesn't want a stack frame.


67201 16-Oct-2000 dfr

Track changes to trapframe.


67199 16-Oct-2000 dfr

* Correct some of my misunderstandings about how best to switch to the
kernel backing store.
* Implement syscalls via break instructions.
* Fix backing store copying in cpu_fork() so that the child gets the right
register values.

This thing is actually starting to work now. This set of changes takes me
up to the second execve (the one which runs the first shell). Next stop
single-user mode :-).


67196 16-Oct-2000 dfr

Use the right mask for extracting sof from cr.ifs.


67195 16-Oct-2000 dfr

Remember to re-initialise cr.itm on clock interrupts so that we get more
than just one tick.


67194 16-Oct-2000 dfr

Merge a fix from the alpha port - put softintr in the right place in the
table.


67193 16-Oct-2000 dfr

Give names to app registers and control registers. Fix a typo handling
mov from branch register instructions.


67158 15-Oct-2000 phk

Move DELAY() from <machine/clock.h> to <sys/systm.h>


67032 12-Oct-2000 dfr

Implement a rudimentary interrupt handling system which should be good
enough for clock interrupts in SKI.


67031 12-Oct-2000 dfr

Turn off a debugging printf.


67020 12-Oct-2000 dfr

* Fix exception handling so that it actually works. We can now handle
exceptions from both kernel and user mode.
* Fix context switching so that we can switch back to a proc which we
switched away from (we were saving the state in the wrong place).
* Implement lazy switching of the high-fp state. This needs to be looked
at again for SMP to cope with the case of a process migrating from one
processor to another while it has the high-fp state.
* Make setregs() work properly. I still think this should be called
cpu_exec() or something.
* Various other minor fixes.

With this lot, we can execve() /sbin/init and we get all the way up to its
first syscall. At that point, we stop because syscall handling is not done
yet.


67018 12-Oct-2000 dfr

Fix this so that it can cope with transfers to/from regions which are not
physically contiguous.


67017 12-Oct-2000 dfr

* Allocate kernel stacks with contigmalloc() to make exception handling
safe - we can't afford to take a TLB trap when we are writing a
trapframe. Possibly revisit this later.
* Various fixes to pmap_enter() so that it actually works properly.


67016 12-Oct-2000 dfr

Some minor fixes and simplifications.


66937 10-Oct-2000 dfr

* Add rudimentary DDB support (no kgdb, no backtrace, no single step).
* Track recent changes to SWI code.
* Allocate RIDs for pmaps (untested).
* Implement assembler version of cpu_switch - its cleaner that way.


66860 09-Oct-2000 phk

Initiate deorbit burn sequence for <machine/mouse.h>.

Replace all in-tree uses with <sys/mouse.h> which repo-copied a few
moments ago from src/sys/i386/include/mouse.h by peter.
This is also the appropriate fix for exo-tree sources.

Put warnings in <machine/mouse.h> to discourage use.
November 15th 2000 the warnings will be converted to errors.
January 15th 2001 the <machine/mouse.h> files will be removed.


66834 08-Oct-2000 phk

Initiate deorbit burn sequence for <machine/console.h>.

Replace all in-tree uses with necessary subset of <sys/{fb,kb,cons}io.h>.
This is also the appropriate fix for exo-tree sources.

Put warnings in <machine/console.h> to discourage use.
November 15th 2000 the warnings will be converted to errors.
January 15th 2001 the <machine/console.h> files will be removed.


66802 08-Oct-2000 bmilekic

Cleanup comment in machine/param.h regarding mbuf-related sizes, and get rid
of MCLOFSET, which does not appear to be used anywhere anymore, and if it is,
it probably shouldn't be.


66739 06-Oct-2000 bde

Work around a bug by adding struct tags. gcc-2.95 apparently gets the
check in the [basic.link] section of the C++ standard wrong. gcc-2.7.2.3
apparently doesn't do the check, so the bug doesn't affect RELENG_3.

PR: 16170, 21427
Submitted by: Max Khon <fjoe@lark.websci.ru> (i386 version)
Discussed with: jdp


66721 06-Oct-2000 jasone

Reduce userland namespace polution.

#include <proc.h>, since curproc is needed.


66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


66486 30-Sep-2000 dfr

Next round of ia64 work, including fixes to context switching,
implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these
changes, my test kernel reaches the mountroot prompt.


66464 29-Sep-2000 dfr

Ansify and fix warnings.


66463 29-Sep-2000 dfr

Implement dirty and access bit exceptions.


66462 29-Sep-2000 dfr

Bodge the simplelocks in a way which works UP.


66460 29-Sep-2000 dfr

Use write-back instead of write-combining for region 7.


66459 29-Sep-2000 dfr

Add a few more files for the ia64 port.


66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.