History log of /freebsd-10.0-release/sys/amd64/
Revision Date Author Comments
267829 24-Jun-2014 delphij

Fix iconv(3) NULL pointer dereference and out-of-bounds array
access. [SA-14:15]

Fix multiple vulnerabilities in file(1) and libmagic(3).
[SA-14:16]

Worked around bug with PCID implementation. [EN-14:07]

Security: CVE-2014-3951
Security: FreeBSD-SA-14:15.iconv
Security: CVE-2013-7345, CVE-2014-1943, CVE-2014-2270
Security: FreeBSD-SA-14:16.file
Approved by: so

259637 20-Dec-2013 glebius

Merge r259541 from stable/10:

Merge r256868,257276-257277,257515,257913 from head. These are fixes
required to make Xen buildable w/o INET.

Approved by: re (delphij)

259128 09-Dec-2013 gjb

Remove svn:mergeinfo from the releng/10.0 branch.

After branch creation from stable/10, the stable/10 branch mergeinfo
was moved to the root of the branch.

Since there have not been any merges from stable/10 to releng/10.0
yet, we do not need to track any of the existing mergeinfo here.

Merges to releng/10.0 should now be done to the root of the branch.

For future branches during the release cycle, unless otherwise noted,
this change will be done as part of the stable/ and releng/ branch
creation.

Discussed with: peter
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-10.0-release/MAINTAINERS
/freebsd-10.0-release/Makefile.inc1
/freebsd-10.0-release/ObsoleteFiles.inc
/freebsd-10.0-release/UPDATING
/freebsd-10.0-release/bin/df
/freebsd-10.0-release/bin/freebsd-version
/freebsd-10.0-release/cddl
/freebsd-10.0-release/cddl/contrib/opensolaris
/freebsd-10.0-release/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-10.0-release/cddl/contrib/opensolaris/cmd/zfs
/freebsd-10.0-release/cddl/contrib/opensolaris/lib/libzfs
/freebsd-10.0-release/contrib/apr
/freebsd-10.0-release/contrib/apr-util
/freebsd-10.0-release/contrib/atf
/freebsd-10.0-release/contrib/binutils
/freebsd-10.0-release/contrib/bmake
/freebsd-10.0-release/contrib/byacc
/freebsd-10.0-release/contrib/bzip2
/freebsd-10.0-release/contrib/com_err
/freebsd-10.0-release/contrib/compiler-rt
/freebsd-10.0-release/contrib/dialog
/freebsd-10.0-release/contrib/dtc
/freebsd-10.0-release/contrib/ee
/freebsd-10.0-release/contrib/expat
/freebsd-10.0-release/contrib/file
/freebsd-10.0-release/contrib/gcc
/freebsd-10.0-release/contrib/gdb
/freebsd-10.0-release/contrib/gdtoa
/freebsd-10.0-release/contrib/groff
/freebsd-10.0-release/contrib/ipfilter
/freebsd-10.0-release/contrib/ipfilter/ml_ipl.c
/freebsd-10.0-release/contrib/ipfilter/mlfk_ipl.c
/freebsd-10.0-release/contrib/ipfilter/mlh_rule.c
/freebsd-10.0-release/contrib/ipfilter/mli_ipl.c
/freebsd-10.0-release/contrib/ipfilter/mln_ipl.c
/freebsd-10.0-release/contrib/ipfilter/mls_ipl.c
/freebsd-10.0-release/contrib/ldns
/freebsd-10.0-release/contrib/less
/freebsd-10.0-release/contrib/libarchive
/freebsd-10.0-release/contrib/libarchive/cpio
/freebsd-10.0-release/contrib/libarchive/libarchive
/freebsd-10.0-release/contrib/libarchive/libarchive_fe
/freebsd-10.0-release/contrib/libarchive/tar
/freebsd-10.0-release/contrib/libc++
/freebsd-10.0-release/contrib/libc-vis
/freebsd-10.0-release/contrib/libcxxrt
/freebsd-10.0-release/contrib/libexecinfo
/freebsd-10.0-release/contrib/libpcap
/freebsd-10.0-release/contrib/libstdc++
/freebsd-10.0-release/contrib/llvm
/freebsd-10.0-release/contrib/llvm/tools/clang
/freebsd-10.0-release/contrib/mtree
/freebsd-10.0-release/contrib/ncurses
/freebsd-10.0-release/contrib/netcat
/freebsd-10.0-release/contrib/ntp
/freebsd-10.0-release/contrib/nvi
/freebsd-10.0-release/contrib/one-true-awk
/freebsd-10.0-release/contrib/openbsm
/freebsd-10.0-release/contrib/openpam
/freebsd-10.0-release/contrib/openresolv
/freebsd-10.0-release/contrib/pf
/freebsd-10.0-release/contrib/sendmail
/freebsd-10.0-release/contrib/serf
/freebsd-10.0-release/contrib/smbfs
/freebsd-10.0-release/contrib/subversion
/freebsd-10.0-release/contrib/tcpdump
/freebsd-10.0-release/contrib/tcsh
/freebsd-10.0-release/contrib/tnftp
/freebsd-10.0-release/contrib/top
/freebsd-10.0-release/contrib/top/install-sh
/freebsd-10.0-release/contrib/tzcode/stdtime
/freebsd-10.0-release/contrib/tzcode/zic
/freebsd-10.0-release/contrib/tzdata
/freebsd-10.0-release/contrib/unbound
/freebsd-10.0-release/contrib/wpa
/freebsd-10.0-release/contrib/xz
/freebsd-10.0-release/crypto/heimdal
/freebsd-10.0-release/crypto/openssh
/freebsd-10.0-release/crypto/openssl
/freebsd-10.0-release/etc
/freebsd-10.0-release/etc/rc.d
/freebsd-10.0-release/gnu/lib
/freebsd-10.0-release/gnu/usr.bin/binutils
/freebsd-10.0-release/gnu/usr.bin/cc/cc_tools
/freebsd-10.0-release/gnu/usr.bin/gdb
/freebsd-10.0-release/include
/freebsd-10.0-release/lib
/freebsd-10.0-release/lib/libc
/freebsd-10.0-release/lib/libc/stdtime
/freebsd-10.0-release/lib/libc_nonshared
/freebsd-10.0-release/lib/libfetch
/freebsd-10.0-release/lib/libiconv_modules
/freebsd-10.0-release/lib/libsmb
/freebsd-10.0-release/lib/libthr
/freebsd-10.0-release/lib/libutil
/freebsd-10.0-release/lib/libvmmapi
/freebsd-10.0-release/lib/libyaml
/freebsd-10.0-release/lib/libz
/freebsd-10.0-release/release
/freebsd-10.0-release/release/doc
/freebsd-10.0-release/sbin
/freebsd-10.0-release/sbin/camcontrol
/freebsd-10.0-release/sbin/dumpon
/freebsd-10.0-release/sbin/hastd
/freebsd-10.0-release/sbin/ifconfig
/freebsd-10.0-release/sbin/ipfw
/freebsd-10.0-release/sbin/nvmecontrol
/freebsd-10.0-release/share
/freebsd-10.0-release/share/examples/bhyve
/freebsd-10.0-release/share/i18n/csmapper/JIS
/freebsd-10.0-release/share/i18n/esdb/EUC
/freebsd-10.0-release/share/man
/freebsd-10.0-release/share/man/man4
/freebsd-10.0-release/share/man/man4/bhyve.4
/freebsd-10.0-release/share/man/man5
/freebsd-10.0-release/share/man/man7
/freebsd-10.0-release/share/man/man8
/freebsd-10.0-release/share/misc
/freebsd-10.0-release/share/mk
/freebsd-10.0-release/share/mk/bsd.arch.inc.mk
/freebsd-10.0-release/share/syscons
/freebsd-10.0-release/share/zoneinfo
/freebsd-10.0-release/sys
include/vmm.h
include/vmm_dev.h
include/vmm_instruction_emul.h
include/xen
vmm
/freebsd-10.0-release/sys/boot
/freebsd-10.0-release/sys/boot/i386/efi
/freebsd-10.0-release/sys/boot/ia64/efi
/freebsd-10.0-release/sys/boot/ia64/ski
/freebsd-10.0-release/sys/boot/powerpc/boot1.chrp
/freebsd-10.0-release/sys/boot/powerpc/ofw
/freebsd-10.0-release/sys/cddl/contrib/opensolaris
/freebsd-10.0-release/sys/conf
/freebsd-10.0-release/sys/contrib/dev/acpica
/freebsd-10.0-release/sys/contrib/dev/acpica/changes.txt
/freebsd-10.0-release/sys/contrib/dev/acpica/common
/freebsd-10.0-release/sys/contrib/dev/acpica/compiler
/freebsd-10.0-release/sys/contrib/dev/acpica/components/debugger
/freebsd-10.0-release/sys/contrib/dev/acpica/components/disassembler
/freebsd-10.0-release/sys/contrib/dev/acpica/components/dispatcher
/freebsd-10.0-release/sys/contrib/dev/acpica/components/events
/freebsd-10.0-release/sys/contrib/dev/acpica/components/executer
/freebsd-10.0-release/sys/contrib/dev/acpica/components/hardware
/freebsd-10.0-release/sys/contrib/dev/acpica/components/namespace
/freebsd-10.0-release/sys/contrib/dev/acpica/components/parser
/freebsd-10.0-release/sys/contrib/dev/acpica/components/resources
/freebsd-10.0-release/sys/contrib/dev/acpica/components/tables
/freebsd-10.0-release/sys/contrib/dev/acpica/components/utilities
/freebsd-10.0-release/sys/contrib/dev/acpica/include
/freebsd-10.0-release/sys/contrib/dev/acpica/os_specific
/freebsd-10.0-release/sys/contrib/ipfilter
/freebsd-10.0-release/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
/freebsd-10.0-release/sys/contrib/ipfilter/netinet/ip_raudio_pxy.c
/freebsd-10.0-release/sys/contrib/libfdt
/freebsd-10.0-release/sys/contrib/octeon-sdk
/freebsd-10.0-release/sys/contrib/x86emu
/freebsd-10.0-release/sys/dev/bvm
/freebsd-10.0-release/sys/dev/fdt/fdt_ic_if.m
/freebsd-10.0-release/sys/dev/hyperv
/freebsd-10.0-release/sys/modules/hyperv
/freebsd-10.0-release/sys/modules/vmm
/freebsd-10.0-release/sys/x86/include/acpica_machdep.h
/freebsd-10.0-release/tools
/freebsd-10.0-release/tools/build
/freebsd-10.0-release/tools/build/options
/freebsd-10.0-release/tools/tools/atsectl
/freebsd-10.0-release/usr.bin/calendar
/freebsd-10.0-release/usr.bin/csup
/freebsd-10.0-release/usr.bin/iscsictl
/freebsd-10.0-release/usr.bin/procstat
/freebsd-10.0-release/usr.sbin
/freebsd-10.0-release/usr.sbin/bhyve
/freebsd-10.0-release/usr.sbin/bhyvectl
/freebsd-10.0-release/usr.sbin/bhyveload
/freebsd-10.0-release/usr.sbin/bsdconfig
/freebsd-10.0-release/usr.sbin/bsdinstall
/freebsd-10.0-release/usr.sbin/ctladm
/freebsd-10.0-release/usr.sbin/ctld
/freebsd-10.0-release/usr.sbin/freebsd-update
/freebsd-10.0-release/usr.sbin/jail
/freebsd-10.0-release/usr.sbin/mergemaster
/freebsd-10.0-release/usr.sbin/mount_smbfs
/freebsd-10.0-release/usr.sbin/ndiscvt
/freebsd-10.0-release/usr.sbin/pkg
/freebsd-10.0-release/usr.sbin/rtadvctl
/freebsd-10.0-release/usr.sbin/rtadvd
/freebsd-10.0-release/usr.sbin/rtsold
/freebsd-10.0-release/usr.sbin/zic
259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

258996 05-Dec-2013 royger

MFC 258176:

Fix accounting for hw.realmem on the i386 and amd64 platforms.

sys/i386/i386/machdep.c:
sys/amd64/amd64/machdep.c:
The value reported by FreeBSD as "real memory" when booting
doesn't match what is later reported by sysctl as hw.realmem.
This is due to the fact that the value printed during the
boot process is fetched from smbios data (when possible),
and accounts for holes in physical memory. On the other
hand, the value of hw.realmem is unconditionally set to be
one larger than the highest page of the physical address
space.

Fix this by setting hw.realmem to the same value printed
during boot. This makes hw.realmem honour its name and
account properly for physical memory present in the system.

Submitted by: Roger Pau Monné
Reviewed by: gibbs
Approved by: gibbs (mentor)
Approved by: re (gjb)


258886 03-Dec-2013 kib

MFC r258660:
Fix sys/sysctl.h use for cc -m32 on amd64.

Approved by: re (gjb)


258559 25-Nov-2013 emaste

MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)

Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could
be modified via sigreturn, so this change should not provide any new
ability to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by: jhb, kib

Sponsored by: DARPA, AFRL
Approved by: re (gjb)


258159 15-Nov-2013 kib

MFC r257856:
Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Approved by: re (gjb)


257575 03-Nov-2013 kib

MFC r257216:
Several small fixes for the amd64 minidump code.

Approved by: re (gjb)


256869 22-Oct-2013 neel

MFC r256645.

Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Approved by: re (hrs)


256651 16-Oct-2013 neel

MFC r256570:

Fix the witness warning that warned against calling uiomove() while holding
the 'vmmdev_mtx' in vmmdev_rw().

Rely on the 'si_threadcount' accounting to ensure that we never destroy the
VM device node while it has operations in progress (e.g. ioctl, mmap etc).

Approved by: re (rodrigc)


256329 11-Oct-2013 gjb

MFC r256328:
Document that XENHVM and xenpci are mutually inclusive.

Approved by: re (delphij)
Sponsored by: The FreeBSD Foundation


256283 10-Oct-2013 gjb

- Remove debugging from GENERIC* kernel configurations
- Enable MALLOC_PRODUCTION
- Default dumpdev=NO
- Remove UPDATING entry regarding debugging features
- Bump __FreeBSD_version to 1000500

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256166 08-Oct-2013 dim

In sys/amd64/amd64/pmap.c, fix several gcc warnings about uninitialized
variables in reclaim_pv_chunk().

Approved by: re (marius)
Reviewed by: neel, kib
X-MFC-With: r256072


256073 05-Oct-2013 gibbs

Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field. Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)

amd64/include/pcpu.h:
i386/include/pcpu.h:
Add vcpu_id to the amd64 and i386 pcpu structures.

dev/xen/timer/timer.c
x86/xen/xen_intr.c
Use new vcpu_id instead of assuming acpi_id == vcpu_id.

i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
Perform Xen HVM and Xen full PV vcpu_id mapping.

x86/xen/hvm.c:
x86/acpica/madt.c
Change SYSINIT ordering of acpi CPU enumeration so that it
is guaranteed to be available at the time of Xen HVM vcpu
id mapping.


256072 05-Oct-2013 neel

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries are mostly similar in functionality to regular
page table entries, although there are some differences in terms of what
bits are used to express that functionality. For example, the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.
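
The runtime selection described above can be illustrated with a minimal
sketch. It is not the kernel's code: the struct, the enum and the
'_sketch' helper names are hypothetical stand-ins, and only the bit
positions (6 for a regular x86 PTE, 9 for a nested PTE) come from the
text. It also ignores the A/D emulation case, which uses other bits.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two pmap types named in the log. */
enum pmap_type_sketch { PT_X86, PT_EPT };

struct pmap_sketch {
	enum pmap_type_sketch pm_type;
};

#define X86_PG_M_SKETCH	(UINT64_C(1) << 6)	/* dirty bit, regular x86 PTE */
#define EPT_PG_M_SKETCH	(UINT64_C(1) << 9)	/* dirty bit, nested (EPT) PTE */

/* Pick the dirty-bit mask from the pmap type instead of a fixed macro. */
static uint64_t
modified_bit_sketch(const struct pmap_sketch *pmap)
{
	return (pmap->pm_type == PT_EPT ? EPT_PG_M_SKETCH : X86_PG_M_SKETCH);
}

/* PG_M is a local variable initialized at runtime, as the log describes. */
static int
pte_is_dirty_sketch(const struct pmap_sketch *pmap, uint64_t pte)
{
	uint64_t PG_M = modified_bit_sketch(pmap);

	return ((pte & PG_M) != 0);
}
```

With this shape the same PTE-walking code can serve both pmap types;
only the mask lookup differs.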

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
        Bit Position    Interpreted By
PG_V    52              software (accessed bit emulation handler)
PG_RW   53              software (dirty bit emulation handler)
PG_A    0               hardware (aka EPT_PG_RD)
PG_M    1               hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).
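
One way to read the mapping above is sketched below. This is an
illustration under assumptions, not the actual handler: the function
name, its return convention and the decision to set the read bit along
with the write bit are all hypothetical. Only the four bit positions are
taken from the table.

```c
#include <stdint.h>

/* Bit positions from the table; the _SKETCH names are hypothetical. */
#define EMUL_PG_V_SKETCH	(UINT64_C(1) << 52)	/* software: valid */
#define EMUL_PG_RW_SKETCH	(UINT64_C(1) << 53)	/* software: writable */
#define EMUL_PG_A_SKETCH	(UINT64_C(1) << 0)	/* hardware read permission */
#define EMUL_PG_M_SKETCH	(UINT64_C(1) << 1)	/* hardware write permission */

/*
 * Handle an EPT violation on an entry that needs A/D emulation.
 * Returns 0 if the fault was resolved by setting the emulated bits,
 * -1 if it is a genuine fault that the caller must handle.
 */
static int
emulate_ad_sketch(uint64_t *pte, int is_write)
{
	if ((*pte & EMUL_PG_V_SKETCH) == 0)
		return (-1);		/* no valid mapping at all */
	if (is_write) {
		if ((*pte & EMUL_PG_RW_SKETCH) == 0)
			return (-1);	/* mapping really is read-only */
		/* First write: record "dirty" by granting write access. */
		*pte |= EMUL_PG_M_SKETCH | EMUL_PG_A_SKETCH;
	} else {
		/* First read: record "accessed" by granting read access. */
		*pte |= EMUL_PG_A_SKETCH;
	}
	return (0);
}
```

The hardware permission bits double as the accessed and dirty
indicators, so a cleared bit forces a fault on the next access of that
kind, and the fault is what lets software observe it.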

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.
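
The generation-number scheme can be sketched as follows. This is a
single-threaded illustration with hypothetical names; a real
implementation would need atomic updates and the IPI mentioned above,
both of which are omitted here.

```c
#include <stdint.h>

/* Hypothetical per-guest EPT state: bumped on every invalidation. */
struct ept_sketch {
	uint64_t eptgen;
};

/* Hypothetical per-vcpu state: the generation last seen at guest entry. */
struct vcpu_sketch {
	uint64_t seen_eptgen;
	unsigned flushes;	/* counts TLB flushes, for illustration only */
};

/* Every kind of invalidation funnels into one generation bump. */
static void
invalidate_ept_sketch(struct ept_sketch *ept)
{
	ept->eptgen++;
	/* Real code would also IPI host CPUs running this guest's vcpus. */
}

/* On guest entry, flush only if the EPT changed since the last entry. */
static void
vcpu_enter_sketch(struct vcpu_sketch *vcpu, const struct ept_sketch *ept)
{
	if (vcpu->seen_eptgen != ept->eptgen) {
		vcpu->flushes++;	/* stands in for the actual flush */
		vcpu->seen_eptgen = ept->eptgen;
	}
}
```

Several invalidations between two guest entries cost a single flush,
which is the point of comparing generations rather than flushing
eagerly.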

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guests with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guests that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks to John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


256053 04-Oct-2013 jmg

add aesni module to i386 and amd64 NOTES...

Approved by: re (gjb)


255911 27-Sep-2013 grehan

Return 0 for a rdmsr of MSR_IA32_PLATFORM_ID. This
is enough to get Ubuntu 12.04/13.04 to boot.

Approved by: re@ (blanket)


255849 24-Sep-2013 kib

In pmap_clear_modify(), initialize pvh even for a fictitious managed
page; otherwise the small mappings loop would use an uninitialized value.
Note that currently pmap_clear_modify() is not called for fictitious
pages.

Sponsored by: The FreeBSD Foundation
Approved by: re (glebius)


255845 24-Sep-2013 kib

Use the pv lists generation count to read-lock the pvh_global_lock in
pmap_clear_modify().

Noted and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)


255844 24-Sep-2013 kib

Ensure that the ERESTART return from the syscall reloads the
registers, to make the restarted syscall instruction pass the correct
arguments.

PR: kern/182161
Reported by: Russ Cox <rsc@swtch.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Approved by: re (marius)


255827 23-Sep-2013 kib

Free both KVA and backing pages when freeing TSS memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)


255752 21-Sep-2013 gjb

Put 'device hyperv' back in amd64/GENERIC, incorrectly removed with
r255736.

Pointed out by: gibbs
Approved by: re (delphij)
Sponsored by: The FreeBSD Foundation


255751 21-Sep-2013 grehan

Reorder/regroup the vmm ioctl api definitions to allow some
semblance of API stability and growth during the 10.* timeframe.

Userland/kernel bhyve will have to be recompiled after this.

Reviewed by: neel
Approved by: re@ (blanket)


255744 20-Sep-2013 gibbs

Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
- Introduce two new CPU hooks for initialization and resume
purposes. This allows us to get rid of the XENHVM ifdefs in
mp_machdep, and also sets some hooks into common code that can be
used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
- Remove these configs now that GENERIC has builtin support for Xen
HVM.

sys/kern/subr_smp.c:
- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
- Add cpu init and resume vectors that are called from mp_machdep
using the new hooks.
- Only clear the vcpu_info mapping data on resume. It is already
clear for the BSP on a cold boot and is set correctly as APs
are started.
- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
- Gate the setup of event channels only to systems running under Xen.


255736 20-Sep-2013 davidch

Substantial rewrite of bxe(4) to add support for the BCM57712 and
BCM578XX controllers.

Approved by: re
MFC after: 4 weeks


255732 20-Sep-2013 neel

Merge the following changes from projects/bhyve_npt_pmap:
- add fields to 'struct pmap' that are required to manage nested page tables.
- add a parameter to 'vmspace_alloc()' that can be used to override the
default pmap initialization routine 'pmap_pinit()'.

These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap'
in anticipation of the upcoming KBI freeze for 10.0.

Reviewed by: kib@, alc@
Approved by: re (glebius)


255726 20-Sep-2013 gibbs

Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that there are no MMU-related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.

sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255692 19-Sep-2013 grehan

Reconnect the hyperv drivers back into GENERIC now that the
disengage driver issue has been resolved.

Approved by: re@ (gjb)


255677 18-Sep-2013 pjd

Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by: Daniel Peyrolon
Tested by: Daniel Peyrolon
Approved by: re (marius)


255676 18-Sep-2013 rdivacky

Regen.

Approved by: re (delphij)


255675 18-Sep-2013 rdivacky

Revert r255672, it has some serious flaws, leaking file references etc.

Approved by: re (delphij)


255673 18-Sep-2013 rdivacky

Regen.

Approved by: re (delphij)


255672 18-Sep-2013 rdivacky

Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by: re (delphij)
Sponsored by: Google Summer of Code
Submitted by: Yuri Victorovich <yuri at rawbw dot com>
Tested by: Yuri Victorovich <yuri at rawbw dot com>


255645 17-Sep-2013 grehan

Hide TSC-deadline APIC timer support from guests. This mode
isn't yet implemented in bhyve's APIC emulation.

Reviewed by: neel
Approved by: re@ (blanket)


255638 17-Sep-2013 neel

Fix a bug in decoding an instruction that has an SIB byte as well as an
immediate operand. The presence of an SIB byte in decoding the ModR/M field
would cause 'imm_bytes' to not be set to the correct value.

Fix this by initializing 'imm_bytes' independent of the ModR/M decoding.
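
The shape of the fix can be illustrated with a toy decoder. This is a
sketch under assumptions rather than the actual vmm code: the struct and
function names are hypothetical, and the caller is assumed to know the
immediate size from the opcode. The SIB rule used (mod != 3 and rm == 4)
is the standard x86 encoding for 32-bit and 64-bit addressing.

```c
#include <stdint.h>

/* Hypothetical decode result. */
struct decoded_sketch {
	int has_sib;	/* 1 if a SIB byte follows the ModR/M byte */
	int imm_bytes;	/* size of the immediate operand, in bytes */
};

static void
decode_modrm_sketch(uint8_t modrm, int opcode_imm_bytes,
    struct decoded_sketch *d)
{
	int mod = (modrm >> 6) & 0x3;
	int rm = modrm & 0x7;

	/*
	 * Set the immediate size up front so that no ModR/M path,
	 * including the SIB one, can leave it unset.
	 */
	d->imm_bytes = opcode_imm_bytes;

	/* A SIB byte is present when mod != 3 and rm == 4. */
	d->has_sib = (mod != 3 && rm == 4);
}
```

For example, ModR/M byte 0x04 (mod 0, rm 4) selects a SIB byte, and the
immediate size is the same as for a register-direct form such as 0xC0.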

Reported by: grehan@
Approved by: re@


255623 17-Sep-2013 bryanv

Add vmx(4) to i386 and amd64 GENERIC

Approved by: re (gjb)


255607 16-Sep-2013 kib

In pmap_copy(), when the copied region is mapped with a superpage but does
not cover the entire superpage, avoid copying. Doing a partial copy would
require demotion, which is incompatible with the already held locks.

Reported by: cperciva
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)


255574 14-Sep-2013 grehan

Pull the hyperv drivers from GENERIC until the fix to the disengage
driver to make it only probe when running on hyperv is reviewed and
tested.

Approved by: re (rodrigc)


255524 13-Sep-2013 grehan

Import Hyper-V paravirtualized drivers from projects/hyperv
branch into head.

Approved by: re@ (hrs)
Obtained from: Microsoft, NetApp, and Citrix.


255469 11-Sep-2013 neel

Fix a limitation in bhyve that would limit the number of virtual machines to
the maximum number of VT-d domains (256 on a Sandybridge). We now allocate a
VT-d domain for a guest only if the administrator has explicitly configured
one or more PCI passthru device(s).

If there are no PCI passthru devices configured (the common case) then the
number of virtual machines is no longer limited by the maximum number of
VT-d domains.

Reviewed by: grehan@
Approved by: re@


255438 10-Sep-2013 grehan

Go way past 11 and bump bhyve's max vCPUs to 16.

This should be sufficient for 10.0 and will do
until forthcoming work to avoid limitations
in this area is complete.

Thanks to Bela Lubkin at tidalscale for the
headsup on the apic/cpu id/io apic ASL parameters
that are actually hex values and broke when
written as decimal when 11 vCPUs were configured.

Approved by: re@


255409 08-Sep-2013 alc

Prior to r254304, we only began scanning the active page queue when the
amount of free memory was close to the point at which we would begin
reclaiming pages. Now, we continuously scan the active page queue,
regardless of the amount of free memory. Consequently, we are continuously
calling pmap_ts_referenced() on active pages.

Prior to this change, pmap_ts_referenced() would always demote superpage
mappings in order to obtain finer-grained reference information. This made
sense because we were coming under memory pressure and would soon have to
begin reclaiming pages. Now, however, with continuous scanning of the
active page queue, these demotions are taking a toll on performance. For
example, on one of my test machines, the running time for the HPCC Random
Access benchmark (also known as GUPS) has increased by 54%. To address this
problem, I have replaced the demotion with a heuristic for periodically
clearing the reference flag on superpage mappings.

Reviewed by: kib
Approved by: re (glebius)
Sponsored by: EMC / Isilon Storage Division


255343 07-Sep-2013 neel

Allocate VPIDs by using the unit number allocator to do the bookkeeping.

Also deal with VPID exhaustion by allocating out of a reserved range as the
last resort.


255342 07-Sep-2013 grehan

Mask off the vector from the MSI-x data word.
Some operating systems set the trigger-mode level bit, which
results in an invalid vector and pass-thru interrupts
not being delivered.
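
A minimal sketch of the masking, assuming the standard MSI message data
layout in which the vector occupies bits 7:0 and the trigger mode is
bit 15. The macro and function names are hypothetical, not those used in
bhyve.

```c
#include <stdint.h>

#define MSI_DATA_VECTOR_MASK_SKETCH	0xffu		/* bits 7:0: vector */
#define MSI_DATA_TRIGGER_LEVEL_SKETCH	(1u << 15)	/* bit 15: level-triggered */

/* Extract only the vector, ignoring mode bits the guest may have set. */
static uint8_t
msix_vector_sketch(uint32_t data)
{
	return ((uint8_t)(data & MSI_DATA_VECTOR_MASK_SKETCH));
}
```

Without the mask, a data word such as 0x8041 would be treated as a
vector number well outside the valid 8-bit range.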


255331 06-Sep-2013 gibbs

Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by: gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
Move invltlb_globpcid() into cpufunc.h so that it can be
used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
and remove the ipi vector parameter. This api allocates
an event channel port that can be used for ipi services,
but knows nothing of the actual ipi for which that port
will be used. Removing the unused argument and cleaning
up the comments surrounding its declaration helps clarify
its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
Implement a generic framework for amd64 and i386 that allows
the implementation of certain CPU management functions to
be selected at runtime. Currently this is only used for
the ipi send function, which we optimize for Xen when running
on a Xen hypervisor, but can easily be expanded to support
more operations.

sys/x86/xen/hvm.c:
Implement Xen PV IPI handlers and operations, replacing native
send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS
is defined already for us in the xen interface files.
NR_IPIS is only needed in one file per Xen platform and is
easily inferred by the IPI vector table that is defined in
those files.

sys/i386/xen/mp_machdep.c:
Restructure to more closely match the HVM implementation by
performing table driven IPI setup.


255323 06-Sep-2013 bryanv

Add vmx device to the i386 and amd64 NOTES files


255312 06-Sep-2013 kib

Only lock pvh_global_lock read-only for pmap_page_wired_mappings(),
pmap_is_modified() and pmap_is_referenced(), same as it was done for
pmap_ts_referenced().

Consolidate identical code for pmap_is_modified() and
pmap_is_referenced() into helper pmap_page_test_mappings().

Reviewed by: alc
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation


255311 06-Sep-2013 kib

In pmap_ts_referenced(), when restarting the loop due to pv list
generation changed, do not drop and immediately relock the pv list.

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


255289 06-Sep-2013 glebius

On machines where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to absolutely
empty functions.

Reviewed by: alc, kib, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


255288 06-Sep-2013 grehan

Emulate reading of the IA32_MISC_ENABLE MSR, by returning
the host MSR and masking off features that aren't supported.
Linux reads this MSR to detect if NX has been disabled via
BIOS.


255287 06-Sep-2013 grehan

Allow CPUID leaf 0xD to be read as zeroes.
Linux reads this even though extended features
aren't exposed.

Support for 0xD will be expanded once AVX[2]
is exposed to the guest in upcoming work.


255219 05-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
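
The layout described above can be sketched in user space as follows. This is only an illustration: the SK_ and sk_ names are hypothetical, and the bit positions are derived from the description (2 size bits, 5 one-hot index bits, 57 bits left for rights) rather than copied from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants, derived from the description above. */
#define SK_VERSION_BITS 2   /* top two bits: number of array elements - 2 */
#define SK_INDEX_BITS   5   /* next five bits: one-hot array index */
#define SK_RIGHT_BITS   (64 - SK_VERSION_BITS - SK_INDEX_BITS)  /* 57 */

/* Build a right: set the one-hot index bit and OR in the right bit. */
#define SK_CAPRIGHT(idx, bit) ((1ULL << (SK_RIGHT_BITS + (idx))) | (bit))

/* Extract the five-bit one-hot index field from a right. */
static inline unsigned sk_index_field(uint64_t right)
{
    return (unsigned)((right >> SK_RIGHT_BITS) & 0x1f);
}

/* Convert the one-hot field to an array index; -1 if it is not one-hot. */
static inline int sk_array_index(uint64_t right)
{
    unsigned field = sk_index_field(right);
    int idx;

    for (idx = 0; idx < SK_INDEX_BITS; idx++)
        if (field == (1u << idx))
            return idx;
    return -1;  /* no index bit, or bits from several elements mixed */
}

/* Number of distinct rights that fit in an array of nelems elements. */
static inline int sk_capacity(int nelems)
{
    return nelems * SK_RIGHT_BITS;
}
```

With 57 usable bits per element, two elements give 114 rights and five elements give 285, which matches the figures quoted in the commit message.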

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine a few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for alias definitions.
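
A minimal sketch of the two mechanisms just described, the macro-supplied terminator and the detection of rights mixed across array elements, might look like this. All sk_ names are hypothetical, the index bits are assumed to sit in bits 57..61 as described earlier, and the function returns an error code where the real code is said to assert.

```c
#include <assert.h>
#include <stdarg.h>

#define SK_NELEMS 2

struct sk_rights {
    unsigned long long bits[SK_NELEMS];
};

/* One-hot index bits assumed to live in bits 57..61. */
#define SK_CAPRIGHT(idx, bit) ((1ULL << (57 + (idx))) | (bit))

/* The macro appends the 0ULL terminator so callers never have to. */
#define sk_rights_set(r, ...) sk_rights_set_impl((r), __VA_ARGS__, 0ULL)

/* Returns 0 on success, -1 if an argument mixes array elements. */
static int sk_rights_set_impl(struct sk_rights *r, ...)
{
    va_list ap;
    unsigned long long right;
    int rc = 0;

    va_start(ap, r);
    while ((right = va_arg(ap, unsigned long long)) != 0) {
        unsigned field = (unsigned)((right >> 57) & 0x1f);
        int idx;

        /* Exactly one index bit must be set in each argument. */
        if (field == 0 || (field & (field - 1)) != 0) {
            rc = -1;
            break;
        }
        for (idx = 0; (field >> idx) != 1; idx++)
            ;
        if (idx >= SK_NELEMS) {
            rc = -1;
            break;
        }
        r->bits[idx] |= right;
    }
    va_end(ap);
    return rc;
}
```

Passing two rights separated by a comma succeeds even when they live in different elements, while OR-ing them into one argument is rejected because two index bits end up set.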

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


255217 04-Sep-2013 kib

Tidy up some loose ends in the PCID code:

- Restore the pre-PCID TLB shootdown handlers for whole address space
and single page invalidation asm code, and assign the IPI handler to
them when PCID is not supported or disabled. Old handlers have
linear control flow. But, still use the common return sequence.

- Stop using pcpu for INVPCID descriptors in the invlrng handler. It
is enough to allocate descriptors on the stack. As a result, two
SWAPGS instructions are shaved off from the code for Haswell+.

- Fix the inverted condition in invlrng for checking of the PCID
support [1], also in invlrng check that pmap is kernel pmap before
performing other tests. For the kernel pmap, which provides global
mappings, the INVLPG must be used for invalidation always.

- Save the pre-computed pmap's %CR3 register in the struct pmap. This
allows removing several checks for pm_pcid validity when %CR3 is
reloaded [2].

Noted by: gibbs [1]
Discussed with: alc [2]
Tested by: pho, flo
Sponsored by: The FreeBSD Foundation


255192 03-Sep-2013 jhb

Add support for the 'invpcid' instruction to binutils and DDB's
disassembler on amd64.

MFC after: 1 month


255106 31-Aug-2013 kib

Fix two build failures for non-tb configurations, UP [2] and when using gas [1].

Reported by: andreast [1], bf [2]
Sponsored by: The FreeBSD Foundation


255079 30-Aug-2013 kib

The pm_save should be cleared on the pmap initialization, and not on
the activation.

Noted by: alc


255060 30-Aug-2013 kib

Implement support for the process-context identifiers ('PCID') on
Intel CPUs. The feature tags TLB entries with the ID of the address
space, which avoids TLB invalidation on context switch; it
is available only in long mode. In microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated. The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0. The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.
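
As a rough illustration of how a PCID rides along in %CR3, the sketch below composes a value from a page-table root and a 12-bit PCID. It follows the architectural layout (PCID in bits 11:0, bit 63 of the loaded value requesting that tagged TLB entries be kept); the sk_ names are hypothetical and this is not the pmap code itself.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CR3_PCID_MASK 0xfffULL       /* CR3 bits 11:0 hold the PCID */
#define SK_CR3_NOFLUSH   (1ULL << 63)   /* keep TLB entries tagged with this PCID */

/*
 * Compose the value to load into %CR3 when CR4.PCIDE is enabled.
 * pml4_phys must be page aligned; pcid must fit in 12 bits.
 */
static inline uint64_t sk_make_cr3(uint64_t pml4_phys, uint16_t pcid,
    int preserve_tlb)
{
    uint64_t cr3;

    assert((pml4_phys & SK_CR3_PCID_MASK) == 0);
    assert(pcid <= SK_CR3_PCID_MASK);

    cr3 = pml4_phys | pcid;
    if (preserve_tlb)
        cr3 |= SK_CR3_NOFLUSH;
    return cr3;
}
```

A context switch that can reuse the cached translations would pass preserve_tlb = 1; each switch that does so corresponds to one avoided invalidation of the kind vm.pmap.pcid_save_cnt is described as counting.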

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


255058 30-Aug-2013 kib

Provide a wrapper for the INVPCID instruction, definition of the
descriptor and symbolic names for the operation types.
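
For orientation, the descriptor and operation types can be sketched from the architectural definition of INVPCID: a 128-bit memory operand with the PCID in bits 11:0 and a linear address in bits 127:64, plus a type selecting the scope. The sk_ names below are hypothetical and are not the names this commit introduced; the instruction itself is privileged and is deliberately not executed here.

```c
#include <assert.h>
#include <stdint.h>

/* Operation types as defined architecturally for INVPCID. */
enum sk_invpcid_type {
    SK_INVPCID_ADDR       = 0,  /* one linear address within one PCID */
    SK_INVPCID_CTX        = 1,  /* all non-global entries of one PCID */
    SK_INVPCID_ALL_GLOBAL = 2,  /* all entries, including global ones */
    SK_INVPCID_ALL        = 3   /* all entries except global ones */
};

/* 128-bit in-memory descriptor. */
struct sk_invpcid_descr {
    uint64_t pcid;  /* bits 11:0 PCID; bits 63:12 reserved, must be 0 */
    uint64_t addr;  /* linear address, used by SK_INVPCID_ADDR */
};

/* Fill a descriptor; small enough to live on the caller's stack. */
static inline struct sk_invpcid_descr sk_invpcid_make(uint16_t pcid,
    uint64_t va)
{
    struct sk_invpcid_descr d;

    assert(pcid < 4096);
    d.pcid = pcid;
    d.addr = va;
    return d;
}
```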

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


255040 29-Aug-2013 gibbs

Implement vector callback for PVHVM and unify event channel implementations

Re-structure Xen HVM support so that:
- Xen is detected and hypercalls can be performed very
early in system startup.
- Xen interrupt services are implemented using FreeBSD's native
interrupt delivery infrastructure.
- the Xen interrupt service implementation is shared between PV
and HVM guests.
- Xen interrupt handlers can optionally use a filter handler
in order to avoid the overhead of dispatch to an interrupt
thread.
- interrupt load can be distributed among all available CPUs.
- the overhead of accessing the emulated local and I/O apics
on HVM is removed for event channel port events.
- a similar optimization can eventually, and fairly easily,
be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
Reserve IDT vector 0x93 for the Xen event channel upcall
interrupt handler. On Hypervisors that support the direct
vector callback feature, we can request that this vector be
called directly by an injected HVM interrupt event, instead
of a simulated PCI interrupt on the Xen platform PCI device.
This avoids all of the overhead of dealing with the emulated
I/O APIC and local APIC. It also means that the Hypervisor
can inject these events on any CPU, allowing upcalls for
different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
Increase the FreeBSD IRQ vector table to include space
for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
Remove Xen HVM per-cpu variable data. These fields are now
allocated via the dynamic per-cpu scheme. See xen_intr.c
for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
Prefer FreeBSD primitives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
Remove constants, macros, and functions unused in FreeBSD's Xen
support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
Introduce new functions xen_domain(), xen_pv_domain(), and
xen_hvm_domain(). These are used in favor of #ifdefs so that
FreeBSD can dynamically detect and adapt to the presence of
a hypervisor. The goal is to have an HVM optimized GENERIC,
but more is necessary before this is possible.
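
The idea of replacing #ifdefs with run-time predicates can be sketched as below. The sk_ names, the enum values and the sk_use_pv_ipis() consumer are hypothetical stand-ins chosen for illustration; only the three predicate names in the commit message come from the source.

```c
#include <assert.h>

enum sk_xen_domain_type {
    SK_XEN_NATIVE,      /* running on bare metal or a non-Xen hypervisor */
    SK_XEN_PV_DOMAIN,   /* Xen paravirtualized guest */
    SK_XEN_HVM_DOMAIN   /* Xen hardware-virtualized guest */
};

/* In a kernel this would be set once by early boot code. */
static enum sk_xen_domain_type sk_xen_domain_type = SK_XEN_NATIVE;

static inline int sk_xen_domain(void)
{
    return sk_xen_domain_type != SK_XEN_NATIVE;
}

static inline int sk_xen_pv_domain(void)
{
    return sk_xen_domain_type == SK_XEN_PV_DOMAIN;
}

static inline int sk_xen_hvm_domain(void)
{
    return sk_xen_domain_type == SK_XEN_HVM_DOMAIN;
}

/* One binary picks its code path at run time, not at compile time. */
static inline int sk_use_pv_ipis(void)
{
    return sk_xen_hvm_domain() ? 1 : 0;
}
```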

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
Refactor magic ioport, Hypercall table and Hypervisor shared
information page setup, and move it to a dedicated HVM support
module.

HVM mode initialization is now triggered during the
SI_SUB_HYPERVISOR phase of system startup. This currently
occurs just after the kernel VM is fully setup which is
just enough infrastructure to allow the hypercall table
and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
Add definitions and a method for configuring Hypervisor event
delivery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
Adjust kernel build to reflect the refactoring of early
Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
Since blkback defers all event handling to a taskqueue,
convert this task queue to a "fast" taskqueue, and schedule
it via an interrupt filter. This avoids an unnecessary
ithread context switch.

sys/xen/xenstore/xenstore.c:
The xenstore driver is MPSAFE. Indicate as much when
registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
Remove unused event channel APIs.

sys/xen/evtchn.h:
Remove all kernel Xen interrupt service API definitions
from this file. It is now only used for structure and
ioctl definitions related to the event channel userland
device driver.

Update the definitions in this file to match those from
NetBSD. Implementing this interface will be necessary for
Dom0 support.

sys/xen/evtchn/evtchnvar.h:
Add a header file for implementation internal APIs related
to managing event channels event delivery. This is used
to allow, for example, the event channel userland device
driver to access low-level routines that typical kernel
consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
Standardize on the evtchn_port_t type for referring to
an event channel port id. In order to prevent low-level
event channel APIs from leaking to kernel consumers who
should not have access to this data, the type is defined
twice: Once in the Xen provided event_channel.h, and again
in xen/xen_intr.h. The double declaration is protected by
__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
twice within a given compilation unit.
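
The guard pattern described above can be sketched in a single file by writing both "header" fragments one after the other. The uint32_t underlying type is what the Xen public interface uses; everything else here is just a compile-time demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Fragment 1: as it might appear in the Xen-provided event_channel.h. */
#ifndef __XEN_EVTCHN_PORT_DEFINED__
typedef uint32_t evtchn_port_t;
#define __XEN_EVTCHN_PORT_DEFINED__ 1
#endif

/* Fragment 2: as it might appear in xen/xen_intr.h. Skipped here,
 * because the guard macro is already defined by fragment 1. */
#ifndef __XEN_EVTCHN_PORT_DEFINED__
typedef uint32_t evtchn_port_t;
#define __XEN_EVTCHN_PORT_DEFINED__ 1
#endif

/* A consumer only needs whichever header it is allowed to include. */
static inline int sk_port_is_valid(evtchn_port_t port)
{
    return port != 0;   /* treating 0 as "no port" is an assumption here */
}
```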

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
New implementation of Xen interrupt services. This is
similar in many respects to the i386 PV implementation with
the exception that events bound to event channel ports
(i.e. not IPI, virtual IRQ, or physical IRQ) are further
optimized to avoid mask/unmask operations that aren't
necessary for these edge triggered events.

Stubs exist for supporting physical IRQ binding, but will
need additional work before this implementation can be
fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c:
sys/x86/xen/hvm.c:
Add support for placing vcpu_info into an arbitrary memory
page instead of using HYPERVISOR_shared_info->vcpu_info.
This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
Add support for new event channel implementation.


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measurable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254964 27-Aug-2013 neel

Add support for emulating the byte move instruction "mov r/m8, r8".

This emulation is required when dumping MMIO space via the ddb "examine"
command.


254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254666 22-Aug-2013 kib

Use the generation count of the pv list to work around LOR between
pmap lock and pv list lock, and use the shared locking on
pvh_global_lock in pmap_remove_write(), same as it was done for
pmap_ts_referenced().

Noted and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254624 21-Aug-2013 obrien

The PADLOCK_RNG and RDRAND_RNG kernel options are now devices.
Thus "device padlock_rng" and "device rdrand_rng" should be
used instead of "options PADLOCK_RNG" & "options RDRAND_RNG".

Requested by: so@ (des)
Submitted by: obrien, arthurmesh@gmail.com
Obtained from: Juniper Networks


254623 21-Aug-2013 jkim

Reimplement atomic operations on PDEs and PTEs in pmap.h. This change
significantly reduces duplicate code and makes it easier to read.

Reviewed by: alc, bde


254618 21-Aug-2013 jkim

Remove empty lines before return statements for style consistency.


254617 21-Aug-2013 jkim

Implement atomic_swap() and atomic_testandset().
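
The commit message does not spell out the semantics, so the following is only a sketch under stated assumptions: swap is taken to mean "store a new value and return the old one", and test-and-set is taken to mean "set bit n and report whether it was already set". It uses C11 <stdatomic.h> rather than the inline assembly a kernel header would use, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Store v and return the value it replaced, as one atomic step. */
static inline unsigned sk_atomic_swap(atomic_uint *p, unsigned v)
{
    return atomic_exchange(p, v);
}

/* Set bit `bit` and report whether it was already set (assumed semantics). */
static inline int sk_atomic_testandset(atomic_uint *p, unsigned bit)
{
    unsigned mask = 1u << (bit % (sizeof(unsigned) * 8));

    return (atomic_fetch_or(p, mask) & mask) != 0;
}
```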

Reviewed by: arch, bde, jilles, kib


254614 21-Aug-2013 jkim

- Remove the "a" constraint from main output operand for atomic_cmpset().
- Use "+" modifier for the "expect" because it is also an output (unused).


254612 21-Aug-2013 jkim

Use '+' modifier for a memory operand that is both an input and an output.
It was actually done in r86301 but reverted in r150182 because GCC 3.x was
not able to handle it for a memory operand. Apparently, this problem was
fixed in GCC 4.1+ and several contrib sources already rely on this feature.


254611 21-Aug-2013 jkim

Remove bogus labels. No functional change.


254610 21-Aug-2013 jkim

Use consistent style. No functional change.


254549 20-Aug-2013 neel

Do not create superpage mappings in the iommu.

This is a workaround to hide the fact that we do not have any code to
demote a superpage mapping before we unmap a single page that is part
of the superpage.


254548 20-Aug-2013 neel

Extract the location of the remapping hardware units from the ACPI DMAR table.

Submitted by: Gopakumar T (gopakumar_thekkedath@yahoo.co.in)


254547 20-Aug-2013 neel

Fix breakage caused by r254466 in minidumpsys().

r254466 increased the KVA from 512GB to 2TB which requires 4 PDP pages as
opposed to a single one before the change. This broke minidumpsys() since
it assumed that the entire KVA could be addressed via a single PDP page.

Fix this by obtaining the address of the PDP page from the PML4 entry
associated with the KVA being dumped.
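
The arithmetic behind "4 PDP pages" can be checked with a small sketch. On amd64 each PML4 entry, and therefore each PDP page, spans 2^39 bytes (512GB), so the PDP page for a given address is found through its PML4 index. The sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PML4SHIFT 39                     /* one PML4 entry spans 512GB */
#define SK_PML4_SPAN (1ULL << SK_PML4SHIFT)

/* PDP pages needed to map kva_bytes: one per 512GB, rounded up. */
static inline uint64_t sk_pdp_pages(uint64_t kva_bytes)
{
    return (kva_bytes + SK_PML4_SPAN - 1) >> SK_PML4SHIFT;
}

/* Index of the PML4 entry (and so of the PDP page) covering va. */
static inline unsigned sk_pml4_index(uint64_t va)
{
    return (unsigned)((va >> SK_PML4SHIFT) & 0x1ff);
}
```

A 512GB KVA needs one PDP page and a 2TB KVA needs four, so a dump routine that walks the KVA has to look up the PML4 entry for each address instead of assuming a single PDP page.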

Reported by: pho
Submitted by: kib
Pointy hat to: neel


254501 18-Aug-2013 kib

When code from r254064 in pmap_ts_referenced() drops pv lock and
blocks on a pmap lock, pmap_release() might proceed in parallel and
destroy the pmap mutex, since the unlocked pv lock allows removal of the
pv entry owned by the pmap.

For now, gate the pmap_release() on write-locked pvh_global_lock.
Since pmap_ts_referenced() does not unlock the global lock,
pmap_release() would not destroy pmap mutex until the
pmap_ts_referenced() finished. We cannot enter pmap_ts_referenced()
and encounter a pv entry for the destroyed pmap if pmap_release()
passed the global lock gate, since pmap_remove_pages() would finish
earlier.

Reported by: jeff, pho
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254480 18-Aug-2013 pjd

Add process descriptors support to the GENERIC kernel. It is already being
used by the tools in base systems and with sandboxing more and more tools
the usage should only increase.

Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Sponsored by: Google Summer of Code 2013
MFC after: 1 month


254466 17-Aug-2013 neel

Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB.
Bump up the KVA size proportionally from 512GB to 2TB.

The number of page table pages used by the direct map is now calculated at
run time based on 'Maxmem'. This means the small memory systems will not
see any additional tax in terms of page table pages for the direct map.

However all amd64 systems, regardless of the memory size, will use 3 more
pages to accommodate the bump in the KVA size.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html
http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html

Tested with the following configurations:
- Sandybridge server with 64GB of memory.
- bhyve VM with 64MB of memory.
- bhyve VM with 8GB of memory with the memory segment above 4GB cuddling
right up against the 4TB maximum memory limit.

Discussed on: hackers@, current@
Submitted by: Chris Torek (torek@torek.net)


254463 17-Aug-2013 jilles

libc: Access _logname_valid more efficiently.

The variable _logname_valid is not exported via the version script;
therefore, change C and i386/amd64 assembler code to remove indirection
(which allowed interposition). This makes the code slightly smaller and
faster.

Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC,
there is no place containing the address of each variable, so there is no
possible definition for PIC_GOT.


254374 15-Aug-2013 brooks

Use an ANSI C definition of initializecpucache() to match the declaration
and the rest of the file.


254305 13-Aug-2013 jkim

Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.


254300 13-Aug-2013 jkim

Tidy up global locks for ACPICA. There is no functional change.


254182 10-Aug-2013 kib

Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


254141 09-Aug-2013 attilio

On all the architectures, avoid preallocating the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid preallocating
the KVA for such nodes.

In order to do so, allow the operations derived from vm_radix_insert()
to fail, and handle all the resulting failures.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl


254138 09-Aug-2013 attilio

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock, where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed, which is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254133 09-Aug-2013 avg

follow up to r254051

- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
jhb

X-MFC after: never (change specific to head)


254081 08-Aug-2013 neel

Use local variables with the appropriate types and eliminate a bunch of casts.

This is a cosmetic change but it does help with a proposed change to increase
the maximum size of physical memory supported on amd64 platforms.

Submitted by: Chris Torek (torek@torek.net)


254065 07-Aug-2013 kib

Split the pagequeues per NUMA domain, and split the pagedaemon process
into threads, each processing the queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started, by a single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254064 07-Aug-2013 kib

Change the pmap_ts_referenced() method of amd64 pmap to use shared
pvh_global_lock. This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.

The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts the lock order, where the pmap lock comes before the pv list
lock. Introduce the generation count for the pv list of the page or
superpage, which indicates any change in the pv list, and, as usual,
perform restart of the iteration if generation changed while pv lock
was dropped for blocking acquire of a pmap lock.

Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation


254051 07-Aug-2013 avg

enable KDB_TRACE in GENERICs

KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after: never (change specific to head)


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253949 05-Aug-2013 jeff

- Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space. This works around several
asserts from pmap_demote_pde_locked that did not apply and gave false
warnings.

Discovered by: pho
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division


253909 03-Aug-2013 grehan

Follow-up commit to fix CR0 issues. Maintain
architectural state on CR vmexits by guaranteeing
that EFER, CR0 and the VMCS entry controls are
all in sync when transitioning to IA-32e mode.

Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)


253854 01-Aug-2013 grehan

Moved clearing of vmm_initialized to avoid the case
of unloading the module while VMs existed. This would
result in EBUSY, but would prevent further operations
on VMs resulting in the module being impossible to
unload.

Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)
Reviewed by: grehan, neel


253849 01-Aug-2013 grehan

Correctly maintain the CR0/CR4 shadow registers.
This was exposed with AP spinup of Linux, and
booting OpenBSD, where the CR0 register is unconditionally
written to prior to the longjump to enter protected
mode. The CR-vmexit handling was not updating CPU state which
resulted in a vmentry failure with invalid guest state.

A follow-on submit will fix the CPU state issue, but this
fix prevents the CR-vmexit prior to entering protected
mode by properly initializing and maintaining CR* state.

Reviewed by: neel
Reported by: Gopakumar.T @ netapp


253845 31-Jul-2013 obrien

Back out r253779 & r253786.


253779 29-Jul-2013 obrien

Decouple yarrow from random(4) device.

* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*. Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
random_adaptor is basically an adapter that plugs in to random(4).
random_adaptor can only be plugged in to random(4) very early in bootup.
Unplugging random_adaptor from random(4) is not supported, and is probably a
bad idea anyway, due to potential loss of entropy pools.
We currently have 3 random_adaptors:
+ yarrow
+ rdrand (ivy.c)
+ nehemiah

* Remove platform dependent logic from probe.c, and move it into
corresponding registration routines of each random_adaptor provider.
probe.c doesn't do anything other than picking a specific random_adaptor
from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien


253750 28-Jul-2013 avg

Revert r253748,253749

This WIP should not have been committed yet.

Pointyhat to: avg


253748 28-Jul-2013 avg

put contents of cpu.h under _KERNEL

no userland-serviceable parts inside

MFC after: 20 days


253747 28-Jul-2013 avg

x86: detect mwait capabilities and extensions, when present

Reviewed by: kib (earlier amd64-only version)
MFC after: 2 weeks


253685 26-Jul-2013 jeff

- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
uses of kmem_malloc().

Discussed with: kib/alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253585 23-Jul-2013 neel

Add support for emulation of the "or r/m, imm8" instruction.

Submitted by: Zhixiang Yu (zxyu.core@gmail.com)
Obtained from: GSoC 2013 (AHCI device emulation for bhyve)


253582 23-Jul-2013 neel

Fix a bug introduced in r252646 that causes a page with the PG_PTE_PAT bit set
to be interpreted as a superpage. This is because PG_PTE_PAT is at the same
bit position in PTE as PG_PS is in a PDE.

This caused a number of regressions on amd64 systems: panic when starting
X applications, freeze during shutdown etc.

Pointy hat to: me
Tested by: gperez@entel.upc.edu, joel, dumbbell
Reviewed by: kib


253352 15-Jul-2013 kib

MFi386: add ddb "show sysregs" command.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253140 10-Jul-2013 kib

Clear m->object for the page taken from the delayed free list for
reuse as the pv chunk page in reclaim_pv_chunk(). Having non-NULL
m->object is wrong for page not owned by an object and confuses both
vm_page_free_toq() and vm_page_remove() when the page is freed later.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


252867 06-Jul-2013 delphij

Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver.
This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms.

Many thanks to HighPoint for providing this driver.

MFC after: 1 day


252646 03-Jul-2013 neel

If a superpage mapping is being removed then we need to ignore the PG_PDE_PAT
bit when looking up the vm_page associated with the superpage's physical
address.

If the caching attribute for the mapping is write combining or write protected
then the PG_PDE_PAT bit will be set and thus cause an 'off-by-one' error
when looking up the vm_page.

Fix this by using the PG_PS_FRAME mask to compute the physical address for
a superpage mapping instead of PG_FRAME.

This is a theoretical issue at this point since non-writeback attributes are
currently used only for fictitious mappings and fictitious mappings are not
subject to promotion.

Discussed with: alc, kib
MFC after: 2 weeks
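
Note: a small sketch of the 'off-by-one' effect, assuming the amd64 mask
values (PG_FRAME keeps bits 12..51, PG_PS_FRAME keeps bits 21..51, and
PG_PDE_PAT is bit 12). The helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE   4096ULL
#define PG_PDE_PAT  0x1000ULL			/* PAT bit in a 2MB PDE */
#define PG_FRAME    0x000ffffffffff000ULL	/* 4KB frame mask */
#define PG_PS_FRAME 0x000fffffffe00000ULL	/* 2MB frame mask */

/* Wrong for superpages: bit 12 (PG_PDE_PAT) leaks into the address. */
static uint64_t
pde_pa_with_pg_frame(uint64_t pde)
{
	return (pde & PG_FRAME);
}

/* Right for superpages: only the 2MB-aligned frame bits survive. */
static uint64_t
pde_pa_with_pg_ps_frame(uint64_t pde)
{
	return (pde & PG_PS_FRAME);
}
```

With PG_PDE_PAT set, the first helper returns an address exactly one 4KB
page past the start of the superpage, so the vm_page lookup lands on the
neighbouring page.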


252641 03-Jul-2013 neel

Verify that all bytes in the instruction buffer are consumed during decoding.

Suggested by: grehan


252475 01-Jul-2013 grehan

Ignore guest PAT settings by default in EPT mappings.
From experimentation, other hypervisors also do this.

Diagnosed by: tycho nightingale at pluribusnetworks com
Reviewed by: neel


252434 01-Jul-2013 kib

Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes are made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by: bde
Sponsored by: The FreeBSD Foundation
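
Note: the lost carry on a split 64-bit read can be reproduced with a
single-threaded simulation. This is only an illustration; the struct and
function names are hypothetical, and the "other CPU" is modelled by an
increment placed between the two 32-bit reads.

```c
#include <stdint.h>

/* A 64-bit counter stored as two 32-bit halves, as a 32-bit CPU sees it. */
struct split64 {
	uint32_t lo;
	uint32_t hi;
};

static void
split64_inc(struct split64 *c)
{
	if (++c->lo == 0)	/* carry out of the low half */
		c->hi++;
}

static uint64_t
split64_value(const struct split64 *c)
{
	return (((uint64_t)c->hi << 32) | c->lo);
}

/*
 * Simulate a non-atomic fetch: read the high half, let one increment
 * from "another CPU" land, then read the low half.
 */
static uint64_t
torn_fetch(struct split64 *c)
{
	uint32_t hi = c->hi;

	split64_inc(c);
	return (((uint64_t)hi << 32) | c->lo);
}
```

Starting from 0xffffffff, the torn fetch observes the old high half and the
new low half, so the result differs from the true value by 2^32.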


252335 28-Jun-2013 grehan

Make sure all CPUID values are handled, instead of exiting the
bhyve process when an unhandled one is encountered.

Hide some additional capabilities from the guest (e.g. debug store).

This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.

Submitted by: tycho nightingale at pluribusnetworks com
Reviewed by: neel
Reported by: many


252280 27-Jun-2013 jkim

Move definitions required by userland applications out of acpica_machdep.h.


252032 20-Jun-2013 kib

Allow immediate operand.

Sponsored by: The FreeBSD Foundation


251988 19-Jun-2013 kib

Some clarifications and updates for the comments, mostly retrieved
from Bruce Evans. Trim the trailing spaces.

MFC after: 1 week


251976 18-Jun-2013 pluknet

Fix a gcc warning uncovered after r251745.

Reported by: Sergey V. Dyatko
Reviewed by: neel


251767 14-Jun-2013 gibbs

Upgrade Xen interface headers to Xen 4.2.1.

Move FreeBSD from interface version 0x00030204 to 0x00030208.
Updates are required to our grant table implementation before we
can bump this further.

sys/xen/hvm.h:
Replace the implementation of hvm_get_parameter(), formerly located
in sys/xen/interface/hvm/params.h. Linux has a similar file which
primarily stores this function.

sys/xen/xenstore/xenstore.c:
Include new xen/hvm.h header file to get hvm_get_parameter().

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
Correctly protect function definition and variables from being
included into assembly files in xen-os.h

Xen memory barriers are now prefixed with "xen_" to avoid conflicts
with OS native primitives. Define Xen memory barriers in terms of
the native FreeBSD primitives.

Sponsored by: Spectra Logic Corporation
Reviewed by: Roger Pau Monné
Tested by: Roger Pau Monné
Obtained from: Roger Pau Monné (bug fixes)


251745 14-Jun-2013 pluknet

Replace cpusetffs_obj with CPU_FFS, missed in r251703.

Reported by: bdrewery, O. Hartmann


251720 14-Jun-2013 neel

Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT.

Reviewed by: alc


251703 13-Jun-2013 jeff

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


251324 03-Jun-2013 kib

Assert that interrupts are enabled in the trap handlers on x86 before
calling generic code to deliver signals.

Discussed with: bde
Tested by: pho
MFC after: 1 week


251039 27-May-2013 kib

Use slightly more idiomatic expression to get the address of array.

Tested by: dim, pgj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251038 27-May-2013 kib

The _MC_HASFPXSTATE and _MC_IA32_HASFPXSTATE flags have the same bit
value on purpose, but the ia32 context handling code is logically more
correct to use the _MC_IA32_HASFPXSTATE name for the flag.

Tested by: dim, pgj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251037 27-May-2013 kib

The ia32_get_mcontext() does not need to set PCB_FULL_IRET. The
usermode context state is not changed by the get operation, and
get_mcontext() does not require full iret as well.

Tested by: dim, pgj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251035 27-May-2013 kib

When reporting the fault details, also print %rsp.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251033 27-May-2013 kib

When handling an exception from the attempt to load the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64. The trap return path has to disable interrupts since
the sequence of loading the machine state is not atomic. The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear. Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault. For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250963 24-May-2013 achim

Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.

Approved by: scottl (mentor)


250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() work
most of the time with only read locks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


250851 21-May-2013 kib

Fix the hardware watchpoints on SMP amd64. Load the updated %dr
registers also on other CPUs, besides the CPU which happens to execute
the ddb. The debugging registers are stored in the pcpu area,
together with the command which is executed by the IPI stop handler
upon resume.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250850 21-May-2013 kib

Add amd64-specific ddb command 'show phys2dmap', which calculates the
address in the direct map corresponding to the given physical address.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250840 21-May-2013 marcel

Add basic support for FDT to i386 & amd64. This change includes:
1. Common headers for fdt.h and ofw_machdep.h under x86/include
with indirections under i386/include and amd64/include.
2. New modinfo for loader provided FDT blob.
3. Common x86_init_fdt() called from hammer_time() on amd64 and
init386() on i386.
4. Split-off FDT specific low-level console functions from FDT
bus methods for the uart(4) driver. The low-level console
logic has been moved to uart_cpu_fdt.c and is used for arm,
mips & powerpc only. The FDT bus methods are shared across
all architectures.
5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from: Juniper Networks, Inc.


250624 13-May-2013 ed

Improve readability of static assertions for OFFSET_* macros.

Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.
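
Note: a minimal sketch of the offsetof() style of assertion. The structure,
field names and OFFSET_* values are hypothetical, and C11 _Static_assert is
used here as a stand-in for whatever assertion macro the tree uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure whose layout is also hard-coded in assembly. */
struct example_frame {
	uint64_t ef_flags;
	uint64_t ef_rip;
	uint64_t ef_rsp;
};

/* Hypothetical offsets as the assembly side would define them. */
#define OFFSET_EF_RIP 8
#define OFFSET_EF_RSP 16

/* Fail the build if the C layout and the constants ever diverge. */
_Static_assert(offsetof(struct example_frame, ef_rip) == OFFSET_EF_RIP,
    "OFFSET_EF_RIP does not match struct example_frame");
_Static_assert(offsetof(struct example_frame, ef_rsp) == OFFSET_EF_RSP,
    "OFFSET_EF_RSP does not match struct example_frame");
```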


250544 12-May-2013 peter

Tidy up some CVS workarounds.


250495 11-May-2013 rpaulo

Fix several standard extended feature bits.

Submitted by: Oliver Pinter <oliver.pntr at gmail.com>


250427 10-May-2013 neel

Support array-type of stats in bhyve.

An array-type stat in vmm.ko is defined as follows:
VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");

It is incremented as follows:
vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1);

And output of 'bhyvectl --get-stats' looks like:
ipis sent to vcpu[0] 3114
ipis sent to vcpu[1] 0

Reviewed by: grehan
Obtained from: NetApp


250423 09-May-2013 dchagin

Retire write-only PCB_GS32BIT pcb flag on amd64.


250415 09-May-2013 kib

Correct the type for the literal used on the left side of the shift up
to 63 bit positions.

Do not fill the save area and do not set the saved bit in the xstate
bit vector for the state which is not marked as enabled in xsave_mask.

Reported and tested by: Jim Ohlstein <jim@ohlste.in>
MFC after: 3 days
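
Note: a short illustration of why the type of the shifted literal matters.
The function name is hypothetical; it is not the code touched by this
revision.

```c
#include <stdint.h>

/*
 * Return a mask with only bit `bit` set, for 0 <= bit <= 63.  The cast
 * makes the shifted operand 64 bits wide.  A plain `1 << bit` has type
 * int, so with a 32-bit int the shift is undefined for bit >= 31.
 */
static uint64_t
bit_mask64(unsigned int bit)
{
	return ((uint64_t)1 << bit);
}
```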


250338 07-May-2013 attilio

Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept. The change should also be useful
for consolidation and consistency.

Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc


250175 02-May-2013 emaste

Switch to standard copyright license text

The initial version of this came from Sandvine but had "PROVIDED BY NETAPP,
INC" in the copyright text, presumably because the license block was copied
from another file. Replace it with standard "AUTHOR AND CONTRIBUTORS" form.

Approved by: grehan@


250153 01-May-2013 kib

Partially saved extended state must be handled always, i.e. for both
fpu-owned context, and for pcb-saved one. More, the XSAVE could do
partial save, same as XSAVEOPT, so qualifier for the handler should be
use_xsave and not use_xsaveopt.

Since xsave_area_desc is now needed regardless of the XSAVEOPT use,
remove the write-only use_xsaveopt variable.

In collaboration with: jhb
MFC after: 1 week


250152 01-May-2013 kib

The check to ensure that xstate_bv always has XFEATURE_ENABLED_X87 and
XFEATURE_ENABLED_SSE bits set is not needed. CPU correctly handles
any bitmask which is subset of the enabled bits in %XCR0.

More, CPU instructions XSAVE and XSAVEOPT could write the mask without
e.g. XFEATURE_ENABLED_SSE, after the VZEROALL. The check prevents the
restoration of the otherwise valid FPU save area.

In collaboration with: jhb
MFC after: 1 week


250079 29-Apr-2013 carl

Add a new driver to support the Intel Non-Transparent Bridge(NTB).

The NTB allows you to connect two systems with this device using a PCI-e
link. The driver is made of two modules:
- ntb_hw which is a basic hardware abstraction layer for the device.
- if_ntb which implements the ntb network device and the communication
protocol.

The driver is limited at the moment to CPU memcpy instead of using DMA, and
only Back-to-Back mode is supported. Also the network device isn't full
featured yet. These changes will be coming soon. The DMA change will also
bring in the ioat driver from the project branch it is on now.

This is an initial port of the GPL/BSD Linux driver contributed by Jon Mason
from Intel. Any bugs are my contributions.

Sponsored by: Intel
Reviewed by: jimharris, joel (man page only)
Approved by: jimharris (mentor)


249879 25-Apr-2013 grehan

Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.

Hit by the OpenBSD local-apic code.

Submitted by: neel
Reviewed by: grehan
Obtained from: NetApp


249601 18-Apr-2013 rpaulo

Print RDSEED, ADX, and SMAP.

Pointed out by: kib


249588 17-Apr-2013 gabor

- Correct spelling in comments

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


249577 17-Apr-2013 rpaulo

Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).


249450 13-Apr-2013 neel

Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX
hardware capabilities.

Obtained from: NetApp


249439 13-Apr-2013 kib

Fix the name of the pcb member in the comments.

Submitted by: Oliver Pinter <oliver.pntr@gmail.com>
MFC after: 3 days


249435 13-Apr-2013 neel

Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from: NetApp


249410 12-Apr-2013 trasz

Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by: ken
Sponsored by: FreeBSD Foundation


249396 12-Apr-2013 neel

If vmm.ko could not be initialized correctly then prevent the creation of
virtual machines subsequently.

Submitted by: Chris Torek


249351 11-Apr-2013 neel

Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with: Chris Torek


249324 10-Apr-2013 neel

Unsynchronized TSCs on the host require special handling in bhyve:

- use clock_gettime(2) as the time base for the emulated ACPI timer instead
of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
from using the TSC as its time base.

Discussed with: jhb@ (about making 'smp_tsc' a global)
Reported by: Dan Mack on freebsd-virtualization@
Obtained from: NetApp


249268 08-Apr-2013 glebius

Merge from projects/counters: counter(9).

Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.


249265 08-Apr-2013 glebius

Merge from projects/counters:

Pad struct pcpu so that its size is a divisor of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by: Nginx, Inc.
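
Note: a sketch of padding a per-CPU structure so that a whole number of
slots fits in one page. The payload structure and the slot size are
hypothetical; the real struct pcpu has many more fields.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical per-CPU payload. */
struct percpu_payload {
	uint64_t counters[20];
	uint32_t cpuid;
};

/* Slot size: a power of 2 that is at least the payload size. */
#define PERCPU_SLOT 256

struct percpu_padded {
	struct percpu_payload p;
	char pad[PERCPU_SLOT - sizeof(struct percpu_payload)];
};

/* The padded size must divide PAGE_SIZE with no remainder. */
_Static_assert(PAGE_SIZE % sizeof(struct percpu_padded) == 0,
    "padded per-CPU structure must evenly divide a page");
```

With this layout a page holds exactly 16 slots and no bytes at the end of
the page are wasted.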


249174 05-Apr-2013 grehan

Don't panic when a valid divisor of 1 has been requested.

Obtained from: NetApp


249083 04-Apr-2013 mav

Remove all legacy ATA code parts, not used since options ATA_CAM was enabled
in most kernels before FreeBSD 9.0. Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the
atacontrol utility and some man pages. Remove the now-useless options ATA_CAM.

No objections: current@, stable@
MFC after: never


248938 31-Mar-2013 neel

Add counter to keep track of the number of timer interrupts generated by
the local apic for each virtual cpu.


248935 30-Mar-2013 neel

Add some more stats to keep track of all the reasons that a vcpu is exiting.


248855 28-Mar-2013 neel

Allow caller to skip 'guest linear address' validation when doing instruction
decode. This is to accommodate hardware assist implementations that do not
provide the 'guest linear address' as part of nested page fault collateral.

Submitted by: Anish Gupta (akgupt3 at gmail dot com)


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminates the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitly requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitly specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is no longer subject to the maxbufspace
limit. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, are
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not work, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248449 18-Mar-2013 attilio

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
resident/cached pages collections as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations. In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves. However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan. It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes, broken into small pieces, can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)


248392 16-Mar-2013 neel

Fix the '-Wtautological-compare' warning emitted by clang for comparing the
unsigned enum type with a negative value.

Obtained from: NetApp


248389 16-Mar-2013 neel

Allow vmm stats to be specific to the underlying hardware assist technology.
This can be done by using the new macros VMM_STAT_INTEL() and VMM_STAT_AMD().
Statistic counters that are common across the two are defined using VMM_STAT().

Suggested by: Anish Gupta
Discussed with: grehan
Obtained from: NetApp


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
where the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stub is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


248140 10-Mar-2013 alc

The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by: EMC / Isilon Storage Division


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical, but a few notes are in order:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247870 06-Mar-2013 bryanv

Remove the virtio dependency entry for the VirtIO device drivers. This
will prevent the kernel from linking if the device drivers are included
without the virtio module. Remove pci and scbus for the same reason.

Also explain the relationship and necessity of the virtio and virtio_pci
modules. Currently in FreeBSD, we only support VirtIO PCI, but it could
be replaced with a different interface (like MMIO) and the device
(network, block, etc) will still function.

Requested by: luigi
Approved by: grehan (mentor)
MFC after: 3 days


247814 04-Mar-2013 ken

Re-enable CTL in GENERIC on i386 and amd64, but turn on the CTL disable
tunable by default.

This will allow GENERIC configurations to boot on small memory boxes, but
not require end users who want to use CTL to recompile their kernel. They
can simply set kern.cam.ctl.disable=0 in loader.conf.

The eventual solution to the memory usage problem is to change the way
CTL allocates memory to be more configurable, but this should fix things
for small memory situations in the mean time.

UPDATING: Explain the change in the CTL configuration, and
how users can enable CTL if they would like to use
it.

sys/conf/options: Add a new option, CTL_DISABLE, that prevents CTL
from initializing.

ctl.c: If CTL_DISABLE is turned on, don't initialize.

i386/conf/GENERIC,
amd64/conf/GENERIC: Re-enable device ctl, and add the CTL_DISABLE
option.


247622 02-Mar-2013 attilio

Merge from vmc-playground branch:
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: flo, pho, jhb, davide


247615 02-Mar-2013 adrian

Disable the ctl driver in GENERIC.

It unfortunately steals a fair chunk of RAM at startup even if it's not
actively used, which prevents FreeBSD VMs of 128MB from successfully
booting and running.


247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to estimate further sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements that will be committed in the next days.

The commit is not targeted for MFC.


247400 27-Feb-2013 attilio

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


247047 20-Feb-2013 kib

Convert machine/elf.h, machine/frame.h, machine/sigframe.h,
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.

Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.

The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host. The same hack could be usefully abused by
other code too.


246855 15-Feb-2013 jkim

Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is
no functional change.
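
Note: a sketch of why the two spellings are interchangeable for page-sized
rounding. The helpers below are local re-implementations with hypothetical
names (ex_roundup, ex_round_page), assuming a 4KB page; they are not the
kernel macros themselves.

```c
#define EX_PAGE_SIZE 4096UL
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)

/* Generic round-up of x to a multiple of y (uses division). */
static unsigned long
ex_roundup(unsigned long x, unsigned long y)
{
	return (((x + (y - 1)) / y) * y);
}

/* Page-specific round-up (mask only; needs a power-of-2 page size). */
static unsigned long
ex_round_page(unsigned long x)
{
	return ((x + EX_PAGE_MASK) & ~EX_PAGE_MASK);
}
```

Both helpers return the same value for any x that does not overflow when
EX_PAGE_MASK is added, which is consistent with the "no functional change"
remark.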


246802 14-Feb-2013 kib

Print slightly more useful information on the 'bad pte' panic.

No objections from: alc
MFC after: 1 week


246801 14-Feb-2013 kib

Assert that user address is never qremoved.

No objections from: alc
MFC after: 1 week


246774 13-Feb-2013 neel

Requests for invalid CPUID leaves should map to the highest known leaf instead.

Reviewed by: grehan
Obtained from: NetApp


246686 11-Feb-2013 neel

Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'.

Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING)
that called 'sched_bind()' on behalf of the user thread.

The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn
runs afoul of the assertion '(td_pinned == 0)' in userret().

Using the cpuset affinity to implement pinning of the vcpu threads works with
both 4BSD and ULE schedulers and has the happy side-effect of getting rid
of a bunch of code in vmm.ko.

Discussed with: grehan


246384 06-Feb-2013 neel

Compute the number of initial kernel page table pages (NKPT) dynamically.

This eliminates the need to recompile the kernel when the default value
of NKPT is not big enough - for e.g. when loading large kernel modules
or memory disk images from the loader.

If NKPT is defined in the kernel configuration file then it overrides the
dynamic calculation.

Reviewed by: alc, kib


246248 02-Feb-2013 avg

cpususpend_handler: mark AP as resumed only after fully setting up lapic

Reviewed by: jhb
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after: 12 days


246247 02-Feb-2013 avg

x86 suspend/resume: suspend pics and pseudo-pics in reverse order

- change 'pics' from STAILQ to TAILQ
- ensure that Local APIC is always first in 'pics'

Reviewed by: jhb
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after: 12 days
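
Note: a sketch of why a TAILQ suits reverse-order traversal. A STAILQ is
singly linked and has no reverse iterator, while a TAILQ provides
TAILQ_FOREACH_REVERSE. The structure and function below are hypothetical
and much simpler than the kernel's own pic bookkeeping.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical interrupt controller record. */
struct example_pic {
	const char *name;
	TAILQ_ENTRY(example_pic) link;
};
TAILQ_HEAD(example_pic_list, example_pic);

/* Collect names in reverse list order into out[]; returns the count. */
static size_t
example_pics_reverse(struct example_pic_list *head, const char *out[],
    size_t max)
{
	struct example_pic *p;
	size_t n = 0;

	TAILQ_FOREACH_REVERSE(p, head, example_pic_list, link) {
		if (n == max)
			break;
		out[n++] = p->name;
	}
	return (n);
}
```

If the Local APIC is kept at the head of the list, a reverse walk visits it
last, which is the ordering the suspend path in this revision relies on.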


246222 01-Feb-2013 eadler

Remove support for plip from the GENERIC kernel as no systems in the
last 10 years require this support.

Discussed with: db
Discussed with: kib
Reviewed by: imp
Reviewed by: jhb
Reviewed by: -hackers
Approved by: cperciva (mentor)


246191 01-Feb-2013 neel

Fix a broken assumption in the passthru implementation that the MSI-X table
can only be located at the beginning or the end of the BAR.

If the MSI-X table is located in the middle of a BAR then we will split the
BAR into two and create two mappings - one before the table and one after
the table - leaving a hole in place of the table so accesses to it can be
trapped and emulated.

Obtained from: NetApp
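
Note: a sketch of the split described above. The types and the function
name are hypothetical, and the sketch ignores the page-granularity
rounding that a real mapping would need.

```c
#include <stdint.h>

/* A byte range [start, start + len) within a BAR (hypothetical type). */
struct bar_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a BAR of bar_len bytes around an MSI-X table occupying
 * [tbl_off, tbl_off + tbl_len).  Returns the number of ranges written
 * to out[] (0, 1 or 2); an empty side is skipped.  The caller must
 * ensure that the table lies entirely inside the BAR.
 */
static int
bar_split(uint64_t bar_len, uint64_t tbl_off, uint64_t tbl_len,
    struct bar_range out[2])
{
	int n = 0;

	if (tbl_off > 0) {		/* region before the table */
		out[n].start = 0;
		out[n].len = tbl_off;
		n++;
	}
	if (tbl_off + tbl_len < bar_len) {	/* region after the table */
		out[n].start = tbl_off + tbl_len;
		out[n].len = bar_len - (tbl_off + tbl_len);
		n++;
	}
	return (n);
}
```

A table at the beginning or the end of the BAR yields a single range, so
the earlier two cases fall out of the same code path.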


246188 01-Feb-2013 neel

Increase the number of passthru devices supported by bhyve.

The maximum length of an environment variable puts a limitation on the
number of passthru devices that can be specified via a single variable.
The workaround is to allow the user to specify passthru devices via multiple
environment variables instead of a single one.

Obtained from: NetApp


246108 30-Jan-2013 neel

Add emulation support for instruction "88/r: mov r/m8, r8".

This instruction moves a byte from a register to a memory location.

Tested by: tycho nightingale at pluribusnetworks com
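
As a rough illustration of what emulating "88/r" in r246108 involves: the ModRM reg field names the source register, and only its low byte is stored. The helper names below are hypothetical, and the sketch ignores the legacy AH/CH/DH/BH encodings that apply when no REX prefix is present.

```c
#include <stdint.h>

/* Extract the reg field (bits 5:3) of a ModRM byte. */
static unsigned
modrm_reg(uint8_t modrm)
{
	return ((modrm >> 3) & 0x7);
}

/*
 * Emulate "mov r/m8, r8": store the low byte of the source register
 * value at the destination memory location.
 */
static void
emulate_mov_rm8_r8(uint8_t *dst, uint64_t reg_value)
{
	*dst = (uint8_t)reg_value;
}
```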


246085 29-Jan-2013 jhb

Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by: Chagin Dmitry dmitry | gmail
MFC after: 2 weeks


245917 25-Jan-2013 grehan

Always allow access to the sysenter cs/esp/eip MSRs since they
are automatically saved and restored in the VMCS.

Reviewed by: neel
Obtained from: NetApp


245849 23-Jan-2013 jhb

Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are). Instead,
use a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after: 2 weeks
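
A mapping function of the kind r245849 describes might look like the sketch below. The `LX_*` and `NATIVE_*` constants and their values are assumptions chosen for illustration (only the first two coincide, as the commit notes); they are not copied from the kernel headers.

```c
/* Illustrative Linux-side option numbers (assumed values). */
enum {
	LX_TCP_NODELAY = 1,
	LX_TCP_MAXSEG = 2,
	LX_TCP_KEEPIDLE = 4,
	LX_TCP_KEEPINTVL = 5,
	LX_TCP_KEEPCNT = 6
};

/* Illustrative native option numbers (assumed values). */
enum {
	NATIVE_TCP_NODELAY = 1,
	NATIVE_TCP_MAXSEG = 2,
	NATIVE_TCP_KEEPIDLE = 256,
	NATIVE_TCP_KEEPINTVL = 512,
	NATIVE_TCP_KEEPCNT = 1024
};

/* Translate a Linux TCP-level option; return -1 for unsupported ones. */
static int
map_linux_tcp_sockopt(int opt)
{
	switch (opt) {
	case LX_TCP_NODELAY:
		return (NATIVE_TCP_NODELAY);
	case LX_TCP_MAXSEG:
		return (NATIVE_TCP_MAXSEG);
	case LX_TCP_KEEPIDLE:
		return (NATIVE_TCP_KEEPIDLE);
	case LX_TCP_KEEPINTVL:
		return (NATIVE_TCP_KEEPINTVL);
	case LX_TCP_KEEPCNT:
		return (NATIVE_TCP_KEEPCNT);
	default:
		return (-1);
	}
}
```

The caller would turn a -1 result into an error for the Linux process rather than passing an unknown number through.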


245704 21-Jan-2013 neel

Postpone vmm module initialization until after SMP is initialized - particularly
that 'smp_started != 0'.

This is required because the VT-x initialization calls smp_rendezvous()
to set the CR4_VMXE bit on all the cpus.

With this change we can preload vmm.ko from the loader.

Reported by: alfred@, sbruno@
Obtained from: NetApp


245678 20-Jan-2013 neel

Add svn properties to the recently merged bhyve source files.

The pre-commit hook will not allow any commits without the svn:keywords
property in head.


245652 19-Jan-2013 neel

Merge projects/bhyve to head.

'bhyve' was developed by grehan@ and myself at NetApp (thanks!).

Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their
support and encouragement.

Obtained from: NetApp


245640 19-Jan-2013 jhb

Fix build with SMP disabled.

Reported by: bf


245577 17-Jan-2013 jhb

Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after: 2 weeks


245362 13-Jan-2013 bryanv

Add VirtIO to the i386 and amd64 GENERIC kernels

This also removes the kludge from r239009 that covered only
the network driver.

Reviewed by: grehan
Approved by: grehan (mentor)
MFC after: 1 week


245204 09-Jan-2013 neel

Add a "pause" to busy wait loops in the cpu reset path.

This should not matter much when running on bare metal but it makes the guest
more friendly when running inside a virtual machine.

Discussed with: jhb
Obtained from: NetApp
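
The idea behind r245204 can be sketched as a bounded spin loop that issues "pause" on each iteration. `cpu_pause()` and `spin_until_set()` are hypothetical helpers, and the inline assembly assumes a GCC- or Clang-compatible compiler; on non-x86 targets the hint compiles to nothing.

```c
/* Spin-wait hint; a no-op on architectures other than x86. */
static inline void
cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause");
#endif
}

/*
 * Poll *flag up to max_spins times.  Returns 1 if the flag was seen
 * set, 0 if the spin budget ran out first.
 */
static int
spin_until_set(const volatile int *flag, int max_spins)
{
	int i;

	for (i = 0; i < max_spins; i++) {
		if (*flag != 0)
			return (1);
		cpu_pause();
	}
	return (0);
}
```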


245003 03-Jan-2013 kib

Enable the UFS quotas for big-iron GENERIC kernels.

Discussed with: mckusick
MFC after: 2 weeks


244992 03-Jan-2013 des

As discussed on -current last October, remove the firewire drivers from
GENERIC.


244191 13-Dec-2012 jimharris

Revert r243960 based on feedback regarding keeping x86 headers unified
(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).

Alternate implementation will be made in a separate commit.


244144 12-Dec-2012 grehan

Implement an API to allow a hypervisor to save/restore
guest floating point state without having to know the
size of floating-point state.

Unstaticize fpurestore to allow the hypervisor to
save/restore guest state using fpusave/fpurestore
on the allocated FPU state area.

Reviewed by: kib
Obtained from: NetApp/bhyve
MFC after: 1 week


244077 10-Dec-2012 kib

Add amd64-specific ddb command "show pte". The command displays the
hierarchy of the page table entries which map the specified address.

Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


243960 06-Dec-2012 jimharris

Add amd64 implementations for 8-byte bus_space routines.

Submitted by: Carl Delsey <carl.r.delsey@intel.com>
Discussed with: jhb, rwatson
Reviewed by: jimharris
MFC after: 1 week


243836 03-Dec-2012 kib

Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.

Reviewed by: dim
MFC after: 2 weeks


243737 01-Dec-2012 jkim

Remove duplicate code. Reduce diff between amd64 and i386.


243712 30-Nov-2012 jkim

Use volatile keywords properly.


243685 30-Nov-2012 jkim

Tidy up inline assembly. No functional change.


243132 16-Nov-2012 kib

Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by: alc
MFC after: 2 weeks


243040 14-Nov-2012 kib

Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
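
The flag translation described in r243040 reduces to a small inline function. The sketch below uses hypothetical `SK_*` stand-ins rather than the real malloc(9) and VM constants, and treating every request without the reserve flag as the system class is an assumption drawn from the commit text.

```c
/* Hypothetical stand-ins for the malloc(9) flags named in the commit. */
#define	SK_M_NOWAIT		0x0001
#define	SK_M_USE_RESERVE	0x0002

/* Hypothetical stand-ins for the page allocation classes. */
enum sk_alloc_class {
	SK_VM_ALLOC_SYSTEM,	/* may not sleep, keeps the deep reserve */
	SK_VM_ALLOC_INTERRUPT	/* may drain the free page reserve */
};

/* Single place where malloc flags are turned into an allocation class. */
static inline enum sk_alloc_class
sk_malloc2vm_class(int malloc_flags)
{
	if ((malloc_flags & SK_M_USE_RESERVE) != 0)
		return (SK_VM_ALLOC_INTERRUPT);
	return (SK_VM_ALLOC_SYSTEM);
}
```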


242828 09-Nov-2012 kib

Do not try to enable new features in the %cr4 if running under
hypervisor. Apparently, hypervisors failed to filter out 'Standard
Extended Features' report from CPUID, but deliver #gp when
corresponding bit in %cr4 is toggled.

This shall be reconsidered later, after hypervisors correct the bug.

Reported and tested by: joel
Reviewed by: avg
MFC after: 2 weeks


242534 03-Nov-2012 attilio

Rework the known rwlocks to benefit from staying on their own
cache line by using struct rwlock_padalign instead of manual
frobbing.

Reviewed by: alc, jimharris


242433 01-Nov-2012 kib

Enable the new instructions for reading and writing bases for %fs,
%gs, when supported. Note that WRFSBASE and WRGSBASE are not very
useful on FreeBSD right now, because a return from the kernel mode to
userspace reloads the bases specified by the sysarch(2) syscall, most
likely.

Enable the Supervisor Mode Execution Prevention (SMEP) when
supported. Since the loader(8) performs hand-off to the kernel with
the page tables which contradict the SMEP, postpone enabling the SMEP
on BSP until pmap switched for the proper kernel tables.

Debugged with the help from: avg
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 1 month


242432 01-Nov-2012 kib

Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs. Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 2 weeks


241880 22-Oct-2012 eadler

The 'testing memory' patch gets printed too many times

Approved by: cperciva (implicit)


241850 22-Oct-2012 eadler

Explain the upcoming delay by printing a message when the kernel
is about to begin testing memory.

Reviewed by: dteske, adri
Approved by: cperciva
MFC after: 1 week


241549 14-Oct-2012 kib

Print the %rip value for uprintf_signal.

MFC after: 1 week


241540 14-Oct-2012 avg

pciereg_cfg*: use assembly to access the mem-mapped cfg space

AMD BKDG for CPU families 10h and later requires that the memory
mapped config is always read into or written from al/ax/eax register.

Discussed with: kib, alc
Reviewed by: kib (earlier version)
MFC after: 25 days


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241374 09-Oct-2012 attilio

Add a unified macro to prevent the compiler from reordering
instruction loads/stores at will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by: bde, kib
Discussed with: dim, theraven
MFC after: 2 weeks
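
A compiler-only barrier of the kind r241374 introduces is commonly written as an empty asm statement with a "memory" clobber. The sketch below uses the hypothetical name `COMPILER_MEMBAR()` and assumes a GCC- or Clang-compatible compiler; it constrains the compiler only and emits no CPU fence.

```c
/*
 * Emits no instruction, but the "memory" clobber forbids the compiler
 * from moving loads or stores across this point.
 */
#define	COMPILER_MEMBAR()	__asm__ __volatile__("" : : : "memory")

static int sk_data;
static int sk_ready;

/* Store the payload, then the flag, in that order in the generated code. */
static void
sk_publish(int value)
{
	sk_data = value;
	COMPILER_MEMBAR();
	sk_ready = 1;
}
```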


241371 09-Oct-2012 attilio

Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static initializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by: marius


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241027 28-Sep-2012 jhb

- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
defined by standards ($PIR table, SMAP entries, etc.) available to
userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
and smbios(4) in <machine/pc/bios.h> and make them available to
userland.

MFC after: 2 weeks


241020 28-Sep-2012 alc

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


240773 21-Sep-2012 dim

After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after: 3 days


240618 17-Sep-2012 jimharris

Integrate nvme(4) and nvd(4) into the amd64 and i386 builds.

Sponsored by: Intel


240455 13-Sep-2012 kib

Rename the IVY_RNG option to RDRAND_RNG.

Based on submission by: Arthur Mesh <arthurmesh@gmail.com>
MFC after: 2 weeks


240317 10-Sep-2012 alc

Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures. Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after: 10 days


240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


240135 05-Sep-2012 kib

Add support for new Intel on-CPU Bull Mountain random number
generator, found on IvyBridge and supposedly later CPUs, accessible
with RDRAND instruction.

From the Intel whitepapers and articles about Bull Mountain, it seems
that we do not need to perform post-processing of RDRAND results, like
AES-encryption of the data with random IV and keys, which was done for
Padlock. Intel claims that sanitization is performed in hardware.

Make both Padlock and Bull Mountain random generators support code
covered by kernel config options, for the benefit of people who prefer
minimal kernels. Also add the tunables to disable hardware generator
even if detected.

Reviewed by: markm, secteam (simon)
Tested by: bapt, Michael Moll <kvedulv@kvedulv.de>
MFC after: 3 weeks


240126 05-Sep-2012 alc

Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them. Both the function names and the comment had grown
stale. Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mappings within a
page table page. Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.


240104 04-Sep-2012 delphij

Add hpt27xx to GENERIC kernel for amd64 and i386 systems.

MFC after: 2 weeks


240098 04-Sep-2012 jhb

Fix duplicate entries for mwl(4):
- Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is
already present in sys/conf/NOTES).
- Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES.
- While here, add a description to the sfxge line in amd64/conf/NOTES.


239771 28-Aug-2012 jhb

Fix misspelled "Infiniband".

Submitted by: gcooper
MFC after: 3 days


239699 26-Aug-2012 gjb

Grammar fix: s/NIC's/NICs/

MFC after: 3 days


239255 14-Aug-2012 des

As discussed on -current, remove the hardcoded default maxswzone.

MFC after: 3 weeks


239252 14-Aug-2012 kib

Add a hackish debugging facility to provide a bit of information about
reason for generated trap. The dump of basic signal information and 8
bytes of the faulting instruction are printed on the controlling
terminal of the process, if the machdep.uprintf_signal sysctl is
enabled.

The print is the only practical way to debug traps from a.out
processes I am aware of. Because I have to reimplement it each time I
debug an issue with a.out support on amd64, commit the hack to main
tree.

MFC after: 1 week


239251 14-Aug-2012 kib

Real hardware, as opposed to QEMU, does not allow a call gate
in long mode to transfer control to a 32bit code segment. Unbreak
the lcall $7,$0 implementation on amd64 by putting the 64bit user code
segment's selector into the call gate, and execute the 64bit trampoline
which converts the return frame into 32bit format and switches back to
32bit mode for executing int $0x80 trampoline.

Note that all jumps over the hoops are performed in the user mode.

MFC after: 1 week


239241 13-Aug-2012 jhb

Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs. It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.


239235 13-Aug-2012 jhb

Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4). Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.


239228 13-Aug-2012 cperciva

Build modules along with the XENHVM kernels.

No objections from: freebsd-xen mailing list
MFC after: 1 week


239137 08-Aug-2012 alc

The assertion that I added in r238889 could legitimately fail when a
debugger creates a breakpoint. Replace that assertion with a narrower
one that still achieves my objective.

Reported and tested by: kib


239125 07-Aug-2012 kib

Do not apply errata 721 workaround when under hypervisor, since
typical hypervisor does not implement access to the required MSR,
causing #GP on boot.

Reported and tested by: olgeni
PR: amd64/170388
MFC after: 3 days


239123 07-Aug-2012 pluknet

Remove duplicate header inclusion of <sys/sysent.h>

Discussed with: bz


239072 05-Aug-2012 alc

Shave off a few more cycles from the average execution time of pmap_enter()
by simplifying the control flow and reducing the live range of "om".


238972 01-Aug-2012 kib

Add lfence().

MFC after: 1 week


238970 01-Aug-2012 alc

Revise pmap_enter()'s handling of mapping updates that change the
PTE's PG_M and PG_RW bits but not the physical page frame. First,
only perform vm_page_dirty() on a managed vm_page when the PG_M bit is
being cleared. If the updated PTE continues to have PG_M set, then
there is no requirement to perform vm_page_dirty(). Second, flush the
mapping from the TLB when PG_M alone is cleared, not just when PG_M
and PG_RW are cleared. Otherwise, a stale TLB entry may stop PG_M
from being set again on the next store to the virtual page. However,
since the vm_page's dirty field already shows the physical page as
being dirty, no actual harm comes from the PG_M bit not being set.
Nonetheless, it is potentially confusing to someone expecting to see
the PTE change after a store to the virtual page.


238914 30-Jul-2012 kib

Change (unused) prototype for stmxcsr() to match reality.

Noted by: jhb
MFC after: 1 week


238889 29-Jul-2012 alc

Shave off a few more cycles from pmap_enter()'s critical section. In
particular, do a little less work with the PV list lock held.


238723 23-Jul-2012 kib

Forcibly shut up clang warning about NULL pointer dereference.

MFC after: 3 weeks


238671 21-Jul-2012 kib

Consistently use 2-space sentence breaks.

Submitted by: bde
MFC after: 1 week


238670 21-Jul-2012 kib

Stop caching curpcb in the local variable.

Requested by: bde
MFC after: 1 week


238669 21-Jul-2012 kib

The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the
stopped threads. Implementation assumes that the thread's FPU context
is spilled into the PCB due to stop. This is mostly true, except when
FPU state for the thread is not initialized. Then the requests operate
on the garbage state which is currently left in the PCB, causing
confusion.

The situation is indeed observed after a signal delivery and before
#NM fault on execution of any FPU instruction in the signal handler,
since sendsig(9) drops FPU state for current thread, clearing
PCB_FPUINITDONE. When inspecting context state for the signal handler,
debugger sees the FPU state of the main program context instead of the
clear state supposed to be provided to handler.

Fix this by forcing clean FPU state in PCB user FPU save area by
performing getfpuregs(9) before accessing user FPU save area in
ptrace_machdep.c.

Note: this change will be merged to i386 kernel as well, where it is
much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to
inspect FPU context on CPUs that support SSE. Amd64 version of gdb
uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does
not exhibit the bug.

Reported by: bde
MFC after: 1 week


238668 21-Jul-2012 kib

Stop clearing x87 exceptions in the #MF handler on amd64. If user code
understands FPU hardware enough to catch SIGFPE and unmask exceptions
in control word, then it may as well properly handle return from
SIGFPE without causing an infinite loop of #MF exceptions due to
faulting instruction restart, when needed.

Clearing exceptions causes information loss for handlers which do
understand FPU hardware, and struct siginfo si_code member cannot be
considered adequate replacement for en_sw content due to translation.

Supposed reason for clearing the exceptions, which is IRQ13 handling
oddities, were never applicable to amd64.

Note: this change will be merged to i386 kernel as well, since we do
not support IRQ13 delivery of #MF notifications for some time.

Requested by: bde
MFC after: 1 week


238623 19-Jul-2012 kib

Introduce curpcb magic variable, similar to curthread, which is MD
amd64. It is implemented as __pure2 inline with non-volatile asm read
from pcpu, which allows a compiler to cache its results.

Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.

Note that __curthread() uses magic value 0 as an offsetof(struct pcpu,
pc_curthread). It seems to be done this way due to machine/pcpu.h
needs to be processed before sys/pcpu.h, because machine/pcpu.h
contributes machine-dependent fields to the struct pcpu definition. As
a result, machine/pcpu.h cannot use struct pcpu yet.

The __curpcb() also uses a magic constant instead of offsetof(struct
pcpu, pc_curpcb) for the same reason. The constants are now defined as
symbols and CTASSERTs are added to ensure that future KBI changes do
not break the code.

Requested and reviewed by: bde
MFC after: 3 weeks
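
The "magic constant plus compile-time check" pattern from r238623 can be sketched in portable C11. `struct pcpu_sketch`, its filler fields, and the `SK_OFFSETOF_*` values are hypothetical; `_Static_assert` plays the role that CTASSERT plays in the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed per-CPU structure. */
struct pcpu_sketch {
	void *pc_curthread;	/* must stay at offset 0 */
	void *pc_filler[3];	/* placeholder for unrelated fields */
	void *pc_curpcb;
};

/* Constants usable before the full structure definition is visible. */
#define	SK_OFFSETOF_CURTHREAD	0
#define	SK_OFFSETOF_CURPCB	(4 * sizeof(void *))

/* Fail the build if a layout change invalidates the constants. */
_Static_assert(offsetof(struct pcpu_sketch, pc_curthread) ==
    SK_OFFSETOF_CURTHREAD, "pc_curthread offset changed");
_Static_assert(offsetof(struct pcpu_sketch, pc_curpcb) ==
    SK_OFFSETOF_CURPCB, "pc_curpcb offset changed");

/* Read a pointer-sized field by raw offset, as the asm accessor would. */
static void *
sk_read_field(const void *base, size_t off)
{
	void *p;

	memcpy(&p, (const char *)base + off, sizeof(p));
	return (p);
}
```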


238610 19-Jul-2012 alc

Don't unnecessarily set PGA_REFERENCED in pmap_enter().


238598 18-Jul-2012 kib

On AMD64, provide siginfo.si_code for floating point errors when error
occurs using the SSE math processor. Update comments describing the
handling of the exception status bits in coprocessors control words.

Remove GET_FPU_CW and GET_FPU_SW macros which were used only once.
Prefer to use curpcb to access pcb_save over the longer path of
referencing pcb through the thread structure.

Based on the submission by: Ed Alley <wea llnl gov>
PR: amd64/169927
Reviewed by: bde
MFC after: 3 weeks


238597 18-Jul-2012 kib

Add stmxcsr.

Submitted by: Ed Alley <wea llnl gov>
PR: amd64/169927
MFC after: 3 weeks


238450 14-Jul-2012 kib

Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
for a new thread, since fork semantic requires the copy of the
previous state. This advice seemingly contradicts the advice
from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
loaded into FPU context, and XSAVE. We always spill the fpu context
into save area and start emulation when directly writing into FPU
context.
5. We do not use segmented addressing to access save area, or rather,
always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
loaded with XRSTOR, since context switch code checks for FPU use by
outgoing thread before saving, and thread which stopped emulation
forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
stack of the executing thread is never swapped out.

The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.

For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.

MFC after: 1 month


238414 13-Jul-2012 alc

Wring a few cycles out of pmap_enter(). In particular, on a user-space
pmap, avoid walking the page table twice.


238311 09-Jul-2012 jhb

Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
on x86 and use that to implement stop_emulating() in the fpu/npx code.
Reimplement start_emulating() in the non-XEN case by using load_cr0() and
rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly
discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in
the description of these instructions in Volume 2 of the SDM.

Reviewed by: kib
MFC after: 1 month


238310 09-Jul-2012 jhb

Partially revert r217515 so that the mem_range_softc variable is always
present on x86 kernels. This fixes the build of kernels that include
'device acpi' but do not include 'device mem'.

MFC after: 1 month


238179 06-Jul-2012 kib

Use assembler mnemonic instead of manually assembling, continuation of r238142.

Reviewed by: jhb
MFC after: 1 month


238166 06-Jul-2012 jhb

Several fixes to the amd64 disassembler:
- Add generic support for opcodes that are escape bytes used for
multi-byte opcodes (such as the 0x0f prefix). Use this to replace
the hard-coded 0x0f special case and add support for three-byte
opcodes that use the 0x0f38 prefix.
- Decode all Intel VMX instructions. invept and invvpid in particular are
three-byte opcodes that use the 0x0f38 escape prefix.
- Rework how the special 'SDEP' size flag works such that the default
instruction name (i_name) is the instruction when the data size
prefix (0x66) is not specified, and the alternate name in i_extra is
used when the prefix is included.
- Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses
between i_name and i_extra based on the address size prefix (0x67).
Use this to fix the decoding for jrcxz vs jecxz which is determined
by the address size prefix, not the operand size prefix. Also, jcxz
is not possible in 64-bit mode, but jrcxz is the default instruction
for that opcode.
- Add support for handling instructions that have a mandatory 'rep'
prefix (this means not outputting the 'repe ' prefix until determining
if it is used as part of an opcode). Make 'pause' less of a special
case this way.
- Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions
but with a REX.W prefix.

MFC after: 1 month


238163 06-Jul-2012 alc

Make pmap_enter()'s management of PV entries consistent with the other pmap
functions that manage PV entries. Specifically, remove the PV entry from
the containing PV list only after the corresponding PTE is destroyed.

Update the pmap's wired mapping count in pmap_enter() before the PV list
lock is acquired.


238142 05-Jul-2012 jhb

Now that our assembler supports the xsave family of instructions, use them
natively rather than hand-assembled versions. For xgetbv/xsetbv, add a
wrapper API to deal with xcr* registers: rxcr() and load_xcr().

Reviewed by: kib
MFC after: 1 month


238126 05-Jul-2012 alc

Calculate the new PTE value in pmap_enter() before acquiring any locks.

Move an assertion to the beginning of pmap_enter().


238124 05-Jul-2012 alc

Correct an error in r237513. The call to reserve_pv_entries() must come
before pmap_demote_pde() updates the PDE. Otherwise, pmap_pv_demote_pde()
can crash.

Crash reported by: kib
Patch tested by: kib


238109 04-Jul-2012 jhb

Decode the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', 'xsetbv', and
'rdtscp' instructions.

MFC after: 1 month


237901 01-Jul-2012 delphij

tws(4) is interfaced with CAM so move it to the same section.

Reported by: joel
MFC after: 3 days


237855 30-Jun-2012 alc

Optimize reserve_pv_entries() using the popcnt instruction.
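
One way a population count helps here is in counting the free entries of a chunk from its bitmap in a few instructions instead of a bit-by-bit loop. The sketch below is an assumption about the general technique, not the committed code: `SK_NPCM`, the "set bit means free" convention, and `sk_chunk_free_count()` are hypothetical, and `__builtin_popcountll` is a GCC/Clang builtin.

```c
#include <stdint.h>

#define	SK_NPCM	3	/* hypothetical: bitmap words per chunk */

/* Count free entries in one chunk; a set bit marks a free entry. */
static int
sk_chunk_free_count(const uint64_t map[SK_NPCM])
{
	int i, n;

	n = 0;
	for (i = 0; i < SK_NPCM; i++)
		n += __builtin_popcountll(map[i]);
	return (n);
}
```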


237813 29-Jun-2012 alc

In r237592, I forgot that pmap_enter() might already hold a PV list lock
at the point that it calls get_pv_entry(). Thus, pmap_enter()'s PV list
lock pointer must be passed to get_pv_entry() for those rare occasions
when get_pv_entry() calls reclaim_pv_chunk().

Update some related comments.


237733 28-Jun-2012 alc

Avoid some unnecessary PV list locking in pmap_enter().


237684 28-Jun-2012 alc

Optimize pmap_pv_demote_pde().


237623 27-Jun-2012 alc

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


237604 26-Jun-2012 alc

Introduce RELEASE_PV_LIST_LOCK().


237592 26-Jun-2012 alc

Add PV list locking to pmap_enter(). Its execution is no longer serialized
by the pvh global lock.

Add a needed atomic operation to pmap_object_init_pt().


237551 25-Jun-2012 alc

Add PV chunk and list locking to pmap_change_wiring(), pmap_protect(), and
pmap_remove(). The execution of these functions is no longer serialized
by the pvh global lock.

Make some stylistic changes to the affected code for the sake of
consistency with related code elsewhere in the pmap.


237513 23-Jun-2012 alc

Introduce reserve_pv_entries() and use it in pmap_pv_demote_pde(). In order
to add PV list locking to pmap_pv_demote_pde(), it is necessary to change
the way that pmap_pv_demote_pde() allocates PV entries. Specifically,
once pmap_pv_demote_pde() begins modifying the PV lists, it can't allocate
any new PV chunks, because that could require the PV list lock to be
dropped. So, all necessary PV chunks must be allocated in advance. To my
surprise, this new approach is a few percent faster than the old one.


237433 22-Jun-2012 kib

Implement mechanism to export some kernel timekeeping data to
usermode, using shared page. The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows turning off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs necessary for non-x86 architectures to still compile
are provided.

Discussed with: bde
Reviewed by: jhb
Tested by: flo
MFC after: 1 month


237430 22-Jun-2012 kib

Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after: 1 week


237414 22-Jun-2012 alc

Introduce CHANGE_PV_LIST_LOCK_TO_{PHYS,VM_PAGE}() to avoid duplication of
code.


237404 21-Jun-2012 alc

Update the PV stats in free_pv_entry() using atomics. After which, it is
no longer necessary for free_pv_entry() to be serialized by the pvh global
lock.

Retire pmap_insert_entry() and pmap_remove_entry(). Once upon a time,
these functions were called from multiple places within the pmap. Now,
each has only one caller.


237290 20-Jun-2012 alc

Add PV list locking to pmap_copy(), pmap_enter_object(), and
pmap_enter_quick(). These functions are no longer serialized by the pvh
global lock.

There is no need to release the PV list lock before calling free_pv_chunk()
in pmap_remove_pages().


237264 19-Jun-2012 alc

Condition the implementation of pv_entry_count on PV_STATS. On amd64,
pv_entry_count is purely informational. It does not serve any functional
purpose.

Add PV chunk locking to get_pv_entry().


237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


237243 18-Jun-2012 kib

Adjust the fix in r236953, by not generating the signal manually, but
performing the return to usermode using full return path. This
consolidates the handling of exceptional situations in fewer
places, and is less code as well.

Reviewed by: jhb
MFC after: 1 week


237228 18-Jun-2012 alc

Add PV chunk and list locking to pmap_page_exists_quick(),
pmap_page_is_mapped(), and pmap_remove_pages(). These functions
are no longer serialized by the pvh global lock.


237168 16-Jun-2012 alc

The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer. This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by: kib
MFC after: 6 weeks


237136 15-Jun-2012 adrian

Oops - use the actual 11n enable option.


237109 15-Jun-2012 adrian

Ok, ok. 802.11n can be on by default in GENERIC in -HEAD.

God help me.


237086 14-Jun-2012 alc

Update a couple comments to reflect r235598.

X-MFC after: r235598


237085 14-Jun-2012 alc

Correctly identify the function in a KASSERT().

MFC after: 3 days


237037 13-Jun-2012 jkim

- Remove unused code for CR3 and CR4.
- Fix few style(9) nits while I am here.


237027 13-Jun-2012 jkim

- Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.


236953 12-Jun-2012 bz

Fix a problem where zero-length RDATA fields can cause named(8) to crash.
[12:03]

Correct a privilege escalation when returning from kernel if
running FreeBSD/amd64 on non-AMD processors. [12:04]

Fix reference count errors in IPv6 code. [EN-12:02]

Security: CVE-2012-1667
Security: FreeBSD-SA-12:03.bind
Security: CVE-2012-0217
Security: FreeBSD-SA-12:04.sysret
Security: FreeBSD-EN-12:02.ipv6refcount
Approved by: so (simon, bz)


236938 12-Jun-2012 iwasaki

Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().


236930 11-Jun-2012 alc

Avoid unnecessary atomic operations for clearing PGA_WRITEABLE in
pmap_remove_pages(). This reduces pmap_remove_pages()'s running time by
4 to 11% in my tests.
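
One common way to avoid an unnecessary atomic operation is to test the
flag with a plain load and only issue the read-modify-write when the bit
is actually set. The sketch below uses C11 atomics and hypothetical
sketch_ names; it is not necessarily what this revision does:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only. */
#define SKETCH_PGA_WRITEABLE 0x01u

/*
 * Clear the flag only if it is currently set. Returns true when the
 * atomic read-modify-write was issued, false when it was skipped.
 */
bool
sketch_clear_writeable(atomic_uint *aflags)
{
    assert(aflags != NULL);
    if ((atomic_load_explicit(aflags, memory_order_relaxed) &
        SKETCH_PGA_WRITEABLE) == 0)
        return (false);    /* cheap path: a load, no locked instruction */
    atomic_fetch_and_explicit(aflags, ~SKETCH_PGA_WRITEABLE,
        memory_order_relaxed);
    return (true);
}
```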

MFC after: 1 week


236830 10-Jun-2012 iwasaki

Some fixes for r236772.

- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by: attilio@


236772 09-Jun-2012 iwasaki

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Differences in the
suspend/resume procedures between them are minimized.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge (and remove) suspendctx() into savectx() in order to match the
amd64 code.

Reviewed by: attilio@, acpi@


236534 04-Jun-2012 alc

Various small changes to PV entry management:

Constify pc_freemask[].

pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.

free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).

get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)

pmap_remove_pages()
Eliminate unnecessary ()'s.


236503 03-Jun-2012 avg

free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG

Those calls are useful with hardware watchdog drivers too.

MFC after: 3 weeks


236494 02-Jun-2012 alc

Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
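
A minimal sketch of the general technique, assuming a 64-byte cache line
and using hypothetical sketch_ types (the kernel's own lock types and
alignment macros are not shown here): the lock is aligned to a line
boundary and padded so that no neighbouring data shares its line.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64    /* assumed cache line size in bytes */

/* Hypothetical lock type, standing in for the real one. */
struct sketch_lock {
    volatile int held;
};

/* The lock starts on a line boundary and fills the whole line. */
struct sketch_isolated_lock {
    _Alignas(SKETCH_CACHE_LINE) struct sketch_lock lock;
    char pad[SKETCH_CACHE_LINE - sizeof(struct sketch_lock)];
};

/* Neighbouring data ends up on different cache lines than the lock. */
struct sketch_globals {
    long counter_before;
    struct sketch_isolated_lock pv_list_lock;
    long counter_after;
};

_Static_assert(sizeof(struct sketch_isolated_lock) == SKETCH_CACHE_LINE,
    "isolated lock must occupy exactly one cache line");
```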


236456 02-Jun-2012 kib

Use plain store for atomic_store_rel on x86, instead of implicitly
locked xchg instruction. IA32 memory model guarantees that a store has
release semantics, since stores cannot pass earlier loads or stores.
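
For illustration only, here is the same idea expressed with portable C11
atomics rather than the kernel's atomic(9) primitives. The sketch_ names
are hypothetical. On x86, compilers typically emit an ordinary mov for
the release store below, with no lock prefix or xchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int sketch_payload;        /* data published by the producer */
static atomic_int sketch_ready;   /* zero until the payload is valid */

/* Producer: write the payload, then publish it with a release store. */
void
sketch_publish(int value)
{
    sketch_payload = value;
    atomic_store_explicit(&sketch_ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out once the payload is visible. */
int
sketch_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&sketch_ready, memory_order_acquire) == 0)
        return (0);
    *out = sketch_payload;
    return (1);
}
```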

Reviewed by: bde, jhb
Tested by: pho
MFC after: 2 weeks


236424 01-Jun-2012 jkim

Consistently use ACPI_SUCCESS() and ACPI_FAILURE() macros wherever possible.


236419 01-Jun-2012 jkim

Tidy up code clutter in SMP case a bit. No functional change.


236414 01-Jun-2012 jkim

Call AcpiSetFirmwareWakingVector() with interrupt disabled for consistency.


236409 01-Jun-2012 jkim

Improve style(9) in the previous commit.


236403 01-Jun-2012 iwasaki

Call AcpiLeaveSleepStatePrep() in interrupt disabled context
(described in ACPICA source code).

- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
it twice in interrupt disabled/enabled context (ia64 version is
just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
(i386 version) for preparation of x86/acpica/acpi_wakeup.c
(MFC candidate).

Reviewed by: jkim@
MFC after: 2 days


236378 01-Jun-2012 alc

Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().


236291 30-May-2012 alc

Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.


236027 25-May-2012 ed

Regenerate system call tables.


236026 25-May-2012 ed

Remove use of non-ISO-C integer types from system call tables.

These files already use ISO-C-style integer types, so make them less
inconsistent by preferring the standard types.


235973 25-May-2012 alc

Correct an error in pmap_pv_reclaim(). In a rare case, when it should have
returned NULL, it might instead return a pointer to a page that it had just
unmapped.


235941 24-May-2012 bz

MFp4 bz_ipv6_fast:

in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the (also exported to user space)
IPv4 specific functions to the times, when the ip.h header is
present and IPVERSION is defined (to 4).
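
As an illustration of the kind of protocol-independent helper meant
here, in_addword() adds two 16-bit words in ones' complement arithmetic.
A portable sketch under the hypothetical name sketch_addword() (the
in-tree x86 version is written differently) could look like this:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ones' complement addition of two 16-bit words: add them in a wider
 * type, then fold the carry out of bit 15 back into the low 16 bits.
 */
uint16_t
sketch_addword(uint16_t sum, uint16_t b)
{
    uint32_t t = (uint32_t)sum + b;

    return ((uint16_t)((t & 0xffffu) + (t >> 16)));
}
```

Nothing in this helper depends on struct ip, which is why it is usable
in a non-IPv4 context.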

We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235695 20-May-2012 alc

Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

Reviewed by: kib
X-MFC after: r235598


235598 18-May-2012 alc

Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.
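
A rough sketch of an LRU-ordered list built on the sys/queue.h TAILQ
macros follows. The sketch_ names are hypothetical, and it assumes that
the least recently used element sits at the head of the list; the real
pv chunk structures and reclamation policy are more involved:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a page of pv entries. */
struct sketch_chunk {
    int id;
    TAILQ_ENTRY(sketch_chunk) lru;
};

TAILQ_HEAD(sketch_lru, sketch_chunk);

/* Mark a chunk as most recently used by moving it to the tail. */
void
sketch_lru_touch(struct sketch_lru *q, struct sketch_chunk *c)
{
    TAILQ_REMOVE(q, c, lru);
    TAILQ_INSERT_TAIL(q, c, lru);
}

/* Pick the least recently used chunk (the head), or NULL if empty. */
struct sketch_chunk *
sketch_lru_reclaim(struct sketch_lru *q)
{
    struct sketch_chunk *c = TAILQ_FIRST(q);

    if (c != NULL)
        TAILQ_REMOVE(q, c, lru);
    return (c);
}
```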

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.

Reviewed by: kib
MFC after: 6 weeks


235556 17-May-2012 jhb

Centralize declaration of the debug.acpi sysctl node.


235555 17-May-2012 kib

Use singular form for a modifier.

Submitted by: alc
MFC after: 3 days


235538 17-May-2012 kib

Fix typo.

MFC after: 3 days


235226 10-May-2012 mav

Add `options GEOM_RAID` into i386 and amd64 GENERIC kernels.

ataraid(4) previously was present there and having GEOM RAID is convenient.
Unlike other classes, GEOM RAID can be set up from BIOS before install and
users expect it to be detected automatically.


235150 09-May-2012 brooks

The DDB_CTF option has little or nothing to do with the debugger, so move
it next to KDTRACE_HOOKS.


235063 05-May-2012 netchild

- >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
they serve mostly as examples of what you can do with the static probes;
with moderate load the scripts may be overwhelmed, excessive lock-tracing
may influence program behavior (see the last design decision)

Design decisions:
- use "linuxulator" as the provider for the native bitsize; add the
bitsize for the non-native emulation (e.g. "linuxulator32" on amd64)
- Add probes only for locks which are acquired in one function and released
in another function. Locks which are acquired and released in the same
function should be easy to pair in the code, inter-function
locking is easier to verify in DTrace.
- Probes for locks should be fired after locking and before releasing to
prevent races (to provide data/function stability in DTrace, see the
man-page of "dtrace -v ..." and the corresponding DTrace docs).


234989 03-May-2012 attilio

Revert part of r234723 by re-enabling the SMP protection for
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.

The disagreement comes from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel functions might really be considered
as part of the KPI (for third-party modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by third-party code, thus intr_bind() included.

MFC after: 1 week


234785 29-Apr-2012 dim

Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).
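
A sketch of what such a convenience macro can look like, using the
hypothetical name SKETCH_RETURNS_TWICE and the GCC/Clang attribute
syntax (the macro name chosen in the tree is not reproduced here). The
second function only demonstrates why the compiler needs the hint:
setjmp() really does return twice.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical macro name; expands to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_RETURNS_TWICE __attribute__((__returns_twice__))
#else
#define SKETCH_RETURNS_TWICE
#endif

/* Prototype of a hypothetical context-saving function carrying the hint. */
int sketch_savectx(void *ctx) SKETCH_RETURNS_TWICE;

/* Count how many times control comes back from a single setjmp() call. */
int
sketch_count_returns(void)
{
    jmp_buf env;
    volatile int returns = 0;    /* volatile: must survive longjmp() */

    if (setjmp(env) == 0) {
        returns++;               /* first, direct return */
        longjmp(env, 1);
    }
    returns++;                   /* second return, via longjmp() */
    return (returns);
}
```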

MFC after: 2 weeks


234743 27-Apr-2012 rmh

Increase DFLDSIZ from 128 MiB to 32 GiB. On amd64 there's plenty of virtual
memory available, so there is no need to be so conservative about it.
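
DFLDSIZ is the initial data size limit, which userland sees as the soft
RLIMIT_DATA value. A small sketch for checking the value in effect from
a process, using the standard getrlimit(2) call and the hypothetical
helper name sketch_data_limit():

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Store the current soft data-size limit in *soft.
 * Returns 0 on success and -1 if getrlimit(2) fails.
 */
int
sketch_data_limit(rlim_t *soft)
{
    struct rlimit rl;

    assert(soft != NULL);
    if (getrlimit(RLIMIT_DATA, &rl) != 0)
        return (-1);
    *soft = rl.rlim_cur;
    return (0);
}
```

The value reported also depends on login classes and any limits set by
the parent process, so it does not have to equal DFLDSIZ.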

Reviewed by: arch


234723 26-Apr-2012 attilio

Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by: gibbs
Reviewed by: jhb, marius
MFC after: 1 week


234504 20-Apr-2012 brooks

Enable DTrace hooks in GENERIC.

Reviewed by: gnn
Approved by: core (jhb, imp)
Requested by: a cast of thousands
MFC after: 3 days


234360 16-Apr-2012 jkim

Regen for r234359.


234359 16-Apr-2012 jkim

Correct an argument type of iopl syscall for Linuxulator. This also fixes
a warning from Clang, i. e., "args->level < 0 is always false".
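
The warning arises because a value of unsigned type can never compare
less than zero. A small illustration with a hypothetical range check
(this is not the linuxulator code): with an unsigned argument a single
upper-bound test is enough, because a negative value passed by the
caller wraps around to a very large number.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_IOPL 3u    /* x86 I/O privilege levels are 0..3 */

/*
 * Range check on an unsigned level. Writing "level < 0" here would be
 * always false, which is exactly what the compiler warns about.
 */
bool
sketch_level_ok(unsigned int level)
{
    return (level <= SKETCH_MAX_IOPL);
}
```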


234358 16-Apr-2012 jkim

Regen for r234357.


234357 16-Apr-2012 jkim

Correct arguments of stat64, fstat64 and lstat64 syscalls for Linuxulator.


234354 16-Apr-2012 jkim

Regen for r234352.


234352 16-Apr-2012 jkim

- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c. There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *. Probably
this was the source of MI/MD confusion.

Reviewed by: emulation


234208 13-Apr-2012 avg

add actual interrupt counters to back ipi_invlcache_counts

Otherwise one could run into a panic with COUNT_IPIS when cache
invalidation actually happened.

Reviewed by: jhb
MFC after: 1 week


234207 13-Apr-2012 avg

bump INTRCNT_COUNT values to reflect actual numbers of IPI counters

Maybe the numbers should be conditionalized on COUNT_IPIS

Reviewed by: jhb
MFC after: 1 week


234183 12-Apr-2012 jhb

Add OFED and the associated options and drivers to x86 LINT builds:
- Mark 'sdp' as requiring 'inet'.
- Always include "opt_inet.h" and "opt_inet6.h" and modify the IB
driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options
to determine what should be enabled during a module build.
- Fix the mlxen(4) driver and the core IB code to compile
if INET is disabled (including when both INET and INET6 are disabled).

Reviewed by: bz
MFC after: 2 weeks


234105 10-Apr-2012 marius

Fix !SMP build after r234074.

Reviewed by: attilio, jhb


234074 09-Apr-2012 attilio

BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupt_ids() is not
called in the !SMP case either.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Removing an obsolete optimization where the BSP is skipped in
set_apic_interrupt_ids().

Reported by: jh
Reviewed by: jhb
MFC after: 3 days
X-MFC: r233961
Pointy hat to: me


234059 09-Apr-2012 jhb

Recognize the RDRAND instruction feature.

Submitted by: Michael Fuckner michael fuckner net
MFC after: 3 days


233954 06-Apr-2012 alc

Micro-optimize free_pv_entry() for the expected case.


233872 04-Apr-2012 jhb

Add descriptions after the 'device' line for several NICs to match the
existing style.


233781 02-Apr-2012 jhb

Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encountered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after: 2 weeks


233707 30-Mar-2012 jhb

Move the legacy(4) driver to x86.


233704 30-Mar-2012 jkim

Re-initialize model-specific MSRs when we resume CPUs.

MFC after: 1 week


233702 30-Mar-2012 jkim

Work around Erratum 721 for AMD Family 10h and 12h processors.

"Under a highly specific and detailed set of internal timing conditions,
the processor may incorrectly update the stack pointer after a long series
of push and/or near-call instructions, or a long series of pop and/or
near-return instructions. The processor must be in 64-bit mode for this
erratum to occur."

MFC after: 3 days


233676 29-Mar-2012 jhb

Use a more proper fix for enabling HT MSI mapping windows on Host-PCI
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.

To implement this, each x86 Host-PCI bridge driver has to be able to
locate its actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.

This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.

Tested by: bapt
MFC after: 1 week


233671 29-Mar-2012 jhb

- Rename VM_MEMATTR_UNCACHED to VM_MEMATTR_WEAK_UNCACHEABLE on x86 to
be less ambiguous and more clearly identify what it means. This
attribute is what Intel refers to as UC-, and its only difference
relative to normal UC memory is that a WC MTRR will override a UC-
PAT entry causing the memory to be treated as WC, whereas a UC PAT
entry will always override the MTRR.
- Remove the VM_MEMATTR_UNCACHED alias from powerpc.
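
The UC versus UC- distinction described in the first item can be written
down as a tiny decision function. This is only a sketch with
hypothetical sketch_ names; it covers just the two PAT types discussed
here and ignores every other PAT/MTRR combination:

```c
#include <assert.h>

enum sketch_pat  { SKETCH_PAT_UC, SKETCH_PAT_UC_MINUS };
enum sketch_mtrr { SKETCH_MTRR_UC, SKETCH_MTRR_WC, SKETCH_MTRR_WT,
                   SKETCH_MTRR_WB };
enum sketch_eff  { SKETCH_EFF_UC, SKETCH_EFF_WC };

/*
 * Effective memory type for a UC or UC- PAT entry:
 * - UC always wins over the MTRR;
 * - UC- yields to a WC MTRR and otherwise behaves like UC.
 */
enum sketch_eff
sketch_effective_type(enum sketch_pat pat, enum sketch_mtrr mtrr)
{
    assert(pat == SKETCH_PAT_UC || pat == SKETCH_PAT_UC_MINUS);
    if (pat == SKETCH_PAT_UC_MINUS && mtrr == SKETCH_MTRR_WC)
        return (SKETCH_EFF_WC);
    return (SKETCH_EFF_UC);
}
```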


233628 28-Mar-2012 fabient

Add software PMC support.

New kernel events can be added at various locations for sampling or
counting. This will, for example, allow easy system profiling on any
processor with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at lock acquire failures and page faults while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


233433 24-Mar-2012 alc

Disable detailed PV entry accounting by default. Add a config option
to enable it.

MFC after: 1 week


233427 24-Mar-2012 marius

Add cas(4), gem(4) and hme(4) to x86 GENERICs as suggested by netchild@ in
<20120222095239.Horde.0hpYHJjmRSRPRKzXsoFRbYk@webmail.leidinger.net>.
According to some private emails received, it apparently is not unpopular
to use at least Quad GigaSwift cards driven by cas(4) in x86 machines.

MFC after: 1 week


233310 22-Mar-2012 joel

Add snd_cmi, snd_csa and snd_emu10kx to GENERIC on i386 and amd64.

The GPL infected parts which were blocking the inclusion of snd_csa
and snd_emu10kx in GENERIC have recently been removed from the tree.
I'm also adding snd_cmi to GENERIC, which I originally intended to
add when we enabled sound support by default.

Discussed with: jhb, pfg, Yuriy Tsibizov <yuriy.tsibizov@gfk.ru>
Approved by: jhb


233291 22-Mar-2012 alc

Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with: kib
Tested by: gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after: 1 week


233290 22-Mar-2012 alc

Change pv_entry_count to a long. During the lifetime of FreeBSD 10.x,
physical memory sizes at the high-end will likely reach a point that
the number of pv entries could overflow an int.
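
A back-of-the-envelope check of the overflow concern, assuming 4 KiB
pages, a 32-bit int and, as a simplification, one pv entry per mapped
page (shared pages have more than one). The sketch_ helpers are
hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL    /* assumed base page size */

/* Number of base pages needed to cover the given number of bytes. */
uint64_t
sketch_pages_in(uint64_t bytes)
{
    return (bytes / SKETCH_PAGE_SIZE);
}

/* Would a counter of this magnitude still fit in an int? */
bool
sketch_fits_in_int(uint64_t count)
{
    return (count <= (uint64_t)INT_MAX);
}
```

Under these assumptions 1 TiB of mappings still fits in an int, while
8 TiB needs 2^31 entries, one more than INT_MAX.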

Submitted by: kib


233271 21-Mar-2012 ed

Remove pty(4) from our kernel configurations.

As of FreeBSD 8, this driver should not be used. Applications that use
posix_openpt(2) and openpty(3) use the pts(4) that is built into the
kernel unconditionally. If it turns out high-profile applications depend
on the pty(4) module anyway, I'd rather get those fixed. So please report any
issues to me.

The pty(4) module is still available as a kernel module of course, so a
simple `kldload pty' can be used to run old-style pseudo-terminals.


233256 21-Mar-2012 alc

Eliminate vm.pmap.shpgperproc and vm.pmap.pv_entry_max because they no
longer serve any purpose. Prior to r157446, they served a purpose
because there was a fixed amount of kernel virtual address space
reserved for pv entries at boot time. However, since that change pv
entries are accessed through the direct map, and so there is no limit
imposed by a fixed amount of kernel virtual address space.

Fix a couple of nearby style issues.

Reviewed by: jhb, kib
MFC after: 1 week


233250 20-Mar-2012 jkim

Merge ACPICA 20120320.


233249 20-Mar-2012 jkim

Fix another witness panic. We cannot enter critical section at all because
AcpiEnterSleepState() executes (optional) _GTS method since ACPICA 20120215
(r231844). To evaluate the method, we need malloc(9), which may sleep.

Reported by: bschmidt
MFC after: 3 days


233209 19-Mar-2012 tijl

Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace
amd64/i386/pc98 sysarch.h with stubs.


233208 19-Mar-2012 jkim

Fix a witness panic introduced in r231797.

Reported by: bschmidt
Reviewed by: jhb
Pointy hat to: jkim
MFC after: 3 days


233207 19-Mar-2012 tijl

Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace
amd64/i386/pc98 specialreg.h with stubs.


233204 19-Mar-2012 tijl

Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.


233203 19-Mar-2012 tijl

Move userland bits (and some common kernel bits) from amd64 and i386
segments.h to a new x86 segments.h.

Add __packed attribute to some structs (just to be sure).
Also make it clear that i386 GDT and LDT entries are used in ia64 code.


233185 19-Mar-2012 kib

Re-apply r233122 erroneously reverted in r233168.

Submitted by: jhb
Pointy hat to: kib
MFC after: 2 weeks


233168 19-Mar-2012 kib

If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions. At least one of the reasons is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accessing
the superpage pv list.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 2 weeks


233125 18-Mar-2012 tijl

Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.

Reviewed by: kib


233124 18-Mar-2012 tijl

Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by: kib (previous version)


233123 18-Mar-2012 tijl

Use exact width integer types in amd64/i386 reg.h to prepare for a merge.
The only real change is replacing long with int on i386.


233122 18-Mar-2012 alc

Style fix to pmap_protect().

Submitted by: bde


233097 17-Mar-2012 alc

With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists.

Reviewed by: kib


233044 16-Mar-2012 tijl

Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by: kib


232842 12-Mar-2012 alc

Simplify the error checking in one branch of trap_pfault() and update
the nearby comment.

Add missing whitespace to a return statement in trap_pfault().

Submitted by: kib [2]


232800 10-Mar-2012 netchild

regen


232799 10-Mar-2012 netchild

- add comments to syscalls.master and linux(32)_dummy about which linux
kernel version introduced the syscall (based upon a linux man-page)
- add comments to syscalls.master regarding some names of syscalls which are
different than the linux-names (based upon the linux unistd.h)
- add some dummy syscalls
- name an unimplemented syscall

MFC after: 1 month


232747 09-Mar-2012 jhb

Move i386's intr_machdep.c to the x86 tree and share it with amd64.


232619 06-Mar-2012 attilio

Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported
platforms.
This will make every attempt to mount a non-mpsafe filesystem fail
unless the kernel is expressly compiled with the
VFS_ALLOW_NONMPSAFE option.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.


232614 06-Mar-2012 bz

Provide wbwd(4), a driver for the watchdog timer found on various
Winbond Super I/O chips.

With minor efforts it should be possible to extend the driver to support
further chips/revisions available from Winbond. In the simplest case
only new IDs need to be added, while different chipsets might require
their own function to enter extended function mode, etc.

Sponsored by: Sandvine Incorporated ULC (in 2011)
Reviewed by: emaste, brueffer
MFC after: 2 weeks


232561 05-Mar-2012 jkim

Fix few style nits.


232521 04-Mar-2012 rmh

Exclude USB drivers (except umass and ukbd) from main kernel image on i386
and amd64.

Reviewed by: hselasky, arch, usb
Approved by: kib (mentor)


232520 04-Mar-2012 tijl

Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace
amd64/i386/pc98 ptrace.h with stubs.

For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the
i386 values. The old values are still supported but should no longer be
used.

Reviewed by: kib


232492 04-Mar-2012 tijl

Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs.


232491 04-Mar-2012 tijl

Copy amd64 float.h to x86 and merge with i386 float.h. Replace
amd64/i386/pc98 float.h with stubs.


232416 03-Mar-2012 jkim

Add VESA option to GENERIC for amd64 and i386.

MFC after: 1 month


232276 28-Feb-2012 tijl

Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs.


232275 28-Feb-2012 tijl

Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs.


232266 28-Feb-2012 tijl

Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
amd64/i386/pc98 endian.h with stubs.

In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
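
The layering described above can be sketched with plain functions. The
header itself uses macros, and the sketch_ names below are hypothetical;
each width is built from two swaps of the next smaller width, so no
64-bit constant suffix is needed anywhere:

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t
sketch_bswap16(uint16_t x)
{
    return ((uint16_t)((x << 8) | (x >> 8)));
}

/* 32-bit swap: swap each half, then exchange the halves. */
uint32_t
sketch_bswap32(uint32_t x)
{
    return (((uint32_t)sketch_bswap16((uint16_t)(x & 0xffffu)) << 16) |
        sketch_bswap16((uint16_t)(x >> 16)));
}

/* 64-bit swap built from two 32-bit swaps in the same way. */
uint64_t
sketch_bswap64(uint64_t x)
{
    return (((uint64_t)sketch_bswap32((uint32_t)(x & 0xffffffffu)) << 32) |
        sketch_bswap32((uint32_t)(x >> 32)));
}
```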


232264 28-Feb-2012 tijl

Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace
amd64/i386/pc98 _stdint.h with stubs.


232262 28-Feb-2012 tijl

Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace
amd64/i386/pc98 _limits.h with stubs.


232261 28-Feb-2012 tijl

Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing
amd64/i386/pc98 _types.h with stubs.


232228 27-Feb-2012 jhb

Resort the IDT_DTRACE_RET constant after it was changed to be less than
IDT_SYSCALL.


232227 27-Feb-2012 jhb

Correct function prototype for read_rflags().


232226 27-Feb-2012 jhb

Update incorrect comment.


231840 16-Feb-2012 jkim

Refine r231791. Install the resume event handler unconditionally.


231797 15-Feb-2012 jkim

Clean up RFLAGS and CR3 register handling and nearby comments. For BSP, use
spinlock_enter()/spinlock_exit() to save/restore RFLAGS. We know interrupts
are disabled when returning from S3. For AP, we do not have to save/restore
it because IRET will do it for us anyway. Do not save CR3 locally because
savectx() does it and BSP does not have to switch to kernel map for amd64.
Change contigmalloc(9) flag while I am in the neighborhood.


231791 15-Feb-2012 jkim

Set up an event handler to turn off speaker if user requested it. Speaker
will stop beeping after all device drivers are resumed. Use proper API to
"acquire" and "release" PIC timer2 for consistency and correctness.


231787 15-Feb-2012 jkim

Make ACPI resume beeper less cryptic. Set PIC timer2 mode properly.


231781 15-Feb-2012 jkim

Some BIOSes are known for corrupting low 64KB between suspend and resume.
Mask off the first 16 pages unless we appear to be running in a VM. This
address may be overridden by 'hw.physmem.start' tunable from loader.
Note Linux used to have a BIOS quirk table for this issue but it seems they
made it default recently.


231559 12-Feb-2012 rmh

Move WITHOUT_SOURCELESS_* files to sys/conf/ in order to avoid "universe"
target processing them as if they were standalone kernel config files.

Approved by: kib (mentor)
MFC after: 5 days


231441 10-Feb-2012 kib

In cpu_set_user_tls(), consistently set PCB_FULL_IRET pcb flag for
both 64bit and 32bit binaries, not for 64bit only.

Setting the flag is not necessary there, because the only current
user of the cpu_set_user_tls() is create_thread(), which calls
cpu_set_upcall() before and cpu_set_upcall() itself sets PCB_FULL_IRET.
Change the function for consistency and preserve existing KPI for now.

MFC after: 1 week


231227 08-Feb-2012 jkim

Reset clock after atrtc(4) is properly resumed.


231169 07-Feb-2012 jkim

Do not EOI local APIC too early. Just do doreti normally after resuming.


230980 04-Feb-2012 rmh

Add "nodevice adw" to WITHOUT_SOURCELESS_UCODE.

Approved by: kib (mentor)
MFC after: 13 days


230972 04-Feb-2012 rmh

Add MK_SOURCELESS build option. Setting MK_SOURCELESS to "no" will disable
kernel modules that include binary-only code.

More fine-grained control is provided via MK_SOURCELESS_HOST (for native code
that runs on host CPU) and MK_SOURCELESS_UCODE (for microcode).

Reviewed by: julian, delphij, freebsd-arch
Approved by: kib (mentor)
MFC after: 2 weeks


230958 03-Feb-2012 jkim

Restore callee saved registers later and micro-optimize.


230957 03-Feb-2012 jkim

Fix a function prototype to reflect reality. No functional change.


230843 31-Jan-2012 jimharris

Add isci(4) driver for amd64 and i386 targets.

The isci driver is for the integrated SAS controller in the Intel C600
(Patsburg) chipset. Source files in sys/dev/isci directory are
FreeBSD-specific, and sys/dev/isci/scil subdirectory contains
an OS-agnostic library (SCIL) published by Intel to control the SAS
controller. This library is used primarily as-is in this driver, with
some post-processing to better integrate into the kernel build
environment.

isci.4 and a README in the sys/dev/isci directory contain a few
additional details.

This driver is only built for amd64 and i386 targets.

Sponsored by: Intel
Reviewed by: scottl
Approved by: scottl


230830 31-Jan-2012 jkim

- Restore XCR0 before restoring extended FPU states.
- Update my copyright dates.

Reviewed by: kib


230777 30-Jan-2012 jkim

Naturally align a newly added wakeup_fpusave.


230766 30-Jan-2012 kib

Move xrstor/xsave/xsetbv into fpu.c and reorder them.

Requested by: bde
MFC after: 1 month


230765 30-Jan-2012 kib

Synchronize the struct sigcontext definitions on x86 with mcontext_t.

Pointed out by: bde
MFC after: 1 month


230623 27-Jan-2012 kmacy

Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64.
Excluding other allocations, including UMA, now entails the addition of
a single flag to kmem_alloc or uma zone create.

Reviewed by: alc, avg
MFC after: 2 weeks


230538 25-Jan-2012 kib

Order newly added functions alphabetically.

Requested by: bde
MFC after: 3 days


230475 23-Jan-2012 das

Add C11 macros describing subnormal numbers to float.h.

Reviewed by: bde


230426 21-Jan-2012 kib

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscalls honour the flag,
allowing the signal handlers to inspect and manipulate extended
state in the interrupted context.

- getcontext(2) never returns extended state, since there is no
place in the fixed-size mcontext_t for a variable-sized save
area. Because mcontext_t is embedded into ucontext_t, this cannot
be fixed in a reasonable way. Instead of extending the
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


230270 17-Jan-2012 kib

Add definitions for the FPU extended state header, legacy extended
state and AVX state.

MFC after: 1 week


230269 17-Jan-2012 kib

Modernize the fpusave structures definitions by using uint*_t types.

MFC after: 1 week


230262 17-Jan-2012 kib

Implement xsetbv(), xsave() and xrstor() providing C access to the
similarly named CPU instructions.

Since our in-tree binutils gas is not aware of the instructions, and
I have to use the byte-sequence to encode them, hardcode the r/m operand
as (%rdi). This way, first argument of the pseudo-function is already
placed into proper register.

MFC after: 1 week


230261 17-Jan-2012 kib

Add definitions related to XCR0.

MFC after: 1 week


230260 17-Jan-2012 kib

Add macro IS_BSP() to check whether the current CPU is BSP.

MFC after: 1 week


230132 15-Jan-2012 uqs

Convert files to UTF-8


229997 12-Jan-2012 ken

Add the CAM Target Layer (CTL).

CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003. It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license. The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

Some CTL features:

- Disk and processor device emulation.
- Tagged queueing
- SCSI task attribute support (ordered, head of queue, simple tags)
- SCSI implicit command ordering support. (e.g. if a read follows a mode
select, the read will be blocked until the mode select completes.)
- Full task management support (abort, LUN reset, target reset, etc.)
- Support for multiple ports
- Support for multiple simultaneous initiators
- Support for multiple simultaneous backing stores
- Persistent reservation support
- Mode sense/select support
- Error injection support
- High Availability support (1)
- All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
functional.

ctl.c: The core of CTL. Command handlers and processing,
character driver, and HA support are here.

ctl.h: Basic function declarations and data structures.

ctl_backend.c,
ctl_backend.h: The basic CTL backend API.

ctl_backend_block.c,
ctl_backend_block.h: The block and file backend. This allows for using
a disk or a file as the backing store for a LUN.
Multiple threads are started to do I/O to the
backing device, primarily because the VFS API
requires that to get any concurrency.

ctl_backend_ramdisk.c: A "fake" ramdisk backend. It only allocates a
small amount of memory to act as a source and sink
for reads and writes from an initiator. Therefore
it cannot be used for any real data, but it can be
used to test for throughput. It can also be used
to test initiators' support for extremely large LUNs.

ctl_cmd_table.c: This is a table with all 256 possible SCSI opcodes,
and command handler functions defined for supported
opcodes.

ctl_debug.h: Debugging support.

ctl_error.c,
ctl_error.h: CTL-specific wrappers around the CAM sense building
functions.

ctl_frontend.c,
ctl_frontend.h: These files define the basic CTL frontend port API.

ctl_frontend_cam_sim.c: This is a CTL frontend port that is also a CAM SIM.
This frontend allows for using CTL without any
target-capable hardware. So any LUNs you create in
CTL are visible in CAM via this port.

ctl_frontend_internal.c,
ctl_frontend_internal.h:
This is a frontend port written for Copan to do
some system-specific tasks that required sending
commands into CTL from inside the kernel. This
isn't entirely relevant to FreeBSD in general,
but can perhaps be repurposed.

ctl_ha.h: This is a stubbed-out High Availability API. Much
more is needed for full HA support. See the
comments in the header and the description of what
is needed in the README.ctl.txt file for more
details.

ctl_io.h: This defines most of the core CTL I/O structures.
union ctl_io is conceptually very similar to CAM's
union ccb.

ctl_ioctl.h: This defines all ioctls available through the CTL
character device, and the data structures needed
for those ioctls.

ctl_mem_pool.c,
ctl_mem_pool.h: Generic memory pool implementation used by the
internal frontend.

ctl_private.h: Private data structures (e.g. CTL softc) and
function prototypes. This also includes the SCSI
vendor and product names used by CTL.

ctl_scsi_all.c,
ctl_scsi_all.h: CTL wrappers around CAM sense printing functions.

ctl_ser_table.c: Command serialization table. This defines what
happens when one type of command is followed by
another type of command.

ctl_util.c,
ctl_util.h: CTL utility functions, primarily designed to be
used from userland. See ctladm for the primary
consumer of these functions. These include CDB
building functions.

scsi_ctl.c: CAM target peripheral driver and CTL frontend port.
This is the path into CTL for commands from
target-capable hardware/SIMs.

README.ctl.txt: CTL code features, roadmap, to-do list.

usr.sbin/Makefile: Add ctladm.

ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c: ctladm(8) is the CTL management utility.
It fills a role similar to camcontrol(8).
It allows configuring LUNs, issuing commands,
injecting errors and various other control
functions.

usr.bin/Makefile: Add ctlstat.

ctlstat/Makefile,
ctlstat/ctlstat.8,
ctlstat/ctlstat.c: ctlstat(8) fills a role similar to iostat(8).
It reports I/O statistics for CTL.

sys/conf/files: Add CTL files.

sys/conf/NOTES: Add device ctl.

sys/cam/scsi_all.h: To conform to more recent specs, the inquiry CDB
length field is now 2 bytes long.

Add several mode page definitions for CTL.

sys/cam/scsi_all.c: Handle the new 2 byte inquiry length.

sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c: Update for 2 byte inquiry length field.

scsi_da.h: Add versions of the format and rigid disk pages
that are in a more reasonable format for CTL.

amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC: Add device ctl.

i386/conf/PAE: The CTL frontend SIM at least does not compile
cleanly on PAE.
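
As an illustration of the 256-entry opcode table described for
ctl_cmd_table.c and of the 2 byte inquiry length, here is a minimal
userland sketch. It is not the CTL code: every sk_ name is hypothetical,
and the handlers are stand-ins. The only facts relied on are the standard
SCSI opcodes (TEST UNIT READY is 0x00, INQUIRY is 0x12) and that the
INQUIRY allocation length occupies CDB bytes 3 and 4, most significant
byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI opcodes used by this sketch. */
#define SK_OP_TEST_UNIT_READY	0x00
#define SK_OP_INQUIRY		0x12

/* Hypothetical handler type: takes the CDB, returns a status or value. */
typedef int (*sk_cmd_handler)(const uint8_t *cdb);

static int
sk_handle_tur(const uint8_t *cdb)
{
	(void)cdb;
	return (0);
}

/* For illustration only: return the 2 byte allocation length. */
static int
sk_handle_inquiry(const uint8_t *cdb)
{
	return ((cdb[3] << 8) | cdb[4]);
}

/* One slot per possible opcode; unsupported opcodes stay NULL. */
static const sk_cmd_handler sk_cmd_table[256] = {
	[SK_OP_TEST_UNIT_READY] = sk_handle_tur,
	[SK_OP_INQUIRY] = sk_handle_inquiry,
};

/* Dispatch on the opcode in CDB byte 0; -1 means "not supported". */
static int
sk_dispatch(const uint8_t *cdb)
{
	sk_cmd_handler h = sk_cmd_table[cdb[0]];

	return (h != NULL ? h(cdb) : -1);
}
```

Indexing a fixed array by opcode keeps lookup constant-time and makes
unsupported commands an explicit NULL check rather than a search miss.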

Sponsored by: Copan Systems, SGI and Spectra Logic
MFC after: 1 month


229085 31-Dec-2011 gavin

Default to not performing the early-boot memory tests when we detect we
are booting inside a VM. There are three reasons to disable this:

o It causes the VM host to believe that all the tested pages of RAM are
in use. This in turn may force the host to page out pages of RAM
belonging to other VMs, or otherwise cause problems with fair resource
sharing on the VM cluster.
o It adds significant time to the boot process (around 1 second/Gig in
testing)
o It is unnecessary - the host should have already verified that the
memory is functional etc.

Note that this simply changes the default when in a VM - it can still be
overridden using the hw.memtest.tests tunable.

MFC after: 4 weeks


228973 29-Dec-2011 rwatson

Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel
configurations for various architectures in FreeBSD 10.x. This allows
basic Capsicum functionality to be used in the default FreeBSD
configuration on non-embedded architectures; process descriptors are not
yet enabled by default.

MFC after: 3 months
Sponsored by: Google, Inc


228958 29-Dec-2011 jhb

Regen.


228957 29-Dec-2011 jhb

Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by: silence on emulation@
MFC after: 2 weeks


228940 28-Dec-2011 delphij

Import the first release of HighPoint RocketRAID 27xx SAS 6Gb/s HBA card
driver. This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms.

Many thanks to HighPoint for providing this driver.

MFC after: 2 weeks


228935 28-Dec-2011 alc

Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.


228724 20-Dec-2011 delphij

Add comments in NOTES to say what viawd is.


228469 13-Dec-2011 ed

Replace __signed by signed.

The signed keyword is an integral part of the C syntax. There's no need
to use __signed.


228431 12-Dec-2011 fabient

Add watchdog support for VIA south bridge chipset.
Tested on VT8251 and VX900; CX700, VX800 and VX855 should also work.

MFC after: 1 month
Sponsored by: NETASQ


228085 28-Nov-2011 philip

Limit building sfxge(4) in-kernel to amd64 for the time being. We can put it
back after I fix the breakages on some of our more exotic platforms.

While here, add the driver to the amd64 NOTES, so it can be picked up in LINT
builds.


227843 22-Nov-2011 marius

- There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.


227776 21-Nov-2011 lstewart

- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
userspace processes. ffclock_getcounter() returns the current value of the
kernel's feed-forward clock counter. ffclock_getestimate() returns the current
feed-forward clock parameter estimates and ffclock_setestimate() updates the
feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


227759 20-Nov-2011 attilio

Revert part of the r227758 which crept in.

Pointy hat: attilio
X-MFC: r227758


227758 20-Nov-2011 attilio

Introduce macro stubs in the mutex implementation that will be always
defined and will allow consumers, willing to provide options, file and
line to locking requests, to not worry about options redefining the
interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.

The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_
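
The idea of always-defined underscore variants can be sketched in
userland as follows. This is an assumption-laden illustration, not the
mutex(9) implementation: the sk_ names, the struct layout and the
"biglock" consumer are all hypothetical. It only shows how a trailing
underscore macro that always takes options, file and line lets another
locking interface forward its own caller's location.

```c
#include <stddef.h>

/* Hypothetical lock that remembers who acquired it. */
struct sk_mtx {
	int		locked;
	const char	*file;
	int		line;
};

static void
sk_mtx_lock_impl(struct sk_mtx *m, int opts, const char *file, int line)
{
	(void)opts;		/* options are ignored in this sketch */
	m->locked = 1;
	m->file = file;
	m->line = line;
}

/* Always defined: the caller supplies opts, file and line explicitly. */
#define	sk_mtx_lock_flags_(m, opts, file, line)			\
	sk_mtx_lock_impl((m), (opts), (file), (line))

/* Everyday form: fills in the location of the call site. */
#define	sk_mtx_lock_flags(m, opts)				\
	sk_mtx_lock_flags_((m), (opts), __FILE__, __LINE__)

/* A second locking interface layered on top of the first one. */
struct sk_biglock {
	struct sk_mtx	mtx;
};

/* Forwards the location of *its* caller instead of its own. */
#define	sk_biglock_acquire_(b, file, line)			\
	sk_mtx_lock_flags_(&(b)->mtx, 0, (file), (line))
#define	sk_biglock_acquire(b)					\
	sk_biglock_acquire_((b), __FILE__, __LINE__)
```

Because the underscore form exists regardless of build options in this
sketch, the layered interface never needs its own conditional
definitions.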

Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specification in the
ppbus code by using the same macro as done in this patch (but this is
left to the ppbus maintainer)
- all the other locking interfaces may require a similar cleanup, where
the most notable case is sx which will allow a further cleanup of
vm_map locking facilities
- The patch should be fully compatible with older branches, thus an MFC
is planned (in fact it uses only underlying mechanisms already
present).

Comments review by: eadler, Ben Kaduk
Discussed with: kib, jhb
MFC after: 1 month


227694 19-Nov-2011 ed

Regenerate system call tables.


227693 19-Nov-2011 ed

Make the Linux *at() calls a bit more complete.

Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().


227692 19-Nov-2011 ed

Regenerate system call tables.


227691 19-Nov-2011 ed

Improve *access*() parameter name consistency.

The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function has a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.


227474 12-Nov-2011 theraven

Fix SIGATOMIC_M{IN,AX} on x86-64. These are meant to be the minimum and
maximum values that are allowed in a sig_atomic_t, but it looks like they
were just copied from the x86 versions, so these definitions violate the C
and C++ specs. Mismatch was spotted by the libc++ test suite.
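
A consistency check of this kind can be sketched in portable C using the
standard names SIG_ATOMIC_MIN and SIG_ATOMIC_MAX from <stdint.h>. The
function name below is hypothetical, and the sketch assumes a two's
complement sig_atomic_t without padding bits; it is not the libc++ test
mentioned above.

```c
#include <limits.h>
#include <signal.h>
#include <stdint.h>

/*
 * Return 1 if SIG_ATOMIC_MIN and SIG_ATOMIC_MAX cover exactly the range
 * implied by the width of sig_atomic_t, 0 otherwise.
 */
static int
sig_atomic_limits_match(void)
{
	const unsigned bits = (unsigned)(sizeof(sig_atomic_t) * CHAR_BIT);

	if ((sig_atomic_t)-1 < 0) {
		/* Signed type: range is [-2^(bits-1), 2^(bits-1) - 1]. */
		const intmax_t max =
		    (intmax_t)((UINTMAX_C(1) << (bits - 1)) - 1);
		const intmax_t min = -max - 1;

		return ((intmax_t)SIG_ATOMIC_MAX == max &&
		    (intmax_t)SIG_ATOMIC_MIN == min);
	} else {
		/* Unsigned type: range is [0, 2^bits - 1]. */
		const uintmax_t umax =
		    bits >= sizeof(uintmax_t) * CHAR_BIT ?
		    UINTMAX_MAX : (UINTMAX_C(1) << bits) - 1;

		return ((uintmax_t)SIG_ATOMIC_MAX == umax &&
		    (intmax_t)SIG_ATOMIC_MIN == 0);
	}
}
```

Limits copied from a platform where sig_atomic_t has a different width
would make this function return 0.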

Approved by: dim (mentor)


227442 11-Nov-2011 kib

Weaken part of the assertions added in r227394. Only check that the
process state is stopped.

MFC after: 1 week


227441 11-Nov-2011 rstone

Correct the types of the arguments to return probes of the syscall
provider. Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by: rpaulo
MFC after: 1 week


227399 09-Nov-2011 kib

Attempt to improve formatting and content of several comments for
amd64 and i386 MD code.

Based on suggestions by: bde
MFC after: 1 week


227394 09-Nov-2011 kib

Stopped process may legitimately have some threads sleeping and not
suspended, if the sleep is uninterruptible.

Reported and tested by: pho
MFC after: 1 week


227333 08-Nov-2011 attilio

Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows mounting non-MPSAFE filesystems. Without it, the
kernel will refuse to mount a non-MPSAFE filesystem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by: gianni
Reviewed by: kib


227332 08-Nov-2011 kevlo

Enable PCI MMC/SD support by default on i386 and amd64


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227290 07-Nov-2011 rstone

Fix the DTrace pid return trap interrupt vector. Previously we were using
31, but that vector is reserved.

Without this fix, running dtrace -p <pid> would either cause the target
process to crash or the kernel to page fault.

Obtained from: rpaulo
MFC after: 3 days


227006 01-Nov-2011 marius

Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).

PR: 124667
Obtained from: NetBSD (based on)
MFC after: 3 days


226925 30-Oct-2011 marcel

Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.


226893 29-Oct-2011 marcel

Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates the need to port it to other
architectures.


226843 27-Oct-2011 alc

Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc(). While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.


226835 27-Oct-2011 kensmith

Adjust the debugger options slightly. This should help me do the right
thing when changing the debugging options as part of head becoming a new
stable branch. It may also help people who for one reason or another want
to run head but don't want it slowed down by the debugging support.

Reviewed by: kib


226607 21-Oct-2011 das

People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one. To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware. Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.


226547 19-Oct-2011 kensmith

Add a warning about why sbp(4) is commented out so that curious folks
are forewarned they might wind up with a hole in their foot if they
decide to give it a try.

Suggested by: dougb


226510 18-Oct-2011 kensmith

Comment out the sbp(4) driver for architectures that support it.

As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112)
but was left alone in head so people could work on fixing an issue that
caused boot failure on some motherboards. Apparently nobody has worked
on it and we are getting reports of boot failure with the 9.0 test builds.
So this time I'll comment out the driver in head (still hoping someone
will work on it) and MFC to stable/9.

Submitted by: Alberto Villa <avilla at FreeBSD dot org>


226498 18-Oct-2011 des

Trace attempts to call restricted MD syscalls.


226112 07-Oct-2011 kib

Remove unused define.

MFC after: 1 month


226026 04-Oct-2011 delphij

Add the 9750 SATA+SAS 6Gb/s RAID controller card driver, tws(4). Many
thanks to LSI for their continued support of FreeBSD.

This is version 10.80.00.003 from codeset 10.2.1 [1]

Obtained from: LSI http://kb.lsi.com/Download16574.aspx [1]


225943 03-Oct-2011 kib

Do not allow the kernel to access usermode pages without installed
fault handler. Panic immediately in such situation, on i386 and amd64.

Reviewed by: avg, jhb
MFC after: 1 week


225936 03-Oct-2011 attilio

Add some improvements in the idle table callbacks:
- Replace instances of manual assembly instruction "hlt" call
with halt() function calling.
- In cpu_idle_mwait() avoid races in check to sched_runnable() using
the same pattern used in cpu_idle_hlt() with the 'hlt' instruction.
- Add comments explaining the logic behind the pattern used in
cpu_idle_hlt() and other idle callbacks.

In collaboration with: jhb, mav
Reviewed by: adri, kib
MFC after: 3 weeks


225618 16-Sep-2011 kmacy

Auto-generated code from sys_ prefixing makesyscalls.sh change

Approved by: re(bz)


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225576 15-Sep-2011 kib

Put amd64_syscall() prototype in md_var.h.

Requested by: jhb
Reviewed by: alc, jhb
Approved by: re (bz)
MFC after: 2 weeks


225575 15-Sep-2011 kib

Microoptimize the return path for the fast syscalls on amd64. Arrange
the code to have the fall-through path to follow the likely target.
Do not use intermediate register to reload user %rsp.

Proposed by: alc
Reviewed by: alc, jhb
Approved by: re (bz)
MFC after: 2 weeks


225483 11-Sep-2011 kib

The jump target shall be after the padding, not into it.

Reported by: alc
Approved by: re (bz)
MFC after: 2 weeks


225482 11-Sep-2011 brueffer

Fix a zyd(4) comment typo that was copy+pasted into most kernel config files.

PR: 160276
Submitted by: MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp>
Approved by: re (kib)
MFC after: 1 week


225475 11-Sep-2011 kib

Perform amd64-specific microoptimizations for native syscall entry
sequence. The effect is ~1% on the microbenchmark.

In particular, do not restore registers which are preserved by the
C calling sequence. Align the jump target. Avoid unneeded memory
accesses by calculating some data in syscall entry trampoline.

Reviewed by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225474 11-Sep-2011 kib

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.
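
The lock-free update of a flags word can be sketched with C11 atomics.
This is only an illustration of the technique (atomic OR and AND on the
containing word); the sk_ names, flag values and struct are hypothetical
and do not reflect the actual vm_page layout or the kernel's atomic(9)
primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits for the sketch. */
#define	SK_AFLAG_REFERENCED	0x01u
#define	SK_AFLAG_WRITEABLE	0x02u

struct sk_page {
	_Atomic uint32_t aflags;	/* updated without any lock held */
};

static void
sk_page_init(struct sk_page *m)
{
	atomic_init(&m->aflags, 0);
}

/* Set bits with a single atomic read-modify-write; never blocks. */
static void
sk_aflag_set(struct sk_page *m, uint32_t bits)
{
	atomic_fetch_or(&m->aflags, bits);
}

/* Clear bits the same way, leaving the other bits untouched. */
static void
sk_aflag_clear(struct sk_page *m, uint32_t bits)
{
	atomic_fetch_and(&m->aflags, ~bits);
}

static uint32_t
sk_aflags(struct sk_page *m)
{
	return (atomic_load(&m->aflags));
}
```

Two threads setting different bits concurrently cannot lose each other's
update, which is what removes the need for a lock around the word.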

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


225201 26-Aug-2011 jhb

Enable the puc(4) driver on amd64 and i386 in GENERIC. This allows
devices supported by puc(4) to work "out of the box" since puc.ko does
not work "out of the box".

Reviewed by: marcel
Approved by: re (kib)
MFC after: 1 week


225194 26-Aug-2011 jhb

Make NKPT a kernel option on amd64 so that it can be set to a non-default
value from kernel config files.

Reviewed by: alc
Approved by: re (kib)
MFC after: 1 week


225048 20-Aug-2011 bz

In HEAD, when no further checks are done, there is no reason to use a
temporary variable and test it with an if, as TUNABLE_*_FETCH does not
alter the value unless the tunable was successfully found.
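
The "only write on success" contract can be sketched in userland with the
process environment standing in for the kernel environment. The function
name sketch_tunable_int_fetch() and the variable name used in the usage
note are hypothetical; this is not the TUNABLE_INT_FETCH implementation.

```c
#ifndef _POSIX_C_SOURCE
#define	_POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Look up "name" and parse it as an int.  On success store the value in
 * *valp and return 1.  If the variable is missing or malformed, return 0
 * and leave *valp exactly as the caller initialised it.
 */
static int
sketch_tunable_int_fetch(const char *name, int *valp)
{
	const char *s = getenv(name);
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return (0);
	errno = 0;
	v = strtol(s, &end, 0);
	if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return (0);
	*valp = (int)v;
	return (1);
}
```

With such a contract a caller can initialise the variable to its default
and call the fetch routine directly on it, for example
"int tests = 1; sketch_tunable_int_fetch("SKETCH_TESTS", &tests);",
without a temporary or a check of the return value.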

Reported by: jhb, bde
MFC after: 3 days
X-MFC with: r224516
Approved by: re (kib)


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.
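
The shape of such a rights check can be sketched as follows. Everything
here is hypothetical (the sk_ types, the flag values and the error
constant); it only illustrates the idea that the lookup fails when the
caller asks for rights the descriptor does not carry, and is not the
fget(9) code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sk_rights_t;

/* Hypothetical rights bits and error value for the sketch. */
#define	SK_CAP_READ	UINT64_C(0x1)
#define	SK_CAP_WRITE	UINT64_C(0x2)
#define	SK_ENOTCAPABLE	1

struct sk_file {
	int	dummy;
};

/* A descriptor table entry: the file plus the rights granted on it. */
struct sk_fdesc {
	struct sk_file	*fp;
	sk_rights_t	rights;
};

/*
 * Convert a descriptor into a file pointer, declaring up front which
 * rights the caller needs.  Any needed right that is not held fails the
 * lookup and leaves *fpp untouched.
 */
static int
sk_fget(const struct sk_fdesc *fdp, sk_rights_t needed, struct sk_file **fpp)
{
	if ((needed & ~fdp->rights) != 0)
		return (SK_ENOTCAPABLE);
	*fpp = fdp->fp;
	return (0);
}
```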

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


224699 07-Aug-2011 rmacklem

Change all the sample kernel configurations to use
NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since
NFSCL and NFSD are now the defaults. The client change is
needed for diskless configurations, so that the root
mount works for fstype nfs.
Reported by seanbru at yahoo-inc.com for i386/XEN.

Approved by: re (hrs)


224516 30-Jul-2011 bz

Introduce a tunable to disable the time consuming parts of bootup
memtesting, which can easily save seconds to minutes of boot time.
The tunable name is kept general to allow reusing the code in
alternate frameworks.

Requested by: many
Discussed on: arch (a while ago)
Obtained from: Sandvine Incorporated
Reviewed by: sbruno
Approved by: re (kib)
MFC after: 2 weeks


224217 19-Jul-2011 attilio

Bump MAXCPU for amd64, ia64 and XLP mips appropriately.
From now on, default values for FreeBSD will be 64 maximum supported
CPUs on amd64 and ia64 and 128 for XLP. All the other architectures
seem already capped appropriately (with the exception of sparc64 which
needs further support on jalapeno flavour).

Bump __FreeBSD_version in order to reflect the KBI/KPI breakage introduced
during the infrastructure cleanup for supporting MAXCPU > 32. This
covers the cpumask_t retirement too.

The switch is considered complete at the present time, so please report
immediately any bug you experience that can be traced to that area.

Requested by: marcel, jchandra
Tested by: pluknet, sbruno
Approved by: re (kib)


224207 19-Jul-2011 attilio

Add the ability to specify the MAXCPU value from kernel configs.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is planned for this patch.

Tested by: pluknet
Approved by: re (kib)


224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding a unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is planned for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


223796 05-Jul-2011 jkim

Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take
%rcx as "extensions" in long mode. If any unused bit is set in %rcx, these
instructions cause a general protection fault. Fix style nits and synchronize
i386 with amd64.


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
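
The two expressions above can be sketched in plain C. This is a
simplification: it uses a single 64-bit word and hypothetical sk_ names,
whereas a real cpuset_t may span several words, so treat it purely as an
illustration of the bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical single-word mask; holds at most 64 CPUs. */
typedef uint64_t sk_cpumask_t;

/* Equivalent of pc_cpumask: only the bit for this CPU is set. */
static sk_cpumask_t
sk_self_mask(unsigned cpuid)
{
	return (UINT64_C(1) << cpuid);
}

/* Equivalent of pc_other_cpus: every present CPU except this one. */
static sk_cpumask_t
sk_other_cpus(sk_cpumask_t all_cpus, unsigned cpuid)
{
	return (all_cpus & ~sk_self_mask(cpuid));
}
```

Since both values are cheap to derive from the CPU id, caching them per
CPU buys little once the mask type grows beyond one word.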

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


223692 30-Jun-2011 jonathan

Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)


223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


223668 29-Jun-2011 jonathan

We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.

Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.

Approved by: rwatson
Sponsored by: Google UK Ltd


223440 22-Jun-2011 jhb

Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree. The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver conditional on 'device pci'.


223433 22-Jun-2011 jhb

Oops, missed these in 223424.

Reported by: jkim


223428 22-Jun-2011 jhb

Use uintXX_t instead of u_intXX_t.


223424 22-Jun-2011 jhb

Add a helper routine to conditionally modify the start address of a
resource allocation from an x86 Host-PCI bridge driver so that it can be
reused by the ACPI Host-PCI bridge driver (and eventually the MPTable
Host-PCI bridge driver) instead of duplicating the same logic. Note that
this means that hw.acpi.host_mem_start is now replaced with the
hw.pci.host_mem_start tunable that was already used in the non-ACPI case.
This also removes hw.acpi.host_mem_start on ia64 where it was not
applicable (the implementation was very x86-specific).

While here, adjust the logic to apply the new start address on any
"wildcard" allocation even if that allocation comes from a subset of
the allowable address range.

Reviewed by: imp (1)


223254 18-Jun-2011 kib

Fix vfork. Add comments.


223098 14-Jun-2011 hselasky

Enable USB 3.0 support by default in i386 and amd64 GENERIC kernels.

Discussed with: joel @ and thompsa @
MFC after: 7 days


222980 11-Jun-2011 joel

Enable sound support by default on i386 and amd64.

The generic sound driver has been added, along with enough
device-specific drivers to support the most common audio
chipsets.

We've discussed enabling it from time to time over the years
and we've received numerous requests from users, so we decided
that shipping 9.0 with working audio by default would be the
best thing to do.

Bug reports should be sent to the multimedia@ mailing list, as
usual.

Approved by: mav
No objection: re


222929 10-Jun-2011 jhb

Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the
Host-PCI bridge drivers and nexus.


222853 08-Jun-2011 avg

remove code for dynamic offlining/onlining of CPUs on x86

The code has definitely been broken for SCHED_ULE, which is a default
scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt delivery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup. See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all available CPUs and then
the disabled CPUs should be "subtracted" from it. That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR: kern/145385
Discussed with: gcooper, ambrisko, mdf, sbruno
Reviewed by: attilio
Tested by: pho, pluknet
X-MFC after: never


222813 07-Jun-2011 attilio

Retire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are targeted to be removed soon.
With the move from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu data is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word.
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernel-land members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.
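
As a rough illustration of the find-first-set idea behind cpusetobj_ffs(),
the sketch below models a CPU set as a list of 64-bit words. The word size,
the list representation and the name cpuset_ffs are assumptions made for
this example; it is not the kernel implementation.

```python
WORD_BITS = 64  # assumed word size for this illustration


def cpuset_ffs(words):
    """Return the 1-based index of the lowest set bit, or 0 if none is set.

    `words` is a list of non-negative integers, each holding WORD_BITS bits:
    words[0] covers CPUs 0..63, words[1] covers CPUs 64..127, and so on.
    """
    for i, word in enumerate(words):
        if word:
            # (word & -word) isolates the lowest set bit; bit_length() then
            # gives its 1-based position within the word.
            return i * WORD_BITS + (word & -word).bit_length()
    return 0
```

Because the set spans several words, a CPU with an index above 31 is found
just like any other, which is the limitation of a single 32-bit mask that
the commit describes.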

Please additionally note that no MAXCPU is bumped in this patch, but
private testing has been done up to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revisions of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222756 06-Jun-2011 avg

don't use cpuid level 4 in x86 cpu topology detection if it's not supported

This regression was introduced in r213323.
There are probably no Intel cpus that support amd64 mode, but do not
support cpuid level 4, but it's better to keep i386 and amd64 versions
of this code in sync.

Discovered by: pho
Tested by: pho
MFC after: 2 weeks


222282 25-May-2011 kevlo

Bring back r222275. runfw(4) will statically link in rt2870.fw.uu
to the kernel, though I have MODULES_OVERRIDE="" in GENERIC.

Spotted by: thompsa


222275 25-May-2011 kevlo

run(4) needs firmware loaded to work


222043 17-May-2011 jkim

Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions. The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit. Replace non-existent references with today's
CPUID specifications.


221855 13-May-2011 mdf

Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk. Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file. (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by: alc
MFC after: 1 week
MFC with: r221853


221784 11-May-2011 dchagin

Remove wrong comment.

MFC after: 1 week.


221743 10-May-2011 jkim

Add SC_PIXEL_MODE to GENERIC for amd64 and i386.

Requested by: many


221703 09-May-2011 jkim

Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs. For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting. For the latter case, we
may set it if it passes the test to notify the user that it may be usable.


221527 06-May-2011 avg

prepare code that does topology detection for amd cpus for bulldozer

This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by: Gary Jennejohn <gljennjohn@googlemail.com>
(with pre-10h hardware)
MFC after: 2 weeks


221394 03-May-2011 jhb

Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can
be disabled via 'nooptions NEW_PCIB'.


221393 03-May-2011 jhb

Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests. Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices. The
suballocations are managed by creating an rman for each I/O window. The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus. Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus. If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free. From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed. The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.
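
The ordering rule described above can be sketched as follows. This is a
simplified model, not the driver's code: it ignores alignment and range
limits, and the function name growth_order and its parameters are
hypothetical.

```python
def growth_order(count, front_free, back_free):
    """Return [(direction, extra)] with the smallest required growth first.

    count:      size of the range requested by the child device
    front_free: size of the free region at the front of the window
    back_free:  size of the free region at the end of the window
    """
    # Only the part of the request that does not already fit into the free
    # region at that edge has to be added to the window.
    front_extra = max(count - front_free, 0)
    back_extra = max(count - back_free, 0)
    candidates = [("front", front_extra), ("back", back_extra)]
    # sorted() is stable, so a tie keeps "front" ahead of "back"; that
    # tie-break is an arbitrary choice made for this sketch.
    return sorted(candidates, key=lambda c: c[1])
```

A caller would try the first entry and fall back to the second only if the
adjust request for the first one fails.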

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows). If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window. This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled. This is a transition aid to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now. Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR: kern/143874 kern/149306
Tested by: mav


221324 02-May-2011 jhb

Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.


221296 01-May-2011 bschmidt

Add the remaining wireless drivers.

Discussed with: joel


221200 29-Apr-2011 kevlo

Add urtw(4)


221188 28-Apr-2011 jkim

Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after: 3 days


221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, if they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependent on SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also to avoid
calling them when not necessary (avoiding involuntary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


221124 27-Apr-2011 rmacklem

This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.


221071 26-Apr-2011 mav

- Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.


221069 26-Apr-2011 sobomax

With the typical memory size of the system in tens of gigabytes
counting memory being dumped in 16MB increments is somewhat silly.
Especially if the dump fails and everything you've got for debugging
is a screen filled with numbers in decrements of 16... Replace that with
percentage-based progress with max 10 updates all fitting into one
line.
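
A minimal sketch of the "at most 10 updates" idea, assuming progress is
reported whenever a new multiple of 10% is crossed. The function name and
the exact reporting rule are assumptions for illustration and may differ
from what the dump code actually prints.

```python
def dump_progress_marks(total_bytes, chunk_bytes):
    """Return the percentages reported while dumping in chunk_bytes steps.

    A value is emitted only when progress crosses a new multiple of 10%,
    so at most 10 values are produced however many chunks are written.
    """
    marks = []
    done = 0
    last_decile = 0
    while done < total_bytes:
        done = min(done + chunk_bytes, total_bytes)
        decile = (done * 10) // total_bytes
        if decile > last_decile:
            marks.append(decile * 10)
            last_decile = decile
    return marks
```

With 64 GB of memory written in 16 MB chunks, the 4096 chunk writes
collapse into ten progress updates instead of thousands of numbers.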

Collapse other very "useful" piece of crash information (total ram) into
the same line to save some more space.

MFC after: 1 week


221032 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by: jhb
MFC after: 2 weeks


220982 24-Apr-2011 mav

Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).
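
The renaming rule can be sketched as below. This is a simplified model: it
assumes the list is already in detection order, ignores the tunables
mentioned above, and ignores any other devices that might share the new
prefix. The function name is hypothetical; the prefix table comes from the
text.

```python
# Old ATA prefix -> new CAM prefix, as listed in the commit message.
PREFIX_MAP = {"ad": "ada", "acd": "cd", "afd": "da", "ast": "sa"}


def rename_legacy_devices(detected):
    """Map legacy device names, given in detection order, to CAM names.

    Unit numbers are handed out sequentially per new prefix, so the old
    unit number (the X in adX) is not preserved.
    """
    counters = {}
    result = {}
    for name in detected:
        prefix = name.rstrip("0123456789")
        new_prefix = PREFIX_MAP[prefix]
        unit = counters.get(new_prefix, 0)
        counters[new_prefix] = unit + 1
        result[name] = f"{new_prefix}{unit}"
    return result
```

For example, a system with ad4, ad6 and acd0 would end up with ada0, ada1
and cd0 under this model, which is why /etc/fstab entries need checking.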

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.


220803 18-Apr-2011 kib

Make pmap_invalidate_cache_range() available for consumption on amd64.

Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not necessarily mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


220631 14-Apr-2011 jkim

Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
32 bits. Sometimes the compiler inserts unnecessary instructions to preserve
unused upper 32 bits even when it is cast to a 32-bit value. It reduces
such compiler mistakes where every cycle counts.


220629 14-Apr-2011 jkim

Consistently use __volatile as the rest of this file.


220628 14-Apr-2011 jkim

Prefer C99 standard integers to reduce diff from i386 version.


220584 13-Apr-2011 jkim

Reduce errors in effective frequency calculation.


220583 12-Apr-2011 jkim

Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available. It was disabled in r216443. Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.


220580 12-Apr-2011 jkim

Add forgotten declarations for tsc_perf_stat from the previous commit.


220579 12-Apr-2011 jkim

Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
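
The ratio can be illustrated with the sketch below, which assumes two
samples of each counter taken over the same interval and a known nominal
frequency. The function and parameter names are hypothetical, and counter
wraparound is not handled.

```python
def effective_frequency(nominal_hz, aperf_start, aperf_end,
                        mperf_start, mperf_end):
    """Estimate the effective frequency from APERF/MPERF deltas.

    MPERF advances at a fixed reference rate while APERF advances at the
    actual rate, so their ratio scales the nominal frequency.
    """
    aperf_delta = aperf_end - aperf_start
    mperf_delta = mperf_end - mperf_start
    if mperf_delta <= 0:
        raise ValueError("MPERF did not advance between samples")
    # Integer arithmetic; multiply first to avoid losing precision.
    return nominal_hz * aperf_delta // mperf_delta
```

If APERF advanced by half as much as MPERF, a nominal 3 GHz part would be
estimated at 1.5 GHz.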


220578 12-Apr-2011 jkim

Add definitions for CPUID instruction 6, ECX information.


220461 08-Apr-2011 kib

Remove setting of PCB_FULL_IRET at the places where we are going to call
update_gdt_{f,g}sbase. The functions set the flag when td == curthread,
and sysarch is always called with curthread.

Reviewed by: jhb, jkim
MFC after: 1 week


220460 08-Apr-2011 kib

Disable local interrupts before testing the PCB_FULL_IRET flag.
Thread might be preempted after testing, which causes the flag to be
cleared. If ast was not delivered, we will do sysret with potentially
wrong fs/gs bases.

Reviewed by: jhb, jkim
MFC after: 1 week (together with r220430, r220452)


220453 08-Apr-2011 rstone

Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by: emaste (mentor)
MFC after: 1 week


220452 08-Apr-2011 jhb

Fix a bug in the previous change to restore the fast path for syscall
return. The ast() function may cause a context switch in which case
PCB_FULL_IRET would be set in the pcb. However, the code was not
rechecking the flag after ast() returned and would not properly restore
the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after
ast() returns.

While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.

MFC after: 1 week


220451 08-Apr-2011 jhb

Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field.

MFC after: 3 days


220433 07-Apr-2011 jkim

Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now. Worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).


220431 07-Apr-2011 jhb

pcb_flags is an int, so use testl rather than testq.

Pointy hat to: jhb
Submitted by: jkim
MFC after: 1 week


220430 07-Apr-2011 jhb

If a system call does not request a full interrupt return, use a fast
path via the sysretq instruction to return from the system call. This was
removed in 190620 and not quite fully restored in 195486. This resolves
most of the performance regression in system call microbenchmarks between
7 and 8 on amd64.

Reviewed by: kib
MFC after: 1 week


220429 07-Apr-2011 jkim

Remove stale checks for RDTSC support. amd64 must have TSC support anyway.


220238 01-Apr-2011 kib

Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.

In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.

The changes are hidden under COMPAT_43.

MFC after: 1 month


220186 31-Mar-2011 avg

Revert r220032:linux compat: add SO_PASSCRED option with basic handling

I have not properly thought through the commit. After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred. And that is not implemented yet.

Pointyhat to: avg


220185 31-Mar-2011 adrian

Break out the ath PCI logic into a separate device/module.

Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.


220143 29-Mar-2011 trasz

Revert part of r220137, committed by mistake - RACCT is _not_ supposed
to be enabled in GENERIC.


220137 29-Mar-2011 trasz

Add racct. It's an API to keep per-process, per-jail and per-loginclass
resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


220090 28-Mar-2011 alc

The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000
instead of 0x100000. As a side effect, an amd64 kernel now loads at
physical address 0x200000 instead of 0x100000. This is probably for the
best because it avoids the use of a 2MB page mapping for the first 1MB of
the kernel that also spans the fixed MTRRs. However, getmemsize() still
thinks that the kernel loads at 0x100000, and so the physical memory between
0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired
constant in getmemsize() by a symbol "kernphys" that is defined by the
linker script.

In collaboration with: kib


220058 27-Mar-2011 alc

Amd64 doesn't have a lazypmap ipi.


220032 26-Mar-2011 avg

linux compat: add SO_PASSCRED option with basic handling

This seems to have been a part of a bigger patch by dchagin that either
hasn't been committed or was committed partially.

Submitted by: dchagin, nox
MFC after: 2 weeks


220030 26-Mar-2011 avg

linux compat: add non-dummy capget and capset system calls, regenerate

And drop dummy definitions for those system calls.
This may transiently break the build.

PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks


220028 26-Mar-2011 avg

linux compat: add non-dummy capget and capset system calls

PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks


220026 26-Mar-2011 dchagin

Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page, there is no need to
leave space on the stack for them. Forgotten in the previous commit.

MFC after: 1 Week


220021 26-Mar-2011 alc

Move an external declaration to the appropriate header file.


220018 26-Mar-2011 jkim

Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs. These CPUs need explicit MSR configuration to expose certain CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net. Clean up
and simplify VIA PadLock detection while I am in the neighborhood.


219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


219775 19-Mar-2011 bz

For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it. LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases. In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with: qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR: kern/148018, kern/155604, kern/144917, kern/146792
MFC after: 2 weeks


219673 15-Mar-2011 jkim

Deprecate tsc_present as the last of its real consumers finally disappeared.


219647 14-Mar-2011 davidch

- Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE.
(BCM57710, BCM57711, BCM57711E)

MFC after: One month


219609 13-Mar-2011 dchagin

Enable shared page use for amd64/linux32 and i386/linux binaries.
Move signal trampoline code from the top of the stack to the shared page.

MFC after: 2 Weeks


219560 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

Regenerate system call and systrace support files.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks


219559 12-Mar-2011 avg

add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls

This commit makes the necessary changes in syscall/sysent generation
infrastructure.

PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks


219525 11-Mar-2011 avg

amd64/NOTES: use a greater number in KSTACK_PAGES example

This is a minor cosmetic change - the users are more likely to want to
increase (rather than decrease) default kernel stack size,
which is already 4 pages on amd64.

MFC after: 4 days


219523 11-Mar-2011 mdf

Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.


219473 11-Mar-2011 jkim

Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker. Note tsc_present does not change by this tunable.


219468 10-Mar-2011 mdf

Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name. Also consistently use strlcpy().

Suggested by: Warner Losh


219461 10-Mar-2011 jkim

Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.


219435 09-Mar-2011 julian

Add a small change to the comment in the GENERIC config files that include udbp

Submitted by: Chris Forgron, cforgeron at acsi dot ca
MFC after: 1 week


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhandler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


219364 07-Mar-2011 dchagin

Remove dead code.

MFC after: 1 Week


219157 02-Mar-2011 alc

Make a change to the implementation of the direct map to improve performance
on processors that support 1 GB pages. Specifically, if the end of physical
memory is not aligned to a 1 GB page boundary, then map the residual
physical memory with multiple 2 MB page mappings rather than a single 1 GB
page mapping. When a 1 GB page mapping is used for this residual memory,
access to the memory is slower than when multiple 2 MB page mappings are
used. (I suspect that the reason for this slowdown is that the TLB is
actually being loaded with 4 KB page mappings for the residual memory.)
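
The resulting layout can be worked through with a small calculation. This
is only an arithmetic sketch of the rule stated above; the function name is
hypothetical and the direct map is assumed to start at physical address 0.

```python
GB = 1 << 30    # size of a 1 GB page
MB2 = 2 << 20   # size of a 2 MB page


def direct_map_layout(phys_end):
    """Return (n_1gb, n_2mb) page mappings covering [0, phys_end).

    Whole gigabytes get 1 GB mappings; the residual below the next 1 GB
    boundary is covered by 2 MB mappings, rounded up.
    """
    n_1gb = phys_end // GB
    residual = phys_end - n_1gb * GB
    n_2mb = -(-residual // MB2)  # ceiling division
    return n_1gb, n_2mb
```

For a machine whose physical memory ends at 4.5 GB, this gives four 1 GB
mappings plus 256 mappings of 2 MB, rather than a fifth 1 GB mapping that
would extend past the end of memory.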

X-MFC after: r214425


219134 01-Mar-2011 rwatson

Continue to introduce Capsicum capability mode:

White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here. Sysarch is a morass similar to ioctl, in many senses.

Submitted by: anderson
Discussed with: benl, kris, pjd
Sponsored by: Google, Inc.
Obtained from: Capsicum Project
MFC after: 3 months


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218773 17-Feb-2011 alc

Remove pmap fields that are either unused or not fully implemented.

Discussed with: kib


218744 16-Feb-2011 dchagin

To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.


218720 15-Feb-2011 dchagin

For realtime signals fill the sigval value.


218658 13-Feb-2011 dchagin

Sort include files in the alphabetical order.


218616 12-Feb-2011 dchagin

Move linux_clone(), linux_fork(), linux_vfork() to a MI path.


218613 12-Feb-2011 dchagin

In preparation for moving linux_clone() to a MI path
introduce linux_set_upcall_kse().


218612 12-Feb-2011 dchagin

In preparation for moving linux_clone () to a MI path
move the TLS code in a separate function.

Use function parameter instead of direct using register.


218611 12-Feb-2011 dchagin

Regen for r218610.


218610 12-Feb-2011 dchagin

The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one.


218327 05-Feb-2011 kib

Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with: bde
MFC after: 2 weeks


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


218103 30-Jan-2011 dchagin

Regen for r218101.

MFC after: 1 Month.


218101 30-Jan-2011 dchagin

Change linux futex syscall definition to match actual linux one.

MFC after: 1 Month.


218100 30-Jan-2011 dchagin

The kern_wait() code already removes the SIGCHLD signal for the waited
process. Removing other SIGCHLD signals is not needed and may cause
problems.

Pointed out by: jilles

MFC after: 1 Month.


218059 29-Jan-2011 dchagin

My style(9) bug.

Pointed out by: kib

MFC after: 1 Month.


218030 28-Jan-2011 dchagin

Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after: 1 Month.


218028 28-Jan-2011 dchagin

To avoid excessive code duplication move struct rusage translation
to a separate function.

MFC after: 1 Month.


217991 27-Jan-2011 kib

linux_sigreturn() loads the struct trapframe from l_sigcontext
members, thus making a signed extension of 32 bit register
context. If the register is not touched in usermode between
return from signal and next syscall entry, the sign-extension
part of 64bit register is not cleared, causing
linux32_fetch_syscall_args() to read wrong values.

Use unsigned type for the registers in the linux sigcontext.
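
The difference between the two widenings can be shown with plain integer
arithmetic. The helper names are made up for this sketch; it models only
the bit patterns, not the kernel structures involved.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32_to_64(value):
    """Widen a 32-bit value the way a signed type would."""
    value &= MASK32
    if value & 0x80000000:
        # Bit 31 is set, so the upper 32 bits are filled with ones.
        return (value | ~MASK32) & MASK64
    return value


def zero_extend_32_to_64(value):
    """Widen a 32-bit value the way an unsigned type would."""
    return value & MASK32
```

A register holding 0x80000000 becomes 0xFFFFFFFF80000000 when widened as
signed but stays 0x80000000 when widened as unsigned, which is why the
unsigned type avoids the stale upper bits described above.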

Reported by: Jacob Frelinger <jacob.frelinger duke edu>, arundel
In collaboration with: dchagin
MFC after: 1 week


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


217886 26-Jan-2011 mdf

Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>


217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


217604 19-Jan-2011 kib

Use CTLFLAG_RDTUN for read-only sysctl that exports tunable.

Reminded by: pjd
MFC after: 6 days


217564 18-Jan-2011 kib

Make the length of the LDT a loader tunable, machdep.max_ldt_segment,
and export it with read-only sysctl. Remove unused defines.

Reviewed by: jhb (previous version)
MFC after: 1 week


217563 18-Jan-2011 kib

Use malloc(9) instead of kmem_alloc(9) for the temporary copy of the
user-supplied descriptor array.

Noted and reviewed by: jhb (previous version)
MFC after: 1 week


217543 18-Jan-2011 jhb

- Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
when descriptors are provided. Specifically, allow the 'start == 0'
and 'num == 0' special case used to free all LDT entries that previously
failed with EINVAL.

Submitted by: clang via rdivacky (some of 1)
Reviewed by: kib


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217506 17-Jan-2011 jkim

Avoid preemption while manipulating CRs and MTRRs.

Tested by: ariff


217424 14-Jan-2011 jkim

Remove redundant, bogus, and even harmful uses of setting TS bit in CR0.
It is done from fpstate_drop() when it is really necessary.

Reviewed by: kib
MFC after: 1 week


217368 13-Jan-2011 mdf

Fix up a few more sysctl(9) mis-typing found in various LINT builds.


217360 13-Jan-2011 jhb

If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared. This causes the local APIC interrupt routine
to fail to find an interrupt to service. Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC. If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by: steve


217326 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the kernel changes.


217192 09-Jan-2011 kib

Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by: alc


217157 08-Jan-2011 tijl

Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by: kib (mentor)


217151 08-Jan-2011 kib

Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by: alc


217147 08-Jan-2011 tijl

On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by: bde [1]
Approved by: kib (mentor)


217145 08-Jan-2011 tijl

Fix types of some values in machine/_limits.h.

On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by: bde
Approved by: kib (mentor)
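
The promotion argument above can be checked with a C11 `_Generic` sketch. It assumes a platform where int is wider than unsigned char, and the helper names are made up for illustration. The last function shows why the type matters: with an unsigned int limit, -1 is converted to a large unsigned value before the comparison.

```c
#include <assert.h>
#include <limits.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11). */
#define	IS_INT(x)	_Generic((x), int: 1, default: 0)

/* An unsigned char operand is promoted to int in arithmetic. */
int
uchar_promotes_to_int(void)
{
	unsigned char c = UCHAR_MAX;

	/* Assumption of this sketch: int is wider than unsigned char. */
	assert(sizeof(int) > sizeof(unsigned char));
	return (IS_INT(c + 0));
}

/* With an int-typed limit, comparing against a negative value works. */
int
minus_one_below_int_limit(void)
{
	return (-1 < (int)UCHAR_MAX);
}

/* With an unsigned int limit, -1 converts to UINT_MAX first. */
int
minus_one_below_unsigned_limit(void)
{
	return (-1 < (unsigned int)UCHAR_MAX);
}
```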


217097 07-Jan-2011 kib

Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.


216673 22-Dec-2010 jkim

Increase size of pcb_flags to four bytes.

Requested by: bde, jhb


216634 22-Dec-2010 jkim

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


216592 20-Dec-2010 tijl

Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by: imp (previous version), jhb
Approved by: kib (mentor)


216524 18-Dec-2010 kib

Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by: jhb
MFC after: 2 weeks


216443 14-Dec-2010 jkim

Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case. Now we support cpu_get_nominal_mhz()
for the case, instead. Note it should be just enough for most usage cases
because cpu_est_clockrate() is often abused to find the maximum frequency
of the processor.


216405 13-Dec-2010 rwatson

Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching
its similar disabling of adaptive mutexes and rwlocks. The existing
comment on why this is the case also applies to sx locks.

MFC after: 3 days
Discussed with: attilio


216394 12-Dec-2010 kib

In fpudna()/npxdna(), mark FPU context initialized and optionally
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().

The issue is only relevant for in-kernel users of FPU.

Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by: Mike Tancsa
MFC after: 3 days


216365 10-Dec-2010 rwatson

Derive the XENHVM kernel from GENERIC, adding only the options required
to support PV drivers (such as xenpci), and non-adaptive locking (along
with a comment about why).

This change eliminates the synchronisation problem between GENERIC and
XENHVM, which had become severely rotted in HEAD, and in 8-STABLE
included non-production kernel debugging features such as WITNESS.

However, it comes at the cost of enabling devices and options that may
not be present under Xen (such as random ethernet cards). For now, opt
for a simpler kernel configuration file rather than using nooptions/
nodevice to enumerate and eliminate them. This leads to a somewhat
larger XENHVM kernel.

This is an MFC candidate for 8-STABLE before 8.2, in order to provide
a production-worthy XENHVM kernel configuration for amd64.

Discussed with: gibbs, cperciva
Reported by: Piete Brooks <Piete.Brooks at cl.cam.ac.uk>
Sponsored by: DARPA, AFRL
MFC after: 3 days


216316 09-Dec-2010 cperciva

Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.


216312 08-Dec-2010 jkim

Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.


216308 08-Dec-2010 cperciva

On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512. Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.
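
A minimal sketch of the conditional described above, using the two values quoted in this entry. `clamp_bpages` is a hypothetical helper added only to show the macro in use; it is not taken from busdma_machdep.c.

```c
#include <assert.h>

/* Values quoted from the log entry; i386 keeps the smaller limit. */
#ifdef __i386__
#define	MAX_BPAGES	512
#else
#define	MAX_BPAGES	8192
#endif

/* Hypothetical helper: clamp a requested bounce-page count to the limit. */
int
clamp_bpages(int wanted)
{
	assert(wanted >= 0);
	return (wanted > MAX_BPAGES ? MAX_BPAGES : wanted);
}
```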


216306 08-Dec-2010 cperciva

MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a
no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen
support, but if/when that support is ever added we'll want this, and
until then it's harmless.


216304 08-Dec-2010 cperciva

MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and
branch mispredictions.

No objections from: scottl


216283 08-Dec-2010 jkim

Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.

Discussed with: avg


216276 07-Dec-2010 jkim

Remove stale comments about P-state invariant TSC and fix style(9) nits.


216275 07-Dec-2010 jkim

Do not register an event handler for CPU frequency changes when the TSC is
found to be P-state invariant. This is a continuation of r216274.


216274 07-Dec-2010 jkim

Now that the P-state invariant TSC is probed early enough, do not register event
handlers for CPU frequency changes when it is found to be P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.


216272 07-Dec-2010 jkim

Probe P-state invariant TSC from rightful place.


216255 07-Dec-2010 kib

Update some comments related to use of amd64 full context switch.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.

MFC after: 1 week


216253 07-Dec-2010 kib

Retire write-only PCB_FULLCTX pcb flag on amd64.

Reminded by: Petr Salinger <Petr.Salinger seznam cz>
Tested by: pho
MFC after: 1 week


216231 06-Dec-2010 kib

Do not leak %rdx value in the previous image to the new image after
execve(2). Note that ia32 binaries already handle this properly,
since ia32_setregs() resets td_retval[1], but not exec_setregs().

We still do not conform to the amd64 ABI specification, since %rsp
on the image startup is not aligned to 16 bytes.

PR: amd64/124134
Discussed with: Petr Salinger <Petr.Salinger seznam cz>
(who convinced me that there are indeed several bugs)
MFC after: 1 week


216163 03-Dec-2010 jkim

Revert r216161. It is not necessary because we zero-fill BSS anyway.

Requested by: jhb


216161 03-Dec-2010 jkim

Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after: 3 days


216159 03-Dec-2010 jkim

Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to: jkim
MFC after: 3 days


216143 03-Dec-2010 brucec

Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.


216134 02-Dec-2010 brucec

Disallow passing in a count of zero bytes to the bus_space(9) functions.

Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR: kern/80980
Discussed with: jhb
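
The failure mode can be modelled in portable C. The sketch below is a simulation, not the bus_space(9) code: it uses an 8-bit counter (an assumption made to keep the wrap-around short; the real count register is much wider) and mirrors the decrement-then-test order of the 'loop' instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Models "1: body; loop 1b" with an 8-bit counter. */
unsigned int
loop_iterations(uint8_t count)
{
	unsigned int iterations = 0;

	do {
		iterations++;		/* loop body runs first */
		count--;		/* 'loop' decrements the counter ... */
	} while (count != 0);		/* ... and only then tests for zero */
	return (iterations);
}

/* Guarded variant: reject a zero count before entering the loop. */
unsigned int
guarded_loop_iterations(uint8_t count)
{
	unsigned int n;

	if (count == 0)
		return (0);
	n = loop_iterations(count);
	assert(n == count);		/* non-zero counts behave as expected */
	return (n);
}
```

With a zero count the unguarded model wraps and runs through the whole counter range, which is the hang described in this entry.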


216012 28-Nov-2010 kib

Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by: pho
MFC after: 5 days


215878 26-Nov-2010 alc

Make the size of the direct map easily configurable. Changing NDMPML4E
now suffices.

Increase the size of the direct map to 1TB.

An earlier version of this patch was tested by sbruno@.
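
The relationship between NDMPML4E and the direct map size is simple arithmetic: on amd64 one PML4 entry spans 512 GB. The sketch below assumes that span; `dmap_size` is a hypothetical helper, and the value 2 for a 1TB map is derived here rather than quoted from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define	PML4E_SPAN	(UINT64_C(1) << 39)	/* 512 GB per PML4 entry */

/* Size in bytes of a direct map built from ndmpml4e PML4 entries. */
uint64_t
dmap_size(unsigned int ndmpml4e)
{
	assert(ndmpml4e >= 1 && ndmpml4e <= 512);	/* a PML4 has 512 slots */
	return ((uint64_t)ndmpml4e * PML4E_SPAN);
}
```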


215865 26-Nov-2010 kib

Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by: bde
Tested by: pho (i386 and amd64)
MFC after: 1 week


215856 26-Nov-2010 tijl

Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by: imp, jhb
Approved by: kib (mentor)


215854 26-Nov-2010 uqs

Remove kernel support for BB profiling, now that kernbb(8) is gone, too.

PR: bin/83558
Reviewed by: jkim


215845 25-Nov-2010 dim

Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use
unambiguous inline assembly to load a float variable.


215801 24-Nov-2010 dim

Change ambiguous (or invalid, depending on how strict you want to be :)
assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since
the intent was to move 16 bits of data, in this case.

Found by: clang
Reviewed by: kib


215754 23-Nov-2010 jkim

Remove a stale tunable introduced in r215703.


215753 23-Nov-2010 jkim

Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does
a better job since r215703 and it is safer now.


215748 23-Nov-2010 avg

specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX

CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by: jhb
MFC after: 7 days


215703 22-Nov-2010 jkim

- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual. Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-). Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as
WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the
default of WB and UC. This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency. Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks. With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more. Now this function is
identical on both amd64 and i386.

Reviewed by: jhb
Tested by: RM (reuf_m at hotmail dot com)
Ryszard Czekaj (rychoo at freeshell dot net)
army.of.root (army dot of dot root at googlemail dot com)
MFC after: 3 days
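
The PAT layout from the second item can be written out as a value. This is a sketch, not the pmap code: the enum and function names are hypothetical, the type encodings are the architectural ones (UC 0x00, WC 0x01, WT 0x04, WP 0x05, WB 0x06, UC- 0x07), and the lookup simply returns the lowest matching index, which is an assumption about how a type-to-index table might be filled.

```c
#include <assert.h>
#include <stdint.h>

/* PAT memory-type encodings from the x86 architecture manuals. */
enum pat_type {
	PAT_UC = 0x00, PAT_WC = 0x01, PAT_WT = 0x04,
	PAT_WP = 0x05, PAT_WB = 0x06, PAT_UCMINUS = 0x07
};

/* Each PAT entry occupies one byte of the 64-bit MSR. */
#define	PAT_ENTRY(i, t)	((uint64_t)(t) << (8 * (i)))

/* Indices 0-3 and 4, 7 keep their defaults; 5 becomes WP, 6 becomes WC. */
uint64_t
pat_msr_layout(void)
{
	return (PAT_ENTRY(0, PAT_WB) | PAT_ENTRY(1, PAT_WT) |
	    PAT_ENTRY(2, PAT_UCMINUS) | PAT_ENTRY(3, PAT_UC) |
	    PAT_ENTRY(4, PAT_WB) | PAT_ENTRY(5, PAT_WP) |
	    PAT_ENTRY(6, PAT_WC) | PAT_ENTRY(7, PAT_UC));
}

/* Lowest PAT index holding the given type, or -1 if absent. */
int
pat_index_of(uint64_t pat, enum pat_type type)
{
	int i;

	assert((unsigned int)type <= 0x07);
	for (i = 0; i < 8; i++)
		if (((pat >> (8 * i)) & 0xff) == (uint64_t)type)
			return (i);
	return (-1);
}
```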


215681 22-Nov-2010 jhb

Remove some bogus, self-referential mergeinfo.


215524 19-Nov-2010 avg

specialreg.h: add definitions for MPERF/APERF pair of MSRs

These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after: 5 days


215523 19-Nov-2010 avg

specialreg.h: add AMD-specific "Hardware Configuration Register" MSR

It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after: 5 days


215522 19-Nov-2010 avg

specialreg.h: add definition for AMD Core Performance Boost bit

This bit indicates availability of the feature.

MFC after: 4 days


215415 16-Nov-2010 jkim

Restore CR0 after MTRR initialization for correctness' sake. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases. Remove another pointless optimization for CR4.PGE bit
while I am here.


215414 16-Nov-2010 jkim

Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invalidation of entire TLBs. Since these are very rare events, this
micro-optimization seems useless.

Reviewed by: jhb


215321 14-Nov-2010 kib

Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by: swell.k gmail com
MFC after: 1 week


215309 14-Nov-2010 kib

Use symbolic names instead of hardcoding values for magic p_osrel constants.

MFC after: 1 week


215140 11-Nov-2010 jkim

Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by: imp, nyan


215133 11-Nov-2010 avg

amd64: introduce minidump version 2

After KVA space was increased to 512GB on amd64 it became impractical
to use PTEs as entries in the minidump map of dumped pages, because size
of that map alone would already be 1GB.
Instead, we now use PDEs as page map entries and employ two stage lookup
in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are
now dumped as regular pages. Fixed page map size now is 2MB.

libkvm keeps support for accessing amd64 minidumps of version 1.
Support for 1GB pages is added.

Many thanks to Alan Cox for his guidance, numerous reviews, suggestions,
enhancements and corrections.

Reviewed by: alc [kernel part]
MFC after: 15 days
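
The sizes quoted above follow from the 512GB KVA figure, assuming 8-byte page table entries, 4KB pages per PTE and 2MB per PDE. The helpers below are hypothetical and only reproduce that arithmetic plus the index split used by a two-stage lookup; they are not the libkvm implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a map with one 8-byte entry per bytes_per_entry of KVA. */
uint64_t
page_map_bytes(uint64_t kva_bytes, uint64_t bytes_per_entry)
{
	assert(bytes_per_entry != 0);
	return (kva_bytes / bytes_per_entry * sizeof(uint64_t));
}

/* First stage: which PDE covers this offset into the dumped KVA range. */
uint64_t
pde_index(uint64_t kva_offset)
{
	return (kva_offset >> 21);		/* 2MB per PDE */
}

/* Second stage: which PTE within that PDE's page table page. */
uint64_t
pte_index(uint64_t kva_offset)
{
	return ((kva_offset >> 12) & 511);	/* 512 PTEs of 4KB each */
}
```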


215097 10-Nov-2010 jkim

Make APM emulation look closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.


215072 10-Nov-2010 jkim

Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.


215054 09-Nov-2010 jhb

- Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
in the _mtx_* or __mtx_* namespaces. While here, change the names to more
closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by: bde (1, 2)


215051 09-Nov-2010 attilio

Move the mptable.h under x86/include/.

Sponsored by: Sandvine Incorporated
MFC after: 14 days


215024 09-Nov-2010 jkim

Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.


215023 09-Nov-2010 jkim

Reduce diff between platforms and fix style(9) bugs.


215012 08-Nov-2010 jhb

Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.


215002 08-Nov-2010 jhb

A few small style and whitespace fixes.


214954 07-Nov-2010 alc

Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked
as "bogus".

Reported by: Jia-Shiun Li


214835 05-Nov-2010 jhb

Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger. Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock. However, trap interrupts for single-stepping
can still occur even when interrupts are disabled. Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased. Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero. To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with: bde
MFC after: 1 month
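
The ordering described above can be modelled in userland. Everything here is a simulation under stated assumptions: the `sim_` functions, `struct md_state` and the global flag stand in for intr_disable(), intr_restore() and the thread's MD area, and are not the kernel's definitions. The point shown is that the interrupt state lives in a local variable while the nesting count is in transition.

```c
#include <assert.h>

static int intr_enabled = 1;	/* simulated interrupt-enable state */

struct md_state {
	int	spinlock_count;	/* nesting depth */
	int	saved_intr;	/* state saved by the outermost enter */
};

static int
sim_intr_disable(void)
{
	int old = intr_enabled;

	intr_enabled = 0;
	return (old);
}

static void
sim_intr_restore(int state)
{
	intr_enabled = state;
}

void
sim_spinlock_enter(struct md_state *md)
{
	int state;

	if (md->spinlock_count == 0) {
		state = sim_intr_disable();	/* hold old state in a local */
		md->spinlock_count = 1;
		md->saved_intr = state;		/* store after count is raised */
	} else
		md->spinlock_count++;
}

void
sim_spinlock_exit(struct md_state *md)
{
	int state;

	assert(md->spinlock_count > 0);
	state = md->saved_intr;		/* read before count can reach zero */
	md->spinlock_count--;
	if (md->spinlock_count == 0)
		sim_intr_restore(state);
}

/* Nested use: interrupts stay off until the outermost exit. */
int
nested_demo(void)
{
	struct md_state md = { 0, 0 };
	int off_inside;

	sim_spinlock_enter(&md);
	sim_spinlock_enter(&md);
	sim_spinlock_exit(&md);
	off_inside = (intr_enabled == 0);
	sim_spinlock_exit(&md);
	return (off_inside && intr_enabled == 1);
}
```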


214774 04-Nov-2010 avg

x86 topo_probe: do not probe smp topology if only one cpu is visible

This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor. This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
probing, so either being zero is an indication of failed probing

Reported by: vwe, Dan Allen <danallen46@airwired.net>
Tested by: Dan Allen <danallen46@airwired.net>
MFC after: 3 days


214631 01-Nov-2010 jhb

Move <machine/apicreg.h> to <x86/apicreg.h>.


214630 01-Nov-2010 jhb

Move the <machine/mca.h> header to <x86/mca.h>.


214576 30-Oct-2010 alc

Add another safety belt to pmap_demote_DMAP().


214563 30-Oct-2010 alc

Don't demote in pmap_demote_DMAP() if the specified length is zero.


214457 28-Oct-2010 attilio

Merge nexus.c from amd64 and i386 to x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214448 28-Oct-2010 jhb

Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines. This also matches what the
code does when SMP is not enabled.

PR: bin/151616
Tested by: "Damian S. Kolodziejczyk" damkol | gmail
Submitted by: avg
MFC after: 1 week


214446 28-Oct-2010 attilio

Merge the mptable support from MD bits to x86 subtree.

Sponsored by: Sandvine Incorporated
Discussed with: jhb


214425 27-Oct-2010 alc

[1] According to the x86 architectural specifications, no virtual-to-
physical page mapping should span two or more MTRRs of different types.
Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can
ensure that the direct map region doesn't have such a mapping.

[2] Fix a couple of nearby style errors in amd64_mrset().

[3] Re-enable the use of 1GB page mappings for implementing the direct
map. (See also r197580 and r213897.)

Tested by: kib@ on a Westmere-family processor [3]
MFC after: 3 weeks


214373 26-Oct-2010 attilio

Merge dump_machdep.c i386/amd64 under the x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214347 25-Oct-2010 jhb

Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by: bde


214346 25-Oct-2010 jhb

Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by: bde (earlier version)
MFC after: 2 weeks


213897 15-Oct-2010 alc

Update pmap_extract() to handle 1GB page mappings. Some device drivers
use pmap_extract() rather than pmap_kextract() on direct map addresses.
Thus, pmap_extract() needs to be able to deal with 1GB page mappings if
we are to use 1GB page mappings for the direct map. (See r197580.)


213748 12-Oct-2010 jkim

Remove trailing ", " from `sysctl machdep.idle_available' output.


213716 12-Oct-2010 kib

Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with: bz, rwatson
Approved by: re (bz, kensmith)
MFC after: 2 weeks


213545 08-Oct-2010 kib

Regen.


213544 08-Oct-2010 kib

Fix typo.

Submitted by: arundel
MFC after: 3 days


213452 05-Oct-2010 kib

Display PCID capability of CPU and add CPUID define for it.

MFC after: 1 week


213382 03-Oct-2010 kib

The makectx() function, used by kdb_trap() to reconstruct pcb from
trap frame when trap initiated kdb entry, incorrectly calculated the
value of %rsp for trapped thread.

According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode
Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than
only on a CPL change."
Even assuming the conditional push of the %ss:%rsp, the calculation
was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64.

Always use the tf_rsp from trap frame. The change supposedly fixes
stepping when using kgdb backend for kdb.

Submitted by: Zhouyi Zhou <zhouzhouyi gmail com>
PR: amd64/151167
Reviewed by: avg
MFC after: 1 week
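
The size remark can be verified with a stand-in structure. `struct frame_tail` below is hypothetical: it models only the hardware-pushed tail of a 64-bit trap frame with 8-byte slots and is not the kernel's struct trapframe. `trapped_rsp` shows the rule from the last paragraph, always taking the stack pointer from the frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware-pushed tail of a trap frame. */
struct frame_tail {
	int64_t	tf_rip;
	int64_t	tf_cs;
	int64_t	tf_rflags;
	int64_t	tf_rsp;		/* pushed unconditionally in 64-bit mode */
	int64_t	tf_ss;
};

/* Combined size of the SS and RSP slots. */
size_t
ss_rsp_bytes(void)
{
	return (sizeof(((struct frame_tail *)0)->tf_ss) +
	    sizeof(((struct frame_tail *)0)->tf_rsp));
}

/* Always take the saved stack pointer from the frame. */
int64_t
trapped_rsp(const struct frame_tail *tf)
{
	assert(tf != NULL);
	return (tf->tf_rsp);
}
```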


213323 01-Oct-2010 avg

i386 and amd64 mp_machdep: improve topology detection for Intel CPUs

This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
and core masks and match APIC IDs against those masks [started by
jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
the same CPU topology, but Athlon cores do not share L2 cache while
Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
implemented for platforms in question
- think how to better describe old HTT vs new HTT distinction, HTT vs
SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
scheduler know that it shouldn't schedule any threads on those CPUs

PR: kern/145385 (related)
In collaboration with: jkim
Tested by: Sergey Kandaurov <pluknet@gmail.com>,
Jeremy Chadwick <freebsd@jdc.parodius.com>,
Chip Camden <sterling@camdensoftware.com>,
Steve Wills <steve@mouf.net>,
Olivier Smedts <olivier@gid0.org>,
Florian Smeets <flo@smeets.im>
MFC after: 1 month
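
Matching APIC IDs against SMT and core masks can be sketched as follows. This is one reading of the mask-based approach, not the mp_machdep code: the function names are hypothetical, the per-package counts are passed in as plain parameters instead of being read from CPUID, and each mask width is taken as the smallest power-of-two exponent that covers the count.

```c
#include <assert.h>

/* Smallest width w such that (1 << w) >= count. */
unsigned int
mask_width(unsigned int count)
{
	unsigned int w = 0;

	assert(count >= 1 && count <= 0x80000000u);
	while ((1u << w) < count)
		w++;
	return (w);
}

struct cpu_ids {
	unsigned int	pkg;
	unsigned int	core;
	unsigned int	smt;
};

/* Split an APIC ID into package, core and SMT fields. */
struct cpu_ids
split_apic_id(unsigned int apic_id, unsigned int logical_per_pkg,
    unsigned int cores_per_pkg)
{
	unsigned int smt_w, core_w;
	struct cpu_ids ids;

	assert(cores_per_pkg >= 1 && logical_per_pkg >= cores_per_pkg);
	smt_w = mask_width(logical_per_pkg / cores_per_pkg);
	core_w = mask_width(cores_per_pkg);
	ids.smt = apic_id & ((1u << smt_w) - 1);
	ids.core = (apic_id >> smt_w) & ((1u << core_w) - 1);
	ids.pkg = apic_id >> (smt_w + core_w);
	return (ids);
}
```

For example, with 16 logical CPUs and 8 cores per package, APIC ID 0x1b splits into package 1, core 5, SMT thread 1.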


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb
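
The distinction matters because a virtual address and the physical address behind it share only their low page-offset bits. A minimal sketch of the check, with a hypothetical helper name and made-up addresses in the usage lines:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment (a power of two). */
int
is_aligned(uint64_t addr, uint64_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((addr & (alignment - 1)) == 0);
}
```

A mapping whose virtual address is 0x12340000 and whose physical address is 0x5671000 passes a 64KB check on the virtual address but fails it on the physical one, so the two checks can disagree in either direction.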


213098 24-Sep-2010 davidxu

Userland POSIX semaphores are now based on umtx. The kernel module
is only needed for binary compatibility; to run old
binaries, you need to kldload the module.


212861 19-Sep-2010 nork

Add support for 'device tpm' on amd64.
Add tpm(4)'s default setting to /boot/defaults/loader.conf.
Add 'device tpm' to NOTES for amd64 and i386.

Discussed with: takawata
Approved by: imp (mentor)


212784 17-Sep-2010 avg

amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory

KVA space is abundant on amd64, so there is no reason to limit kernel
map size to a fraction of available physical memory. In fact, it could
be larger than physical memory.

This should help with memory auto-tuning for ZFS and shouldn't affect
other workloads.
This should reduce number of circumstances for "kmem_map too small"
panics, but probably won't eliminate them entirely due to potential kmem
fragmentation.

In fact, you might want/need to limit maximum ARC size after this commit
if you need to reserve more memory for applications.

This change was discussed on arch@ and nobody said "don't do it".

MFC after: 6 weeks


212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full rate
of hz + stathz to fulfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is a set of tunables, also available as writable sysctls, that
control the event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generated.

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the one on x86. Now it makes use of MONITOR/MWAIT instructions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerpc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


212420 10-Sep-2010 ken

MFp4 (//depot/projects/mps/...)

Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers.

This driver supports basic I/O, and works with SAS and SATA drives and
expanders.

Basic error recovery works (i.e. timeouts and aborts) as well.

Integrated RAID isn't supported yet, and there are some known bugs.

So this isn't ready for production use, but is certainly ready for
testing and additional development. For the moment, new commits to this
driver should go into the FreeBSD Perforce repository first
(//depot/projects/mps/...) and then get merged into -current once
they've been vetted.

This has only been added to the amd64 GENERIC, since that is the only
architecture I have tested this driver with.

Submitted by: scottl
Discussed with: imp, gibbs, will
Sponsored by: Yahoo, Spectra Logic Corporation


212413 10-Sep-2010 avg

bus_add_child: change type of order parameter to u_int

This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to: r212213
MFC after: 10 days


212177 03-Sep-2010 rdivacky

Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by: rpaulo (mentor)
Reviewed by: kib, rink


212026 30-Aug-2010 jkim

Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
these values in the function.


211924 28-Aug-2010 rpaulo

Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done in a subsequent commit.

Sponsored by: The FreeBSD Foundation


211804 25-Aug-2010 rpaulo

Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by: The FreeBSD Foundation


211752 24-Aug-2010 rpaulo

Add two DTrace trap type values. Used by fasttrap.

Sponsored by: The FreeBSD Foundation


211518 19-Aug-2010 attilio

Revert part of the r211149 as I erroneously ported the logical_cpus from
Yahoo! patchset as a mask (and manipulated variables accordingly) while
it is actually a CPU count.

Submitted by: neel
MFC after: 1 month
X-MFC: 211149


211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


211424 17-Aug-2010 gahr

- The iMac9,1 needs the PAT workaround as well

Approved by: cognet


211412 17-Aug-2010 kib

Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by: marius (sparc64)
MFC after: 1 month


211292 13-Aug-2010 jkim

Reset switchtime to zero rather than the current CPU ticker (TSC) value.
It is more appropriate in this context because TSC MSR is reset to zero
when the CPU is restarted from S3 and above. Move acpi_resync_clock() back
to where it was before r211202. It does not make a difference any more.


211220 12-Aug-2010 attilio

Revert r211176:
As long as interrupts are disabled and there is no explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.

Reported by: kib, jhb


211202 12-Aug-2010 jkim

Reset switchtime and switchticks after resynchronizing the system clock.
This should fix weird runtime problem after resume on amd64. It also fixes
"calcru: runtime went backwards" warnings with bootverbose.


211197 11-Aug-2010 jhb

Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int. Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.


211176 11-Aug-2010 attilio

IPI handlers may run generally with interrupts disabled because they
are served via an interrupt gate.

However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.

Tested by: gianni
MFC after: 1 month


211151 10-Aug-2010 attilio

Fix a typo due to a stale version of the patch.

Reported by: gianni, rdivacky
MFC after: 1 month
X-MFC: 211149


211149 10-Aug-2010 attilio

Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accesses,
to be addressed in further commits.

In collaboration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month


211117 09-Aug-2010 attilio

Simplify the logic for handling ipi_selected() and ipi_cpu() in the
amd64/i386 case.

Reviewed by: jhb
Tested by: gianni
MFC after: 1 month
X-MFC: 210939


211082 08-Aug-2010 dwmalone

Don't pass sizeof(u_int) to an argument of SYSCTL_PROC that ends up not
being used.


211006 07-Aug-2010 kib

Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS
in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well.

MFC after: 1 week


210947 06-Aug-2010 bschmidt

Fix whitespace nits.

PR: conf/148989
Submitted by: pluknet <pluknet at gmail.com>
MFC after: 3 days


210942 06-Aug-2010 jkim

Remove unnecessary casting and simplify code. We are not there yet. ;-)


210940 06-Aug-2010 jkim

Correct argument order of acpi_restorecpu(), which was forgotten in r210804.


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
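
As a rough model of the difference between the two calls, the Python sketch below records deliveries in a list instead of raising interrupts. The function bodies and the string used for the IPI type are illustrative assumptions, not the kernel implementation.

```python
sent = []  # (cpu, ipi) pairs recorded in place of real interrupts

def ipi_selected(mask, ipi):
    """Deliver ipi to every CPU whose bit is set in mask."""
    cpu = 0
    while mask:
        if mask & 1:
            sent.append((cpu, ipi))
        mask >>= 1
        cpu += 1

def ipi_cpu(cpu, ipi):
    """Deliver ipi to one CPU addressed by id; no mask is built or scanned."""
    sent.append((cpu, ipi))

ipi_selected(1 << 3, "IPI_AST")  # old style: construct a one-bit mask
ipi_cpu(3, "IPI_AST")            # new style: pass the cpuid directly
```

Both calls reach the same CPU; the second avoids constructing and walking a mask, which is the cost the commit message expects to grow with cpuset_t.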


210868 05-Aug-2010 jhb

Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the
generic PCI-PCI bridge driver and only override specific methods. This
should fix suspend/resume of PCI-PCI bridges using these drivers.


210810 03-Aug-2010 jkim

Remove an unnecessary register load.


210804 03-Aug-2010 jkim

savectx() has not been used for fork(2) for about 15 years. [1]
Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU,
unconditionally reload FPU state.

Pointed out by: bde [1]


210780 02-Aug-2010 jkim

Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make
better use of cache lines by placing it before pcb_save (now pcb_user_save),
which is moved to the end of pcb since r210777.


210777 02-Aug-2010 jkim

- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copy it to the PCB of
FPU thread if necessary. This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU. If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by: bde [1]


210774 02-Aug-2010 jhb

Tweak the logic to disable CLFLUSH in virtual environments to work around
problems with flushing the local APIC register range so that it checks
vm_guest directly.

Reviewed by: kib, alc
MFC after: 2 weeks


210665 30-Jul-2010 delphij

In rdmsr_safe, use zero extend (by doing a 32-bit movl over
eax to itself) instead of a sign extend.

Discussed with: stas
MFC after: 1 month
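
Why the choice of extension matters can be shown with a small Python model. It assumes the 64-bit result is assembled by shifting the high half left by 32 and OR-ing in the extended low half; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def zero_extend(eax):
    """Widen a 32-bit value to 64 bits, filling the upper half with zeros."""
    return eax & MASK32

def sign_extend(eax):
    """Widen a 32-bit value to 64 bits, copying bit 31 into the upper half."""
    low = eax & MASK32
    if low & 0x80000000:
        return low | 0xFFFFFFFF00000000
    return low

def combine(edx, low64):
    """Assemble a 64-bit value from a high half and an already widened low half."""
    return (((edx & MASK32) << 32) | low64) & MASK64

good = combine(0x1, zero_extend(0x80000000))  # 0x0000000180000000
bad = combine(0x1, sign_extend(0x80000000))   # 0xFFFFFFFF80000000
```

With sign extension, a low half that has bit 31 set overwrites the high half with ones, so the combined value is wrong whenever that bit is set.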


210624 29-Jul-2010 delphij

Improve cputemp(4) driver wrt newer Intel processors, especially
Xeon 5500/5600 series:

- Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place
of Tj(max) when a sane value is available, as documented
in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane
value we mean 70C - 100C for now);
- Print the probe results when booting verbose;
- Replace cpu_mask with cpu_stepping;
- Use CPUID_* macros instead of rolling our own.

Approved by: rpaulo
MFC after: 1 month
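
The sanity check on Tj(target) can be sketched as follows. The function name, the inclusive bounds, and the fallback default of 100 are assumptions made for illustration; only the 70C - 100C window comes from the message above.

```python
def pick_tj(tj_target, fallback_tjmax=100):
    """Return Tj(target) when it lies in the 70C-100C window, else the fallback.

    tj_target is the value read from the hardware, or None if it could
    not be read. fallback_tjmax stands in for whatever Tj(max) the
    driver would otherwise use (assumed value).
    """
    if tj_target is not None and 70 <= tj_target <= 100:
        return tj_target
    return fallback_tjmax
```

A reading of 85 would be used as is, while 0 or 120 would be rejected in favour of the fallback.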


210623 29-Jul-2010 jhb

Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly. This allows the compiler to cache invocations of
curthread since its value does not change within a thread context.

Submitted by: zec (i386)
MFC after: 1 week


210615 29-Jul-2010 jkim

Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot
be a valid pointer in pcb_save. This should restore the previous
behaviour.


210614 29-Jul-2010 jkim

Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.


210577 28-Jul-2010 jhb

The corrected error count field is dependent on CMCI, not TES.

MFC after: 1 week


210564 28-Jul-2010 mdf

Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)
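
The narrowing procedure described above (vary the hash function, note which hash class was corrupted, intersect the results) can be modelled in a few lines of Python. The type names and both hash functions are hypothetical stand-ins; the kernel's actual hash is not shown in this log.

```python
MAXZONES = 8
# Hypothetical malloc type names, identified here by index.
TYPES = ["type%d" % i for i in range(16)]

def hash_a(index):
    """First stand-in hash: map a type index to a hash class."""
    return index % MAXZONES

def hash_b(index):
    """Second stand-in hash, used for a later run."""
    return (index // 2) % MAXZONES

def suspects(hash_fn, corrupted_class):
    """All types that share the corrupted hash class under hash_fn."""
    return {t for i, t in enumerate(TYPES) if hash_fn(i) == corrupted_class}

culprit = 11  # index of the misbehaving type, unknown to the person debugging

run1 = suspects(hash_a, hash_a(culprit))  # {"type3", "type11"}
run2 = suspects(hash_b, hash_b(culprit))  # {"type10", "type11"}
narrowed = run1 & run2                    # {"type11"}
```

Each run only identifies a class, but the culprit is in every run's suspect set, so intersecting a few runs isolates it.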


210555 28-Jul-2010 alc

The interpreter name should no longer be treated as a buffer that can be
overwritten. (This change should have been included in r210545.)

Submitted by: kib


210550 27-Jul-2010 jhb

Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfilling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.

Reviewed by: alc
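
The splitting of phys_avail[] regions by the mem_affinity array can be sketched in Python. Tuples stand in for the C structures, the function names are hypothetical, and the round-robin order starting at the CPU's own domain is an assumption about what "round-robin" means here.

```python
def split_by_domain(phys_avail, mem_affinity):
    """Split physical ranges into per-domain lists.

    phys_avail:   list of (start, end) ranges.
    mem_affinity: list of (start, end, domain) entries, terminated by
                  an entry whose fields are all zero.
    """
    per_domain = {}
    for start, end in phys_avail:
        for a_start, a_end, domain in mem_affinity:
            if (a_start, a_end, domain) == (0, 0, 0):
                break  # terminator entry
            lo, hi = max(start, a_start), min(end, a_end)
            if lo < hi:  # the two ranges overlap
                per_domain.setdefault(domain, []).append((lo, hi))
    return per_domain

def lookup_order(domain, ndomains):
    """Round-robin freelist order for a domain (assumed to start at itself)."""
    return [(domain + i) % ndomains for i in range(ndomains)]

affinity = [(0x0, 0x4000, 0), (0x4000, 0x8000, 1), (0x8000, 0xA000, 0), (0, 0, 0)]
split = split_by_domain([(0x1000, 0x9000)], affinity)
```

Note that domain 0 ends up with two ranges, matching the statement that multiple entries may be present for a single domain.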


210521 26-Jul-2010 jkim

Simplify fldcw() macro. There is no reason to use pointer here. No object
file change after this commit (verified with md5).


210520 26-Jul-2010 jkim

Add missing ldmxcsr() prototype for lint case.


210518 26-Jul-2010 jkim

Reduce diff against fenv.h:

Mark all inline asms as volatile for safety. No object file change after
this commit (verified with md5).


210517 26-Jul-2010 jkim

FNSTSW instruction can use AX register as an operand.

Obtained from: fenv.h


210514 26-Jul-2010 jkim

Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.

MFC after: 1 month


210501 26-Jul-2010 kib

Remove unneeded includes.

Submitted by: alc
MFC after: 1 week


210432 23-Jul-2010 kib

Regen


210431 23-Jul-2010 kib

Remove linux_exec_copyin_args(); freebsd32_exec_copyin_args() may
serve as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32.

Reviewed by: alc
MFC after: 3 weeks


210429 23-Jul-2010 alc

Eliminate a little bit of duplicated code.


210369 22-Jul-2010 kib

When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by: jhb, nwhitehorn
MFC after: 2 weeks


210179 16-Jul-2010 mav

Add hints for the i8254 timer on i386 and amd64. Some people report
systems where PnP/ACPI does not report the i8254 timer. In some cases this
can be fatal, as the i8254 can be the only available time counter hardware.
On the other hand, we now depend heavily on the i8254 timer, and until
recently its init/usage was completely hardcoded. So this change just
restores the previous behavior in a more regular fashion.


210131 15-Jul-2010 mav

Move functions declaration to MI code, following implementation.


210124 15-Jul-2010 alc

Optimize pmap_remove()'s handling of PG_G mappings. Specifically,
instead of calling pmap_invalidate_page() for each PG_G mapping, call
pmap_invalidate_range() for each range of PG_G mappings. In addition,
eliminate a redundant call to pmap_invalidate_page(). Both
pmap_remove_pte() and pmap_remove_page() called pmap_invalidate_page()
when the mapping had the PG_G attribute. Now, only pmap_remove_page()
calls pmap_invalidate_page(). Altogether, these changes eliminate 53%
of the TLB shootdowns for a "buildworld" on a ZFS file system. On
FFS, the reduction is 3%.

MFC after: 6 weeks
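
The saving comes from issuing one invalidation per run of global mappings rather than one per page. A minimal Python model of that coalescing follows; the page size constant and function name are illustrative, and the real pmap_remove() loop is structured differently.

```python
PAGE_SIZE = 4096

def global_ranges(mappings):
    """Coalesce consecutive global pages into [start, end) ranges.

    mappings: list of (va, is_global) pairs, one per page, in ascending
    va order. Each returned range would cost one range invalidation.
    """
    ranges = []
    run_start = None
    prev = None
    for va, is_global in mappings:
        if is_global and run_start is not None and va == prev + PAGE_SIZE:
            prev = va  # extend the current run
            continue
        if run_start is not None:
            ranges.append((run_start, prev + PAGE_SIZE))  # close the run
            run_start = None
        if is_global:
            run_start = prev = va  # open a new run
    if run_start is not None:
        ranges.append((run_start, prev + PAGE_SIZE))
    return ranges

pages = [(0x1000, True), (0x2000, True), (0x3000, False),
         (0x4000, True), (0x5000, True), (0x6000, True)]
ranges = global_ranges(pages)
```

Here five global pages collapse into two ranges, so two invalidations replace five.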


210113 15-Jul-2010 bschmidt

- Update 6000 firmware to 9.221.4.1
- Add 6050 firmware

MFC after: 2 weeks


209995 13-Jul-2010 imp

Remove obsolete undef of COPY_SIGCODE. It appears to have not been
used in FreeBSD in quite some time (maybe since before 4.4-lite :)

Submitted by: bde


209957 12-Jul-2010 jkim

Move i386-inherited logic of building ACPI headers for acpi_wakeup.c into
better places and remove intermediate makefile and shell scripts. This
makes parallel kernel builds a little bit safer for amd64.


209956 12-Jul-2010 jhb

Remove a dead test. We already exclude NMI traps from this code in an
earlier condition.

MFC after: 1 week


209955 12-Jul-2010 kib

When switching the thread from the processor, store %dr7 content
into the pcb before disabling watchpoints. Otherwise, when the
thread is restored on a processor, watchpoints are still disabled.

Submitted by: Tijl Coosemans <tijl coosemans org>
(I would be much happier if Tijl committed this himself)
MFC after: 1 week


209887 10-Jul-2010 alc

Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after: 3 weeks
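
The rule described above can be modelled with a dictionary standing in for the page table. This is a sketch under simplifying assumptions: slots hold opaque frame labels, a missing key means "no valid mapping", and the function name is hypothetical.

```python
def qenter(page_table, start_slot, new_frames):
    """Install new_frames in consecutive slots.

    Returns True only if a valid mapping was replaced by a different
    one, which is the case that requires a TLB shootdown.
    """
    need_shootdown = False
    for offset, frame in enumerate(new_frames):
        slot = start_slot + offset
        old = page_table.get(slot)
        if old == frame:
            continue               # very same mapping: leave it alone
        if old is not None:
            need_shootdown = True  # a live mapping actually changed
        page_table[slot] = frame
    return need_shootdown

pt = {0: "A", 1: "B"}
extended = qenter(pt, 0, ["A", "B", "C"])  # buffer grows by one page
changed = qenter(pt, 0, ["A", "X", "C"])   # middle mapping replaced
```

Extending a buffer rewrites the initial mappings with identical values and adds a fresh one, so no shootdown is requested; only the second call, which really changes a mapping, asks for one.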


209862 09-Jul-2010 kib

For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks


209789 08-Jul-2010 alc

Correctly maintain the per-cpu field "curpmap" on amd64 just like we
do on i386. The consequences of not doing so on amd64 became apparent
with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS
options. Specifically, single-threaded applications were generating
unnecessary IPIs to shoot-down the TLB on other processors. However,
this is clearly nonsensical because a single-threaded application is
only running on the current processor. The reason that this happens
is that pmap_activate() is unable to properly update the old pmap's
field "pm_active" without the correct "curpmap". So, in effect, stale
bits in "pm_active" were leading pmap_protect(), pmap_remove(),
pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary
processor that wasn't even running the same application.

Reviewed by: kib
MFC after: 3 weeks


209763 07-Jul-2010 rpaulo

Fix style issues with the previous commit, namely
use-tab-instead-of-space and don't use underscores in macro variables.

Pointed out by: bde


209758 07-Jul-2010 kevlo

Add the u3g(4) driver. I can't find any reason why it's not here.


209731 06-Jul-2010 rpaulo

Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user
segment descriptor hi and lo values. Idea from Solaris.

Reviewed by: kib


209649 02-Jul-2010 mav

Revert r209638. After the commit, more people turned out to like the
previous name of the stray interrupt counters than had responded on the list.


209638 01-Jul-2010 mav

Make stray irq counters use a format similar to other counters. A unified
format makes string processing (for example by `systat -vm`) easier.


209613 30-Jun-2010 jhb

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.


209581 28-Jun-2010 kib

Regenerate


209483 23-Jun-2010 kib

Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after: 1 week


209463 23-Jun-2010 kib

Fix bugs on pc98: use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext(). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by: bde


209432 22-Jun-2010 mav

Some style fixes for r209371.

Submitted by: jhb@


209371 20-Jun-2010 mav

Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

The infrastructure provides support for both per-CPU (independent for every
CPU core) and global timers in periodic and one-shot modes. MI management
code at this moment uses only periodic mode, but use of one-shot mode is
planned for later, as part of the tickless kernel project.

For now the infrastructure is used on the i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers are preferred in one of the
following orders: LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
The user may explicitly specify the wanted timers via loader tunables or
sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2.
If the requested driver is unavailable or inoperable, the system will try to
replace it. If no more timers are available or "NONE" is specified for the
second, the system will operate using only one timer, multiplying its
frequency several times and using the respective dividers to honor the hz,
stathz and profhz values set during initial setup.
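
The single-timer fallback can be pictured with a small Python model: one timer runs at a multiple of the wanted rates, and integer dividers decide which interrupts turn into hardclock() or statclock() calls. The function name and the sample frequencies are illustrative assumptions, and profclock() is left out for brevity.

```python
def run_single_timer(timer_hz, hz, stathz, ticks):
    """Count hardclock() and statclock() calls over `ticks` interrupts.

    Assumes timer_hz is an exact multiple of both hz and stathz.
    """
    hard_div = timer_hz // hz      # interrupts per hardclock() call
    stat_div = timer_hz // stathz  # interrupts per statclock() call
    hard_calls = stat_calls = 0
    for tick in range(1, ticks + 1):
        if tick % hard_div == 0:
            hard_calls += 1
        if tick % stat_div == 0:
            stat_calls += 1
    return hard_calls, stat_calls

# One second of a 2000 Hz timer serving hz=1000 and stathz=125.
calls = run_single_timer(2000, 1000, 125, 2000)
```

In this example every second interrupt drives hardclock() and every sixteenth drives statclock(), so both rates are honoured from one hardware timer.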


209313 18-Jun-2010 kib

Only enable kdtrace hook in the LINT on the architectures that implement it.


209252 17-Jun-2010 kib

In the ia32_{get,set}_fpcontext(), use fpu{get,set}userregs instead
of fpu{get,set}regs.

Noted by: bde
MFC after: 1 month


209248 17-Jun-2010 mav

Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by: jhb@ (previous version)


209212 15-Jun-2010 jhb

Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by: jkim
MFC after: 2 weeks


209208 15-Jun-2010 kib

Remove two obsoleted comments, add a note about 32bit compatibility.

MFC after: 1 month


209204 15-Jun-2010 kib

Rename CRITSECT_ASSERT to CRITICAL_ASSERT.

Suggested by: jhb
MFC after: 1 month


209198 15-Jun-2010 kib

Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of the FPU.

Explicitly assert that the calling conventions for fpudrop() are
adhered to. In cpu_thread_exit(), add the missing critical section entrance.

Reviewed by: bde
Tested by: pho
MFC after: 1 month


209174 14-Jun-2010 jkim

Fix ACPI suspend/resume on amd64, which was broken since r208833.
We need actual storage for FPU state to save and restore.


209155 14-Jun-2010 mav

Fix a bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, pass the LAPIC ID, as in other
places, not the CPU ID.


209059 11-Jun-2010 jhb

Update several places that iterate over CPUs to use CPU_FOREACH().


209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


208994 10-Jun-2010 kan

Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.

Discussed with: jhb
MFC after: 1 week


208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exists_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


208922 08-Jun-2010 jhb

Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.


208921 08-Jun-2010 jhb

Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by: alc


208919 08-Jun-2010 jhb

Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.


208915 08-Jun-2010 jhb

- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers. This
closes a race where ioapic_program_intpin() could use a stale value of
the masked state to compute the masked bit in the register.

Reviewed by: mav
MFC after: 2 weeks


208877 06-Jun-2010 kib

Style-compliant order of declarations.

Noted by: bde
MFC after: 1 month


208833 05-Jun-2010 kib

Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with: fabient
Tested by: pho, fabient
Hardware provided by: Sentex Communications
MFC after: 1 month


208667 31-May-2010 alc

Eliminate a stale comment.


208657 30-May-2010 alc

Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.


208645 29-May-2010 alc

When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().


208621 28-May-2010 jhb

Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after: 1 month


208609 28-May-2010 alc

Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().


208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


208507 24-May-2010 jhb

Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl). The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by: Sailaja Bangaru sbappana at yahoo com
MFC after: 1 month


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208494 24-May-2010 mav

- Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-dependent
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls the syscall implementation from
the ABI sysent table, and sets up the return frame. The end-of-syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208452 23-May-2010 mav

Unify local_apic.c for x86 archs.


208392 21-May-2010 jhb

- Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
the idle process is now shared across all idle threads.

MFC after: 1 month


208332 20-May-2010 phk

Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody thinks it might be exp(3) in libm.

This also makes it consistent with other archs.


208311 19-May-2010 jhb

Add constants for the optional EOI suppression support in local APICs and
EOI registers in I/O APICs.


208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


208026 13-May-2010 kib

Do not use .extern, it is not strictly needed with gas and it is custom
to omit it.

Requested by: bde
MFC after: 6 days


207958 12-May-2010 kib

Route all returns from the interrupts and faults through the doreti_iret
labeled iretq instruction.

Suppose that a multithreaded process executes two threads, currently
scheduled on different processors. Assume that thread A executes
using %cs or %ss pointing into a descriptor from the LDT. If an IPI
arrives whose handler does not return by jumping to doreti, and
meanwhile thread B invalidates the descriptor pointed to by %cs or %ss,
then the iretq from the IPI handler could fault.

Routing the return through doreti_iret allows the kernel to catch the
situation and recover from it by sending a signal to usermode.

Tested by: pho
MFC after: 1 week


207957 12-May-2010 kib

Remove unneeded overrides of the segment registers in the inner trap
frame upon segment register load fault. The doreti procedure does not
load segment registers when returning to the kernel frame, and the current
values in the segment descriptor cache, unmodified by the faulted load,
already allow kernel mode to run.

Suggested by: bde
Tested by: pho
MFC after: 1 week


207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


207736 07-May-2010 mckusick

Merger of the quota64 project into head.

This joint work of Dag-Erling Smørgrav and myself updates the
FFS quota system to support both traditional 32-bit and new 64-bit
quotas (for those of you who want to put 2+Tb quotas on your users).

By default quotas are not compiled into the kernel. To include them
in your kernel configuration you need to specify:

options QUOTA # Enable FFS quotas

If you are already running with the current 32-bit quotas, they
should continue to work just as they have in the past. If you
wish to convert to using 64-bit quotas, use `quotacheck -c 64';
if you wish to revert from 64-bit quotas back to 32-bit quotas,
use `quotacheck -c 32'.

There is a new library of functions to simplify the use of the
quota system, do `man quotafile' for details. If your application
is currently using the quotactl(2), it is highly recommended that
you convert your application to use the quotafile interface.
Note that existing binaries will continue to work.

Special thanks to John Kozubik of rsync.net for getting me
interested in pursuing 64-bit quota support and for funding
part of my development time on this project.


207702 06-May-2010 alc

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


207676 05-May-2010 kib

Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by: Sentex Communications
MFC after: 1 week


207673 05-May-2010 joel

Switch to our preferred 2-clause BSD license.

Approved by: kmacy


207570 03-May-2010 kib

Style and comment adjustments.

Suggested and reviewed by: bde
MFC after: 3 days


207463 01-May-2010 kib

Remove debugging code that has not been used once since it was committed.

Suggested by: bde
MFC after: 1 week


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207329 28-Apr-2010 attilio

- Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by: Sandvine Incorporated
PR: threads/116181
Reviewed by: emaste, marcel
MFC after: 3 weeks


207269 27-Apr-2010 kib

Style: use #define<TAB> instead of #define<SPACE>.

Noted by: bde, pluknet gmail com
MFC after: 11 days


207213 26-Apr-2010 kmacy

missed pv access before pmap lock


207210 25-Apr-2010 kmacy

Incremental reduction of delta with head_page_lock_2 branch

- replace modification of pmap resident_count with pmap_resident_count_{inc,dec}
- the pv list is protected by the pmap lock, but in several cases we are relying
on the vm page queue mutex, move pv_va read under the pmap lock


207207 25-Apr-2010 thompsa

Set USB_DEBUG like the other platforms, I had turned it off to test the build
before committing r207077.

Spotted by: marius


207205 25-Apr-2010 alc

Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.


207161 24-Apr-2010 kmacy

apply style(9) changes applied to head_page_lock_2

requested by: kib@


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


207152 24-Apr-2010 kib

Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by: pluknet
Reviewed by: imp, jhb, nwhitehorn
MFC after: 2 weeks


207081 22-Apr-2010 jkim

If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.
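
The decision described above can be sketched as follows. This is a minimal illustration, not the BPF JIT's actual code: the `cond_jump` struct, the `emit_kind` enum and `classify_jump()` are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical, simplified conditional jump (not the real struct bpf_insn). */
struct cond_jump {
	uint8_t	jt;	/* offset taken when the test is true */
	uint8_t	jf;	/* offset taken when the test is false */
};

/* What a code generator would emit for one conditional jump. */
enum emit_kind {
	EMIT_JMP,		/* unconditional jump, test skipped */
	EMIT_TEST_AND_JCC	/* compare, then conditional jump */
};

static enum emit_kind
classify_jump(const struct cond_jump *ins)
{
	/* Both outcomes land on the same target, so the test is useless. */
	if (ins->jt == ins->jf)
		return (EMIT_JMP);
	return (EMIT_TEST_AND_JCC);
}
```

Skipping the comparison saves both the test instruction and one of the two branch encodings in the generated native code.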


207077 22-Apr-2010 thompsa

Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had
the illusion of a tunable setting but was always turned on regardless.

MFC after: 1 week


206992 21-Apr-2010 kib

As was done in r155238 for i386 and in r155239 for amd64, clear the carry
flag for ia32 binary executed on amd64 host in get_mcontext().

PR: kern/92110 (one more time)
Reported by: stas
MFC after: 1 week


206901 20-Apr-2010 rpaulo

Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.


206625 14-Apr-2010 yongari

Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have this hardware but this driver was
tested by Nikolay Denev and xclin.

Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.


206623 14-Apr-2010 kib

ld_gs_base is executing with the stack containing only the frame; the
temporarily pushed %rflags has been popped already.

Pointy hat to: kib
MFC after: 3 days


206553 13-Apr-2010 kib

Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inaccessible or wrong mcontext, and for the dreaded "kernel trap with
interrupts disabled" situation. The latter is changed when the trap is
generated from user mode (shall never be?).

Normalize the messages to include both pid and thread name.

MFC after: 1 week


206459 10-Apr-2010 kib

Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.
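
For readers unfamiliar with the term, the sketch below shows what "non-canonical" means. It assumes 48-bit virtual addresses, and `is_canonical48()` is a hypothetical helper, not a kernel function. The commit message does not say how the kernel detects or recovers from the condition, so this only illustrates the address property itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 are all copies of bit 47 (all zeros or all ones).
 */
static bool
is_canonical48(uint64_t va)
{
	uint64_t upper = va >> 47;	/* the 17 bits 63..47 */

	return (upper == 0 || upper == 0x1ffff);
}
```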

MFC after: 3 days


206089 02-Apr-2010 fabient

- Support for uncore counting events: one fixed PMC with the uncore
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
Some events were removed from the documentation; they have been
kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ


205851 29-Mar-2010 jhb

Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after: 1 week


205850 29-Mar-2010 jhb

Cosmetic tweak to use a type suffix instead of a cast to force a constant
to be a long.


205792 28-Mar-2010 ed

Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.

A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all less
ambiguous.
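
A minimal sketch of how such compatibility macros keep old code building. The `demo_stat` and `demo_timespec` structs and the `old_style_mtime_sec()` function are hypothetical stand-ins so that the example does not depend on any system header; only the macro idea is taken from the text above.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct timespec. */
struct demo_timespec {
	int64_t	tv_sec;
	long	tv_nsec;
};

/* Hypothetical stand-in for struct stat, using the POSIX 2008 names. */
struct demo_stat {
	struct demo_timespec st_atim;	/* last access */
	struct demo_timespec st_mtim;	/* last modification */
};

/* Compatibility macros: the old spellings resolve to the new fields. */
#define	st_atimespec	st_atim
#define	st_mtimespec	st_mtim

/* Old-style code keeps compiling unchanged. */
static int64_t
old_style_mtime_sec(const struct demo_stat *sb)
{
	return (sb->st_mtimespec.tv_sec);
}
```

Code that is converted to the new names no longer needs the macros at all, which is what the kernel sources were changed to do.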

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.


205778 28-Mar-2010 alc

Correctly handle preemption of pmap_update_pde_invalidate().

X-MFC after: r205573


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205448 22-Mar-2010 jhb

Remove unneeded type specifiers from 64-bit constants. The compiler
infers their natural type from the constants' values.

Submitted by: bde
MFC after: 3 days


205403 21-Mar-2010 alc

Eliminate a pointless TLB invalidation from pmap_bootstrap(). No mappings
whatsoever are changed between the earlier load_cr3() and this invalidation.


205402 21-Mar-2010 alc

I am told by AMD that the machine check hardware on the instruction TLB
won't generate bogus exceptions. Therefore, the implementation of the
"unofficial" workaround needn't mask L1TP errors by the instruction cache
unit.


205334 19-Mar-2010 avg

pmap amd64/i386: fix a typo in a comment

MFC after: 3 days


205332 19-Mar-2010 jhb

Use the same policy for rejecting or not rejecting ACPI tables with incorrect
checksums as the base acpi(4) driver. This fixes a problem where the MADT
parser would reject the MADT table during early boot, causing the MP Table
to be used, but then the acpi(4) driver would attach and use non-SMP interrupt
routing.

Tested by: Alastair Hogge agh of coolrhaug com
MFC after: 1 week


205214 16-Mar-2010 jhb

- Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
including the global machine check capabilities and status registers,
CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after: 1 week


205116 13-Mar-2010 ed

Remove COMPAT_43TTY from stock kernel configuration files.

COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.


205063 12-Mar-2010 jhb

Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


205013 11-Mar-2010 jhb

Print out the family and model from the cpu_id. This is especially useful
given the advent of the extended family and extended model fields. The
values are printed in hex to match their common usage in documentation.
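
As an illustration of why the extended fields matter, here is a sketch of the usual decoding of a CPUID signature. It follows the convention commonly given in vendor documentation and is an assumption, not a copy of the kernel's macros; `cpuid_family()` and `cpuid_model()` are hypothetical helper names.

```c
#include <stdint.h>

/* Display family: base family, plus extended family when base is 0xf. */
static unsigned
cpuid_family(uint32_t id)
{
	unsigned family = (id >> 8) & 0xf;

	if (family == 0xf)
		family += (id >> 20) & 0xff;
	return (family);
}

/* Display model: extended model supplies the high nibble for 0x6 and 0xf. */
static unsigned
cpuid_model(uint32_t id)
{
	unsigned base_family = (id >> 8) & 0xf;
	unsigned model = (id >> 4) & 0xf;

	if (base_family == 0x6 || base_family == 0xf)
		model |= ((id >> 16) & 0xf) << 4;
	return (model);
}
```

Under these rules a signature of 0x00100f42 decodes to family 0x10, model 0x4, and 0x000206c2 decodes to family 0x6, model 0x2c. Without the extended fields both would be misreported.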

Submitted by: Alexander Best
MFC after: 1 week


204957 10-Mar-2010 kib

Fall back to wbinvd when region for CLFLUSH is >= 2MB.
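
A minimal sketch of the size check, with the 2MB threshold taken from the message above. The names `DEMO_WBINVD_THRESHOLD`, `use_wbinvd()` and `clflush_count()` are hypothetical, and the 64-byte line size in the note below is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2MB threshold, as stated in the commit message. */
#define	DEMO_WBINVD_THRESHOLD	((size_t)2 * 1024 * 1024)

/* True when flushing the whole cache is preferred over per-line flushes. */
static bool
use_wbinvd(size_t len)
{
	return (len >= DEMO_WBINVD_THRESHOLD);
}

/* Number of per-line flushes needed to cover len bytes. */
static size_t
clflush_count(size_t len, size_t line_size)
{
	return ((len + line_size - 1) / line_size);
}
```

With an assumed 64-byte cache line, a 2MB region would need 32768 individual CLFLUSH instructions, which gives a feel for why a single whole-cache flush is used from that size on.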

Submitted by: Kevin Day <toasty dragondata com>
Reviewed by: jhb
MFC after: 2 weeks


204913 09-Mar-2010 jhb

Now that the workaround for the AMD 10h CPUs is in place, re-enable machine
checks by default on amd64.

Discussed with: alc


204907 09-Mar-2010 alc

Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors. With this workaround, superpage promotion can be re-enabled
under virtualization. Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.

Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it. Andriy also reviewed
and tested my patches.

Discussed with: jhb
MFC after: 3 weeks


204646 03-Mar-2010 joel

The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from: NetBSD


204641 03-Mar-2010 attilio

Improve the clocks auto-tuning by first checking whether the atrtc can be
correctly initialized and only then assigning it to softclock/profclock.
Right now, some atrtc seem to report strange diagnostic errors, making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows the handling of the machdep.lapic_allclocks tunable, which is
retained, to be moved into common x86 code.

Sponsored by: Sandvine Incorporated
Tested by: yongari, Richard Todd
<rmtodd at ichotolot dot servalan dot com>
MFC: 3 weeks
X-MFC: r202387, 204309


204518 01-Mar-2010 jhb

Print the contents of the miscellaneous (MISC) register to the console if
it is valid along with the other register values when a machine check is
encountered.

MFC after: 1 week


204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


204309 25-Feb-2010 attilio

Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib, jhb, imp
Discussed on: arch
MFC: 3 weeks


204214 22-Feb-2010 gibbs

Enforce stronger semantics for bus-dma alignment (currently only on amd64).
Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length. This also means that the relative
alignment of two adjacent bytes in the I/O stream differs by 1
even if they are not physically contiguous.
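
The guarantee above can be written down as a small check. This is a sketch under stated assumptions: `demo_seg` is a hypothetical stand-in for a DMA segment (address and length only), `segs_obey_alignment()` is not a bus-dma function, and `alignment` is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one contiguous DMA region. */
struct demo_seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * Every segment must start on an alignment boundary, and every segment
 * except the last must be a multiple of the alignment in length.
 */
static bool
segs_obey_alignment(const struct demo_seg *segs, size_t nsegs,
    uint64_t alignment)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].addr % alignment != 0)
			return (false);
		if (i + 1 < nsegs && segs[i].len % alignment != 0)
			return (false);
	}
	return (true);
}
```

A segment list that the old code could produce, with a short first segment ending at a page boundary, fails this check because the first length is not a multiple of the alignment.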

The old code, when needing to perform a copy in order to align data, only
copied the amount of data needed to reach the next page boundary. This
often left an unaligned end to the segment. Drivers such as Xen's blkfront
can't deal with such segments.

The downside to this approach is that, once an unaligned region is encountered,
the remainder of the I/O will be bounced. However, bouncing should be rare.
It is typically caused by non-performance critical userland programs that
don't bother to align their I/O buffers (e.g. bsdlabel). In-kernel I/O
buffers are always aligned to at least a page boundary.

Reviewed by: scottl
MFC after: 2 weeks


204161 21-Feb-2010 alc

Since create_pagetables() zeroes the page tables, pmap_bootstrap() needn't
zero *CMAP1.


204120 20-Feb-2010 ed

Remove redundant inclusion of <sys/cdefs.h>.

In my previous commit I should have moved the inclusion to the top,
instead of adding a second one.


204118 20-Feb-2010 ed

Add <sys/cdefs.h>.

This header file uses __packed, without including <sys/cdefs.h>. This
means it cannot be used in the way described in sysarch(3) by only
including <machine/sysarch.h>.


204041 18-Feb-2010 ed

Allow the pmap code to be built with GCC from FreeBSD 7 again.

This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by: jhb


203938 15-Feb-2010 attilio

Adjust style (following the already existing rules) for the newly
introduced option DEADLKRES.

Reported by: danfe, julian, avg


203758 10-Feb-2010 attilio

Add the options DEADLKRES (introducing the deadlock resolver thread) in
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, be enabled on a case-by-case basis.

Sponsored by: Sandvine Incorporated
Requested by: emaste
Discussed with: kib


203691 08-Feb-2010 brucec

Update documentation for the iwn and iwnfw drivers: they support the 1000, 5150, 6000 and 6050 devices too, with firmware modules for the 4965, 1000, 5000, 5150 and 6000.

Add documentation for mwl and all the wireless firmware drivers.

Approved by: rrs (mentor)


203367 02-Feb-2010 rnoland

Enable MTRR on all VIA CPUs that claim support (amd64).

This is the amd64 part of r203289.

Noticed by: jhb
MFC after: 2 weeks


203288 31-Jan-2010 rnoland

Welcome drm support for VIA unichrome chips.

MFC after: 2 weeks


203160 29-Jan-2010 avg

add static qualifier to definition of a function already declared static

This is for improving code readability only.

MFC after: 1 week


202919 24-Jan-2010 trasz

Fix array overflow. This routine is only called from procfs,
which is not mounted by default, and I've been unable to trigger
a panic without this fix applied anyway.

Reviewed by: kib, cperciva


202897 23-Jan-2010 alc

Simplify the mapping of the system message buffer. Use the direct map just
like ia64 does.


202882 23-Jan-2010 kib

For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For the i386, amd64 and ia32-on-amd64 MD syscall(), reread the syscall number
and arguments after ptracestop() if the debugger modified anything in the
process environment. Since the procfs stopevent requires the number of syscall
arguments in p_xstat, this cannot be solved by moving the stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_set_syscall_retval() already.

Reported by: Ali Polatel <alip exherbo org>
Briefly discussed with: jhb
MFC after: 3 weeks


202634 19-Jan-2010 jhb

Move the examples for the 'hints' and 'env' keywords from various GENERIC
kernel configs into NOTES.

Reviewed by: imp


202628 19-Jan-2010 ed

Recommit r193732:

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.


202387 15-Jan-2010 attilio

Handling all three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because the
frequencies are sized mainly to fit hardclock.
atrtc used to take care of softclock and profclock and still does so
if the LAPIC can't handle the clocks properly.

Revert the change by which the LAPIC started taking charge of all three
of them, and let atrtc handle softclock and profclock unless explicitly
requested otherwise. Such a request can be made by setting the new tunable
machdep.lapic_allclocks to a non-zero value, or by omitting the new device
ATPIC from the i386 kernel config (atrtc is linked to atpic presence).

Diagnosed by: Sandvine Incorporated
Reviewed by: jhb, emaste
Sponsored by: Sandvine Incorporated
MFC: 3 weeks


202286 14-Jan-2010 jhb

Update the ident for the XENHVM kernel config to match the filename.

MFC after: 1 week


202161 12-Jan-2010 gavin

Spell "Hz" correctly wherever it is user-visible.

PR: bin/142566
Submitted by: N.J. Mann njm njm.me.uk
Approved by: ed (mentor)
MFC after: 2 weeks


202097 11-Jan-2010 marcel

Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after: 1 week


202085 11-Jan-2010 alc

Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized. The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after: 1 week


202047 10-Jan-2010 alc

Eliminate unused declarations.


202019 10-Jan-2010 imp

Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms.

# This is the resolution of removing it from DEFAULTS...

MFC after: 5 days


201890 09-Jan-2010 kib

Set md_ldt (pointer to the LDT) after md_ldt_sd (system segment
descriptor for the LDT) is populated. md_ldt is used by context-switch
code as indicator that LDT segment register shall be loaded with
GUSERLDT segment instead of 0, so context switch at the wrong time may
cause attempt to load non-populated descriptor.

Use store with the barrier to prevent other CPUs from seeing updated
md_ldt but not seeing updated md_ldt_sd. Multithreaded process may
context-switch to another thread of the process on another CPU and read
md_ldt.
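
The ordering described above can be sketched in portable C11 as a release store paired with an acquire load. This is a userland analogue only: the kernel uses its own atomic primitives, and `demo_mdproc`, `demo_ldt`, `publish_ldt()` and `read_ldt_sd()` are hypothetical names that merely mirror the roles of md_ldt and md_ldt_sd.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_ldt {
	int	placeholder;
};

struct demo_mdproc {
	uint64_t		   ldt_sd;	/* plays the role of md_ldt_sd */
	_Atomic(struct demo_ldt *) ldt;		/* plays the role of md_ldt */
};

static void
publish_ldt(struct demo_mdproc *md, struct demo_ldt *ldt, uint64_t sd)
{
	md->ldt_sd = sd;	/* 1. populate the descriptor first */
	/* 2. publish the pointer; release orders it after the write above */
	atomic_store_explicit(&md->ldt, ldt, memory_order_release);
}

/* Returns 1 and fills *sd_out only if the pointer has been published. */
static int
read_ldt_sd(struct demo_mdproc *md, uint64_t *sd_out)
{
	struct demo_ldt *ldt;

	ldt = atomic_load_explicit(&md->ldt, memory_order_acquire);
	if (ldt == NULL)
		return (0);
	*sd_out = md->ldt_sd;	/* guaranteed to see the populated value */
	return (1);
}
```

A reader that observes a non-NULL pointer is thereby guaranteed to observe the descriptor written before it, which is the property the commit relies on across CPUs.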

MFC after: 1 week


201813 08-Jan-2010 bz

In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows
sys/conf/makeLINT.mk to only do certain things for certain
architectures.

Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT. This would enable them
to add special treatment to sys/conf/makeLINT.mk as well as choosing
one of the many configurations as LINT.

This is a hacky way of doing this; keeping it in a separate commit
will allow us to more easily identify and back it out.

Discussed on/with: arch, jhb (as part of the LINT-VIMAGE thread)
MFC after: 1 month


201534 04-Jan-2010 imp

Revert 200594. This file isn't intended for these sorts of things.


201443 03-Jan-2010 brooks

Add vlan(4) to all GENERIC kernels.

MFC after: 1 week


201369 01-Jan-2010 obrien

Quiet variable "shadows" warning:
sys/vmmeter.h: warning: shadowed declaration is here
machine/cpufunc.h: In function 'insw':
machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
..snip..


201223 29-Dec-2009 rnoland

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


200670 18-Dec-2009 jhb

- Create a separate section in the MI NOTES file for PCI wireless NIC
drivers and move bwi(4) there from the PCI Ethernet NIC section.
- Move ath(4) and ral(4) to the MI NOTES file.

Reviewed by: rpaulo


200594 16-Dec-2009 dougb

Add INCLUDE_CONFIG_FILE, and a note in comments about how to also
include the comments with CONFIGARGS


200444 12-Dec-2009 kib

For ia32 syscall(), call cpu_set_syscall_retval(). Update comment inside
cpu_set_syscall_retval() accordingly.

MFC after: 1 week


200280 08-Dec-2009 jkim

Simplify a macro not to generate unnecessary symbols.


200064 03-Dec-2009 avg

mca: small enhancements related to cpu quirks

- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
and separation of vendor-specific quirk functions

Reviewed by: jhb
MFC after: 1 week


200033 02-Dec-2009 avg

mca: improve status checking, recording and reporting

- directly print mca information in case we fail to allocate memory
for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by: jhb
MFC after: 10 days


199969 30-Nov-2009 avg

amdsbwd: new driver for AMD SB600/SB7xx watchdog timer

The hardware is compliant with the WDRT specification, so I originally
considered including generic WDRT watchdog support, but decided
against it, because I couldn't find anyone to test the code for me.
WDRT seems to be not very popular.
Besides, generic WDRT probably requires a slightly different driver
approach.

Reviewed by: des, gavin, rpaulo
MFC after: 3 weeks


199968 30-Nov-2009 avg

x86 cpu features: add MOVBE reporting and flag

The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.


199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


199721 23-Nov-2009 jkim

- Add more aggressive BPF JIT optimization. This is in more favor of i386
while the previous commit was more amd64-centric.
- Use calloc(3) instead of malloc(3)/memset(3) in user land[1].

Submitted by: ed[1]


199619 21-Nov-2009 jkim

Add an experimental and rudimentary JIT optimizer to reduce unnecessary
overhead from short BPF filter programs such as "get the first 96 bytes".


199615 20-Nov-2009 jkim

General style cleanup, no functional change.


199603 20-Nov-2009 jkim

- Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].

Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]


199531 19-Nov-2009 jkim

Fix tinderbox build for i386 and sync amd64 with it.


199498 18-Nov-2009 jkim

- Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.


199492 18-Nov-2009 jkim

- Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.


199319 16-Nov-2009 phk

Uppercase the UL suffix on a constant, so Flexelint doesn't worry that
'u1' might have been intended. No, that does not make sense and yes
I have told them.


199253 13-Nov-2009 kib

Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added a call to TUNABLE_INT_FETCH() to
initializecpu(), which results in a hang: APs are started when the kernel
environment is already dynamic, so the fetch needs to acquire a mutex,
and that is too early in the AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the latter only from hammer_time(), which is executed on the BSP.
Now TUNABLE_INT_FETCH() is done only once, on the BSP at the early boot stage.

In collaboration with: Mykola Dzham <freebsd levsha org ua>
Reviewed by: jhb
Tested by: ed, battlez


199215 12-Nov-2009 kuriyama

- Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by: avg, kib


199184 11-Nov-2009 avg

reflect that pg_ps_enabled is a tunable, not just a read-only sysctl

Nod from: jhb


199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sunv reviewed
MIPS tested by: gonzo
MFC after: 1 month


199104 09-Nov-2009 rdivacky

Make isa_dma functions MPSAFE by introducing a private lock. These
functions are self-contained (i.e. they touch only isa_dma.c static variables
and hardware), so a private lock is sufficient to prevent races. This changes
only i386/amd64, while there are also isa_dma functions for ia64/sparc64.
The sparc64 ones are empty stubs and the ia64 ones are unused, as ia64 does
not have ISA (says marcel).

This patch removes explicit locking of Giant from a few drivers (some
drivers require this locking but lack it; this patch fixes that) and
also removes the need for implicit locking of Giant from attach routines
where it's provided by newbus.

Approved by: ed (mentor, implicit)
Reviewed by: jhb, attilio (glanced by)
Tested by: Giovanni Trematerra <giovanni.trematerra gmail com>
IA64 clue: marcel


199067 09-Nov-2009 kuriyama

- Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
pmap_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1. -1 is the same as the
current behavior, which automatically disables CLFLUSH on Intel CPUs
without CPUID_SS (this should occur on Xen only). You can specify 1
when this panic happens on non-Intel CPUs (such as AMD's). Because
disabling CLFLUSH may reduce performance, you can try setting 0
on Intel CPUs without SS to use the CLFLUSH feature.

Reviewed by: kib
Reported by: karl, kuriyama
Related to: kern/138863
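
The three tunable values described above amount to a small decision table.
A sketch of it as a plain C function is shown here. The function name and
its parameters are hypothetical and only mirror the wording of this log
entry, not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether CLFLUSH should be avoided.
 *   tunable == -1: automatic (avoid on Intel CPUs without self-snoop)
 *   tunable ==  0: never avoid CLFLUSH
 *   tunable ==  1: always avoid CLFLUSH
 */
static bool
clflush_disabled(int tunable, bool is_intel, bool has_self_snoop)
{
	assert(tunable >= -1 && tunable <= 1);
	if (tunable == -1)
		return (is_intel && !has_self_snoop);
	return (tunable == 1);
}
```

With the default of -1 an AMD CPU keeps using CLFLUSH, which is why the
entry suggests setting 1 by hand when the panic shows up on such hardware.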


198950 05-Nov-2009 attilio

Strip external URLs that the project cannot directly control from
messages shown to users.

Requested by: kib, rwatson


198931 04-Nov-2009 jkim

Tweak memory allocation for amd64 suspend/resume CPU context.


198868 04-Nov-2009 attilio

The Opteron rev E family of processors exposes a bug where, on very rare
occasions, memory barrier semantics are not honoured by the hardware
itself. As a result, some random breakage can happen in ways that are hard
to investigate (for further explanation see the content of the commit itself).

Since only one specific family of an entire architecture is affected,
a complete fix is impractical without harming, to some extent, the
other, correct cases.
Considering that (and considering the frequency of the bug exposure),
just print out a warning message if an affected machine is identified.

Pointed out by: Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by: jeff
MFC: 3 days


198554 28-Oct-2009 jhb

Fix some problems with effective mmap() offsets > 32 bits. This was
partially fixed on amd64 earlier. Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset. This offset
is then passed to the real mmap() call. Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closely matches the existing style of various kern_foo() functions.

Submitted by: Christian Zander @ Nvidia
MFC after: 1 week
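
To illustrate why a 32-bit offset is not enough, here is a small sketch of
the arithmetic. It assumes the caller supplies the offset as a page count
and that pages are 4096 bytes; both are assumptions made for the example,
and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	PAGE_SIZE_EXAMPLE	4096u	/* assumed page size */

/* Old shape: the byte offset is kept in 32 bits and wraps at 4 GB. */
static uint32_t
byte_offset_32(uint32_t pgoff)
{
	return (pgoff * PAGE_SIZE_EXAMPLE);
}

/* New shape: widen first, then multiply, so large offsets survive. */
static uint64_t
byte_offset_64(uint32_t pgoff)
{
	return ((uint64_t)pgoff * PAGE_SIZE_EXAMPLE);
}

/* Small offsets agree; an offset of exactly 4 GB does not. */
static void
offset_selfcheck(void)
{
	assert(byte_offset_32(1) == byte_offset_64(1));
	assert(byte_offset_32(0x100000) == 0);
	assert(byte_offset_64(0x100000) == 0x100000000ULL);
}
```

Passing the widened value straight through to the real mmap() call is what
keeps mappings beyond the 4 GB mark pointing at the intended file position.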


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198422 23-Oct-2009 jkim

Try hiding annoying text cursor after the video controller is reset.


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


198170 16-Oct-2009 kib

Move intr_describe() out of #ifdef SMP; the function is always required.

Reviewed by: jhb


198134 15-Oct-2009 jhb

Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().

Requested by: many
Reviewed by: scottl
MFC after: 2 weeks
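
The name handling described in this entry (copy the name into a fixed
buffer, then append a printf(9) style description) can be sketched in
userland C as follows. The struct, the buffer size constant, the ':'
separator and both function names are assumptions made for illustration
and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define	NAME_MAX_EXAMPLE	19	/* hypothetical stand-in for MAXCOMLEN */

struct handler_example {
	/* An owned copy of the name, not a borrowed pointer. */
	char name[NAME_MAX_EXAMPLE + 1];
};

/* Copy the caller's name into the handler, truncating if needed. */
static void
handler_set_name(struct handler_example *h, const char *name)
{
	snprintf(h->name, sizeof(h->name), "%s", name);
}

/* Append ':' plus a formatted description; silently truncates. */
static void
handler_describe(struct handler_example *h, const char *fmt, ...)
{
	va_list ap;
	size_t len;

	len = strlen(h->name);
	assert(len < sizeof(h->name));
	if (len + 1 >= sizeof(h->name))
		return;		/* no room left for ':' and any text */
	h->name[len++] = ':';
	va_start(ap, fmt);
	vsnprintf(h->name + len, sizeof(h->name) - len, fmt, ap);
	va_end(ap);
}
```

A driver with several MSI vectors could then label each handler, for
example by calling handler_describe(&h, "rx %d", 0) on a handler named
"em0". Owning the buffer is what makes the append possible at all.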


198043 13-Oct-2009 jhb

Move the USB wireless drivers down into their own section next to the USB
ethernet drivers.

Submitted by: Glen Barber glen.j.barber @ gmail
MFC after: 1 month


197933 10-Oct-2009 kib

Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time


197910 09-Oct-2009 attilio

atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers, and
it is desirable to add new support and compat shims mostly when there is
a real necessity, in order to avoid too much compatibility bloat.

With that in mind, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by: jhb
Reviewed by: jhb
Tested by: Giovanni Trematerra <giovanni dot trematerra at
gmail dot com>


197863 08-Oct-2009 jkim

Clean up amd64 suspend/resume code.

- Allocate memory for wakeup code after ACPI bus is attached. The early
memory allocation hack was inherited from i386 but amd64 does not need it.
- Exclude real mode IVT and BDA explicitly. Improve comments about memory
allocation and reason for the exclusions. It is a no-op in reality, though.
- Remove an unnecessary CLD from wakeup code and re-align.


197824 06-Oct-2009 attilio

- All the functions in atomic.h need to be in "physical" form (i.e.
not defined through macros or similar) so that they can later be compiled
into the kernel and thereby offer support for modules (and
compatibility between the UP and SMP cases).
Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
and specifying a template. Note that the new DEFINE_CMPSET_GEN()
template saves more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by: Paul B. Mahol <onemda at gmail dot com>


197803 06-Oct-2009 attilio

Per their definition, atomic instructions used in conjunction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in the
presence of volatile operands. The most reliable way GCC offers to avoid
instruction reordering is clobbering "memory", even if that is
theoretically a heavy-weight operation, flushing the content of all
the registers and forcing a reload of them (we could rely, however, on
gcc DTRT by just understanding the purpose, as this is a well-known
pattern for many modern operating systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64, where the memory
barriers are treated the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by: jhb
Reviewed by: jhb
Tested by: rdivacky, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
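
For readers unfamiliar with the "memory" clobber mentioned above, here is
a minimal sketch for GCC-like compilers. The macro and the two functions
are hypothetical. Note that this is a compiler-only barrier: it emits no
instruction and is not a substitute for the kernel's atomic operations.

```c
#include <assert.h>

/*
 * An empty asm statement with a "memory" clobber. The compiler must
 * assume memory changed, so it will not move loads or stores across
 * this point or keep memory values cached in registers.
 */
#define	compiler_barrier()	__asm__ __volatile__("" : : : "memory")

static int shared_data;
static int shared_flag;

static void
publish(int value)
{
	shared_data = value;
	compiler_barrier();	/* keep the data store ahead of the flag store */
	shared_flag = 1;
}

static int
consume(void)
{
	assert(shared_flag == 1);
	compiler_barrier();	/* do not hoist the data load above the check */
	return (shared_data);
}
```

The cost noted in the entry is visible here: after the barrier the
compiler has to reload anything it was keeping in registers.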


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197663 01-Oct-2009 kib

As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by: csjp
Reviewed by: alc
MFC after: 3 days


197653 01-Oct-2009 rpaulo

Improve 802.11s comment.

Spotted by: dougb
MFC after: 1 day


197647 30-Sep-2009 avg

cpufunc.h: unify/correct style of c extension names

i386 and amd64 archs only.
inline => __inline. [1]
__asm__ => __asm. [2]

Reviewed by: kib, jhb [1]
Suggested by: kib [2]
MFC after: 1 week


197580 28-Sep-2009 alc

Temporarily disable the use of 1GB page mappings by the direct map. There
are currently two problems with the use of 1GB page mappings by the direct
map. First, at least one device driver uses pmap_extract() rather than
DMAP_TO_PHYS() to translate a direct map address to a physical address.
Unfortunately, neither pmap_extract() nor pmap_kextract() yet support 1GB
page mappings. Second, pmap_bootstrap() needs to interrogate the MTRRs to
ensure that a 1GB page mapping doesn't span two MTRRs of different types.

Reported and tested by: Daniel O'Connor
MFC after: 3 days


197536 27-Sep-2009 jkim

Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and
install apm(8) and apm_bios.h on amd64.


197518 26-Sep-2009 bz

lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).

While currently there is only "/dev/full" [2], we are planning to see more
in the future. We may decide to change the module/dependency logic in the
future should the list grow too long.

This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.

Suggested by: rwatson [1] (name)
Submitted by: ed [2]
Discussed with: markm, ed, rwatson, kib (weeks ago)
Reviewed by: rwatson, brueffer (prev. version)
PR: kern/68961
MFC after: 6 weeks


197455 24-Sep-2009 emaste

Add a backtrace to the "fpudna in kernel mode!" case, to help track down
where this comes from.

Reviewed by: bde


197450 24-Sep-2009 avg

number of cleanups in i386 and amd64 pci md code

o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch

Reviewed by: jhb
MFC after: 1 week
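
The unsigned-cast item above relies on a standard C idiom: a negative int
converted to unsigned becomes a very large value, so one comparison rejects
both negative and too-large inputs. A sketch, with a hypothetical constant
and function name standing in for the real ones:

```c
#include <assert.h>

#define	REGMAX_EXAMPLE	4095	/* hypothetical stand-in for PCIE_REGMAX */

/* Return 1 if reg is a usable register offset, 0 otherwise. */
static int
reg_in_range(int reg)
{
	/* A negative reg converts to a huge unsigned value and fails. */
	return ((unsigned)reg <= REGMAX_EXAMPLE);
}

static void
reg_in_range_selfcheck(void)
{
	assert(reg_in_range(0));
	assert(reg_in_range(REGMAX_EXAMPLE));
	assert(!reg_in_range(REGMAX_EXAMPLE + 1));
	assert(!reg_in_range(-1));
}
```

Without the cast, a signed comparison would let -1 through as "less than
the maximum".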


197439 23-Sep-2009 jhb

Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.


197410 22-Sep-2009 jhb

- Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.

PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days
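
A sorted insertion of the kind described in the second item might look
like the sketch below. It keeps an array of (start, end) pairs in
ascending order and refuses overlapping ranges. The array capacity, the
function name and the return convention are assumptions made for the
example; the real code may differ, for instance by merging adjacent ranges.

```c
#include <assert.h>
#include <stdint.h>

#define	MAP_SLOTS	16	/* hypothetical capacity, in array elements */

/*
 * Insert [base, base + len) into map[], which holds sorted (start, end)
 * pairs. *idxp counts the elements in use. Returns 1 on success and 0
 * if the range overlaps an existing pair or the array is full.
 */
static int
map_insert_sorted(uint64_t *map, int *idxp, uint64_t base, uint64_t len)
{
	uint64_t end;
	int i, pos;

	assert(*idxp % 2 == 0 && *idxp <= MAP_SLOTS);
	if (len == 0)
		return (1);		/* nothing to record */
	end = base + len;

	/* Stop at the first pair that starts at or after the new end. */
	for (pos = 0; pos < *idxp; pos += 2) {
		if (base < map[pos + 1] && map[pos] < end)
			return (0);	/* overlap */
		if (map[pos] >= end)
			break;
	}
	if (*idxp + 2 > MAP_SLOTS)
		return (0);		/* full */

	/* Shift the later pairs up by one pair, then store the new one. */
	for (i = *idxp - 1; i >= pos; i--)
		map[i + 2] = map[i];
	map[pos] = base;
	map[pos + 1] = end;
	*idxp += 2;
	return (1);
}
```

Entries can arrive in any order and the array still ends up sorted, which
an append-only scheme cannot guarantee.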


197397 22-Sep-2009 delphij

Build x86bios only for i386/amd64 for now. More work is required
to make these functional on other architectures, and the current
code breaks sparc64 and powerpc.

Spotted by: tinderbox via des


197389 21-Sep-2009 kib

If the CPU happens to be in usermode when a T_RESERVED trap occurred,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.

Do panic regardless of execution mode at the moment of trap.

Reviewed by: jhb
MFC after: 1 month


197380 21-Sep-2009 delphij

Automatically depend on x86emu when vesa or dpms is being built into
the kernel. With this change users no longer need to remember to build
this option.

Submitted by: swell.k at gmail.com


197379 21-Sep-2009 delphij

Enable s3pci on amd64 which works on top of VESA, and allow
static building it into kernel on i386 and amd64.

Submitted by: swell.k at gmail.com


197317 18-Sep-2009 alc

When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


197316 18-Sep-2009 alc

Add a new sysctl for reporting all of the supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


197070 10-Sep-2009 jkim

Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.


197064 10-Sep-2009 des

As jhb@ pointed out to me, r197057 was incorrect, not least because these
are generated files.


197025 09-Sep-2009 delphij

- Teach vesa(4) and dpms(4) about x86emu. [1]
- Add vesa kernel options for amd64.
- Connect libvgl library and splash kernel modules to amd64 build.
- Connect manual page dpms(4) to amd64 build.
- Remove old vesa/dpms files.

Submitted by: paradox <ddkprog yahoo com> [1], swell k at gmail.com
(with some minor tweaks)


196994 08-Sep-2009 phk

Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.


196968 08-Sep-2009 phk

Move multi-include protection back up to the top of the file and
name after the physical file rather than the aliased name.


196771 02-Sep-2009 jkim

Fix confusing comments about default PAT entries.


196769 02-Sep-2009 jkim

- Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb


196745 02-Sep-2009 jhb

Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.

MFC after: 3 days


196707 31-Aug-2009 jhb

Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with: alc
MFC after: 3 days


196653 30-Aug-2009 bz

Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days


196643 29-Aug-2009 rnoland

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non SelfSnoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
MFC after: 3 days


196512 24-Aug-2009 bz

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


196412 20-Aug-2009 jkim

Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.

Tested by: bz
Approved by: re (kib)
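
The fallback rule in this entry is a single comparison. A sketch with
hypothetical names follows. It also assumes that available memory never
exceeds Maxmem, which is stated here as an assumption rather than taken
from the source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the value to print as "real memory". If the SMBIOS figure is
 * smaller than the memory actually available, it cannot be right, so
 * fall back to the Maxmem-derived figure.
 */
static uint64_t
real_memory_to_report(uint64_t smbios_bytes, uint64_t avail_bytes,
    uint64_t maxmem_bytes)
{
	assert(avail_bytes <= maxmem_bytes);	/* assumed invariant */
	if (smbios_bytes < avail_bytes)
		return (maxmem_bytes);
	return (smbios_bytes);
}
```

A firmware that reports half of the installed memory, or nothing at all,
falls into the first branch.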


196390 19-Aug-2009 ed

Make the MacBookPro3,1 hardware boot again.

Tested by: Patrick Lamaiziere <patfbsd davenulle org>
Approved by: re (kib)


196318 17-Aug-2009 kib

Correct a critical accounting error in pmap_demote_pde(). Specifically,
when pmap_demote_pde() allocates a page table page to implement a
user-space demotion, it must increment the pmap's resident page count.
Not doing so can lead to an underflow during address space termination
that causes pmap_remove() to exit prematurely, before it has destroyed
all of the mappings within the specified range. The ultimate effect or
symptom of this error is an assertion failure in vm_page_free_toq()
because the page being freed is still mapped.

This error is only possible when superpage promotion is enabled. Thus,
it only affects FreeBSD versions greater than 7.2.

Tested by: pho, alc
Reviewed by: alc
Approved by: re (rwatson)
MFC after: 1 week


196224 14-Aug-2009 jhb

Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is successfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.

Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks


196196 13-Aug-2009 attilio

* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using an NMI,
but it completely violates all the good rules about interrupts being
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when available. In other cases it just matches a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64
architectures, while on the others it has a normal IPI_STOP effect. It is
the responsibility of maintainers to eventually implement a hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). It mirrors stop_cpus() but
uses the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases.
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and added comments.

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


196033 02-Aug-2009 ed

Make the MacBook3,1 boot again.

Approved by: re (kib)


195907 27-Jul-2009 rpaulo

Refine the MacBook hack to only match early models that have Intel ICH.

Discussed with: kjim
Approved by: re (kib)


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195820 22-Jul-2009 kib

When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but the clflush instruction is implemented
on the CPU, a series of clflush instructions can be used on the mapping
region. Otherwise, we have to flush the whole cache. The latter operation
is very expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by: alc
Approved by: re (kensmith)
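
Flushing a remapped region line by line comes down to walking the range in
cache-line steps. The sketch below shows only that walk, in portable C. A
counting callback stands in for the per-line flush, and every name in it
is hypothetical. In a kernel the per-line operation would be the CLFLUSH
instruction, and the line size would come from CPUID rather than from a
parameter.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*line_op_t)(uintptr_t line, void *arg);

/* Apply op once to every cache line that overlaps [sva, eva). */
static void
for_each_cache_line(uintptr_t sva, uintptr_t eva, uintptr_t line_size,
    line_op_t op, void *arg)
{
	uintptr_t va;

	assert(sva <= eva);	/* start must not be past the end */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	if (sva == eva)
		return;		/* empty range touches no line */
	/* Round the start down to a line boundary, then step by lines. */
	for (va = sva & ~(line_size - 1); va < eva; va += line_size)
		op(va, arg);
}

/* Stand-in for the flush: just count the lines visited. */
static void
count_line(uintptr_t line, void *arg)
{
	(void)line;
	(*(unsigned *)arg)++;
}
```

The cost is proportional to the size of the region, which is why this is
preferable to flushing the whole cache for small remappings.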


195774 19-Jul-2009 alc

Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by: re (kib)


195749 18-Jul-2009 alc

An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by: re (kib)


195666 13-Jul-2009 jkim

Match PCI Express root bridge _HID directly instead of
relying on _CID.

Reviewed by: jhb
Approved by: re (kib)


195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


195618 11-Jul-2009 rpaulo

Implementation of the upcoming Wireless Mesh standard, 802.11s, on the
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actual routing process on the mesh network.
HWMP is the routing protocol mandated by the mesh standard, but
others, such as RA-OLSR, can be implemented.

Authentication and encryption are not implemented.

There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to set up a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).

A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.

Drivers that support mesh networks right now are: ath, ral and mwl.

More information at: http://wiki.freebsd.org/WifiMesh

Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.

Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.

Reviewed by: sam
Approved by: re (kensmith)
Obtained from: projects/mesh11s


195535 10-Jul-2009 kib

When amd64 CPU cannot load segment descriptor during trap return to
usermode, it generates GPF, that is mirrored to user mode as SIGSEGV.
The offending register in mcontext should contain the value loading of
which generated the GPF, and it is so on i386. On amd64, we currently
report segment descriptor in tf_err, while segment register contains the
corrected value loaded by trap handler.

Fix the issue by behaving like i386, reloading segment register in trap
frame after signal frame is pushed onto user stack.

Noted and tested by: pho
Approved by: re (kensmith)


195486 09-Jul-2009 kib

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitly modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


195416 06-Jul-2009 alc

When pmap_change_attr() changes the PAT setting on a kernel mapping, it has
to simultaneously change the PAT setting for the same pages within the
direct map region. This may require the demotion of a 2MB page mapping and
the allocation of a page table page. This revision gives the highest
possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that
pmap_change_attr() is less likely to fail. (In general, kernel page table
page allocations have the highest priority, so this is not creating a new
precedent.)

(Demotion of 1GB page mappings within the direct map already specifies
VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be
changed.)

Approved by: re (kib)


195415 06-Jul-2009 jhb

After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.

Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.

Approved by: re (kensmith)


195410 06-Jul-2009 jhb

MFi386: Add a 'show idt' command to DDB to display the non-default function
pointers in the interrupt descriptor table.

Approved by: re (kensmith)


195376 05-Jul-2009 sam

Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent
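
As an illustration of the two flavours described above, here is a minimal
sketch. The macro names ALIGNED_POINTER_STRICT and ALIGNED_POINTER_RELAXED
are hypothetical and chosen only for this example; the real headers define a
single ALIGNED_POINTER per platform, and their exact text is not shown in
this log.

```c
#include <stdint.h>

/* Strict-alignment flavour: true when p is a multiple of sizeof(t). */
#define ALIGNED_POINTER_STRICT(p, t) \
    ((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Flavour for platforms with no restriction (as described for amd64/i386). */
#define ALIGNED_POINTER_RELAXED(p, t)   1
```

On a relaxed platform the compiler can drop any fix-up path guarded by the
macro, since the condition is a compile-time constant.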

Reviewed by: bde, imp, marcel
Approved by: re (kensmith)


195295 02-Jul-2009 ed

Enable POSIX semaphores on all non-embedded architectures by default.

More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.

Proposed by: miwi
Patch by: Florian Smeets <flo kasimir com>
Approved by: re (kib)


195249 01-Jul-2009 jhb

Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicking.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

Approved by: re (kib)


195228 01-Jul-2009 dfr

Don't include rpcv2.h - it has been removed.

Submitted by: ed@
Approved by: re


195188 30-Jun-2009 avg

remove unused/unneeded extern declarations

This should result in no changes to compiled code.

Reviewed by: alc
Approved by: re (kib)
MFC after: 1 day


195105 27-Jun-2009 rwatson

Catch missed AUDIT_ARG() -> AUDIT_ARG_CMD() on amd64.

Submitted by: Florian Smeets <flo at kasimir.com>
Approved by: re (kib) (implicit)
MFC after: 1 week


195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set of more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


195060 26-Jun-2009 alc

Correct the #endif comment.

Noticed by: jmallett
Approved by: re (kib)


195033 26-Jun-2009 alc

This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this change adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with: jhb


195002 25-Jun-2009 jhb

Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by: kib


194985 25-Jun-2009 jhb

- Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
IDT vectors for the entire group need to be allocated together. This
also restores the assumptions made by the PCI bus code that it could
invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
assigned to interrupt sources on activation. Instead of assigning the
CPU via pic_assign_cpu() before calling enable_intr(), allow the
different interrupt source drivers to ask the MD interrupt code which
CPU to use when they allocate an IDT vector. I/O APIC interrupt pins
do this in their pic_enable_intr() routines giving the same behavior as
before. MSI sources do it when the IDT vectors are allocated during
msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by: rnoland


194889 24-Jun-2009 jhb

Whitespace fix.


194790 23-Jun-2009 mav

Make algorithm a bit more bulletproof.


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


194776 23-Jun-2009 mav

Fix variable name.


194772 23-Jun-2009 mav

Rework r193814:
While the general idea of the patch was good, it did not work properly due to
the way it was implemented. When the same timer interrupt is used for several
of the hard/prof/stat purposes, we should not send several IPIs at the same
time to other CPUs. Doing so leads to terrible accounting/profiling results
due to a strong synchronization effect, where the second interrupt handler
accounts for the processing of the first one.
Interlink timer events in such a way that no more than one IPI is sent for
any original timer interrupt.


194611 22-Jun-2009 alc

Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.


194269 15-Jun-2009 ps

I have several machines where the following warning is printed:
warning: no time-of-day clock registered, system time will not be set accurately

Provide hints to atrtc on amd64 since it's not being described in
ACPI on some systems.

Reviewed by: jhb


194237 15-Jun-2009 mav

Forbid multi-vector MSI interrupt vectors migration to another CPU once
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better than just broken, as it was.

No objections: current@, arch@, jhb@


194209 14-Jun-2009 alc

Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficient number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistence. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping device pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITIOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with: jhb@


194204 14-Jun-2009 ed

Enable PRINTF_BUFR_SIZE on i386 and amd64 by default.

In the past there have been some reports of PRINTF_BUFR_SIZE not
functioning correctly. Instead of having garbled console messages, we
should just see whether the issues are still there and analyze them.

Approved by: re


193880 10-Jun-2009 yongari

Add alc(4), a driver for Atheros AR8131/AR8132 PCIe ethernet
controller. These controllers are also known as L1C(AR8131) and
L2C(AR8132) respectively. These controllers resemble the first
generation controller L1 but usage of different descriptor format
and new register mappings over L1 register space requires a new
driver. There are a couple of registers I still don't understand
but the driver seems to have no critical issues for performance and
stability. Currently alc(4) supports the following hardware
features.
o MSI
o TCP Segmentation offload
o Hardware VLAN tag insertion/stripping
o Tx/Rx interrupt moderation
o Hardware statistics counters(dev.alc.%d.stats)
o Jumbo frame
o WOL
AR8131/AR8132 also supports Tx checksum offloading but I disabled
it due to stability issues. I'm not sure this comes from broken
sample boards or hardware bugs. If you know your controller works
without problems you can still enable it. The controller has a
silicon bug for Rx checksum offloading, so the feature was not
implemented.
I'd like to say big thanks to Atheros. Atheros kindly sent sample
boards to me and answered several questions I had.

HW donated by: Atheros Communications, Inc.


193864 09-Jun-2009 kmacy

opt in to flowtable on i386/amd64


193855 09-Jun-2009 kmacy

remove flowtable from DEFAULTS


193819 09-Jun-2009 bz

Unbreak the build for amd64 after r193814 using correct variable names.


193814 09-Jun-2009 ariff

When using i8254 as the only kernel timer source:

- Interpolate stat/prof clock using clkintr() in a similar fashion to
local APIC timer, since statclock usually runs slower.

- Liberate hardclockintr() from taking the burden of handling both stat
and prof clock interrupt. Instead, send IPIs within clkintr() to handle
those.
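
The interpolation idea above can be sketched with a simple remainder
accumulator. This is a generic technique under stated assumptions, not the
code from this revision: the structure tick_interp and the function
tick_interp_step() are hypothetical names, and the derived rate is assumed
not to exceed the source rate.

```c
/* Hypothetical state for deriving a slower tick stream from a faster one. */
struct tick_interp {
    int src_hz;     /* rate of the driving timer interrupt */
    int dst_hz;     /* desired rate of the derived clock */
    int acc;        /* running remainder, starts at 0 */
};

/*
 * Call once per source interrupt.  Returns 1 when a derived tick is due.
 * Over src_hz calls this yields exactly dst_hz ticks, provided that
 * dst_hz <= src_hz and acc started at 0.
 */
static int
tick_interp_step(struct tick_interp *ti)
{
    ti->acc += ti->dst_hz;
    if (ti->acc >= ti->src_hz) {
        ti->acc -= ti->src_hz;
        return (1);
    }
    return (0);
}
```

A caller would invoke the slower clock handler only on iterations where the
step function returns 1, which spreads the derived ticks evenly.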


193804 09-Jun-2009 ariff

Move C1E workaround into its own idle function. Previous workaround works
only during initial booting process, while there are laptops/BIOSes that
tend to act 'smarter' by force enabling C1E if the main power adapter
is pulled out, rendering the previous workaround ineffective. Given the
fact that we still rely on local APIC to drive timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
it's on battery power or not.

URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html
http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html

Tested by: Peter Jeremy <peterjeremy at optushome d com d au>


193750 08-Jun-2009 jkim

Rewrite OsdSynch.c to reflect the latest ACPICA more closely:

- Implement ACPI semaphore (ACPI_SEMAPHORE) with condvar(9) and mutex(9).
- Implement ACPI mutex (ACPI_MUTEX) with mutex(9).
- Implement ACPI lock (ACPI_SPINLOCK) with spin mutex(9).


193734 08-Jun-2009 ed

Revert my change; reintroduce __gnu89_inline.

It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by: rwatson


193732 08-Jun-2009 ed

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc


193729 08-Jun-2009 alc

Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason
to cap its buffer map at 1GB.

MFC after: 6 weeks


193535 05-Jun-2009 kib

Put intrcnt, eintrcnt, intrnames and eintrnames into the .data section.

Noted by: "Tseng, Kuo-Lang" <kuo-lang.tseng intel com>, bde
MFC after: 3 days


193530 05-Jun-2009 jkim

Import ACPICA 20090521.


193334 02-Jun-2009 rwatson

Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel. No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project


193264 01-Jun-2009 dchagin

Implement accept4 syscall.

Approved by: kib (mentor)
MFC after: 1 month


193235 01-Jun-2009 rwatson

Regenerate generated syscall files following changes to struct sysent in
r193234.


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192440 20-May-2009 jhb

Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs. This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.


192343 18-May-2009 jhb

- Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code. Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
during a machine check exception, or-in the overflow flag rather than
replacing the other flags.

PR: i386/134586 [2]
Submitted by: Andi Kleen andi-fbsd firstfloor.org


192342 18-May-2009 jhb

Add a read-only sysctl hw.pci.mcfg to mirror the tunable by the same name.

MFC after: 1 week


192331 18-May-2009 jhb

Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks. (See IA-32 SDM Vol 3A 7.11.6.7)


192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


192227 16-May-2009 kmacy

correct range in comment
pointed out by alc


192224 16-May-2009 kmacy

update vm map comment

pointed out by Larry Rosenman


192216 16-May-2009 kmacy

Increase default kernel map to 512GB

I briefly discussed this with alc. It could lead to problems for greater than 64GB.
However, that seems unlikely in practice.


192206 16-May-2009 dchagin

Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, which make it possible to avoid extra fcntl() calls.

Implement a variation of the socket() syscall which takes flags
in addition to the type argument.
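
From the application side, the saving looks roughly like the sketch below.
It assumes a userland where SOCK_NONBLOCK and SOCK_CLOEXEC may be OR'ed into
the type argument of socket(2); the helper names are hypothetical and AF_UNIX
is used only to keep the example self-contained.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <fcntl.h>

/* Hypothetical helper: one call instead of socket() plus two fcntl()s. */
static int
open_nonblocking_cloexec_socket(void)
{
    return (socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0));
}

/* Returns 1 if both properties are already set on fd, 0 otherwise. */
static int
socket_flags_are_set(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    int flflags = fcntl(fd, F_GETFL);

    if (fdflags == -1 || flflags == -1)
        return (0);
    return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```

Setting close-on-exec atomically at creation also closes the window in which
another thread could fork and exec before a separate fcntl() call runs.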

Approved by: kib (mentor)
MFC after: 1 month


192122 14-May-2009 jhb

Trim the default set of device hints on i386 and amd64:
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
le0 from i386. All these hints were marked 'disabled' and thus already
did not work "out of the box".

Discussed with: imp


192114 14-May-2009 attilio

FreeBSD right now supports 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first things to change is the support for cpumask_t, which needs
to handle more than 32 bits of masking (which happens now). Some places,
however, still assume that cpumask_t is a 32-bit mask.
Fix that situation by always using cpumask_t correctly where needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additionally, make ipi_nmi_pending static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


192050 13-May-2009 jhb

Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.

Discussed with: emaste (briefly)
MFC after: 1 month


192035 13-May-2009 alc

Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after: 6 weeks


191989 11-May-2009 dchagin

Translate l_timeval arg to native struct timeval in
linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO,
SO_SNDTIMEO opts as l_timeval has MD members.

Remove bogus __packed attribute from l_timeval struct on __amd64__.

PR: kern/134276
Submitted by: Thomas Mueller <tmueller sysgo com>
Approved by: kib (mentor)
MFC after: 2 weeks


191973 10-May-2009 dchagin

Do not export AT_CLKTCK when emulating a Linux kernel prior
to 2.4.0, as it first appeared in 2.4.0-rc7.
When exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK);
glibc falls back to the hard-coded CLK_TCK value when the aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libcs which depend on the hard-coded CLK_TCK
value, the user should set compat.linux.osrelease to less than 2.4.0.

Approved by: kib (mentor)


191966 10-May-2009 dchagin

Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by: bde

Approved by: kib (mentor)


191954 10-May-2009 kuriyama

- Use "device\t" and "options \t" for consistency.


191896 07-May-2009 jamie

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to be accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


191876 07-May-2009 dchagin

To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by: kib (mentor)
MFC after: 1 month


191848 06-May-2009 dfr

Disable adaptive mutexes and rwlocks for XENHVM.


191847 06-May-2009 dfr

Fix XENHVM build.


191803 05-May-2009 mav

Do not try to initialize LAPIC timer if we are not going to use it.
This fixes an assertion failure when a kernel built with INVARIANTS is
configured to use the i8254 timer.


191788 04-May-2009 jkim

Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.


191766 03-May-2009 mav

Rename the statclock_disable variable to atrtcclock_disable, which is what it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows reducing the global interrupt rate of an idle system down to
about 100 interrupts per core, permitting C3 and deeper C-states to provide
maximum CPU power efficiency.


191744 02-May-2009 mav

Add support for using i8254 and rtc timers as event sources for amd64 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.


191741 02-May-2009 dchagin

Move extern variable definitions to the header file.

Approved by: kib (mentor)
MFC after: 1 month


191733 01-May-2009 mav

Add resume methods to i8254 and atrtc devices.


191730 01-May-2009 mav

Small addition to r191720.

Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.


191726 01-May-2009 sam

o add uath
o sort usb wireless drivers


191720 01-May-2009 mav

Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.
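
Why the marker value matters can be shown with a small sketch. The table,
the constant names and both functions below are hypothetical and exist only
for this example; they are not the kernel's data structures. Because IRQ 0 is
a valid IRQ number, a slot holding 0 must not be mistaken for a free slot.

```c
#define NUM_VECTORS     4       /* hypothetical, tiny table for the sketch */
#define VECTOR_UNUSED   (-1)    /* hypothetical name for the -1 marker */

static int vector_to_irq[NUM_VECTORS];

/* Mark every slot as free. */
static void
vectors_init(void)
{
    int i;

    for (i = 0; i < NUM_VECTORS; i++)
        vector_to_irq[i] = VECTOR_UNUSED;
}

/* Claim the first free slot for irq; returns the slot index or -1 if full. */
static int
vector_alloc(int irq)
{
    int i;

    for (i = 0; i < NUM_VECTORS; i++) {
        if (vector_to_irq[i] == VECTOR_UNUSED) {
            vector_to_irq[i] = irq;
            return (i);
        }
    }
    return (-1);
}
```

With 0 as the free marker, a slot assigned to IRQ 0 would still look free and
the next allocation would overwrite it; with -1 the second allocation moves
on to the next slot.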

Add hint.apic.0.clock tunable. Setting it to 0 disables using LAPIC timers
as hard-/stat-/profclock sources, falling back to using i8254 and rtc timers.

On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.

By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more than a watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.


191719 01-May-2009 dchagin

Reimplement futexes.
The old implementation used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), which could cause the
calling thread to sleep and lose Giant protection. The user-visible
result was a missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike Linux, we return EINVAL when the FUTEX_CMP_REQUEUE operation
is requested and either the caller-specified futexes are equal or the
second futex already exists. This is acceptable since the situation
can only occur from an application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by: kib (mentor)
MFC after: 1 month


191708 30-Apr-2009 jkim

- Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by: pluknet (pluknet at gmail dot com)[1]


191648 29-Apr-2009 jeff

- Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
- Remove the cpu_cores/cpu_logical detection from identcpu.
- Describe the layout of the system in cpu_mp_announce().

Sponsored by: Nokia


191438 23-Apr-2009 jhb

Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages. The boundaries
don't determine whether or not parts of a DMA request bounce. Instead,
they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
pages are always page aligned.

Reviewed by: scottl (amd64)
MFC after: 1 month


191405 22-Apr-2009 jhb

Adjust the way we number CPUs on x86 so that we attempt to "group" all
logical CPUs in a package. We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0. While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.
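
The wrap-around numbering described above can be sketched as follows. This is
an illustration only: assign_ids_sketch(), MAX_APIC_ID and the array layout
are hypothetical, and the boot CPU is assumed to become logical CPU 0.

```c
#include <stdbool.h>

#define MAX_APIC_ID 8   /* hypothetical small bound for the sketch */

/*
 * present[i] says whether a CPU with APIC ID i exists.  cpu_id[i] receives
 * the logical CPU number, or -1 for absent APIC IDs.  The boot CPU is
 * assumed to be present and to be logical CPU 0.
 */
static void
assign_ids_sketch(const bool present[MAX_APIC_ID], int boot_apic_id,
    int cpu_id[MAX_APIC_ID])
{
    int i, apic_id, next = 1;

    for (i = 0; i < MAX_APIC_ID; i++)
        cpu_id[i] = -1;
    cpu_id[boot_apic_id] = 0;

    /* Walk boot+1, boot+2, ..., wrapping past the end back to 0. */
    for (i = 1; i < MAX_APIC_ID; i++) {
        apic_id = (boot_apic_id + i) % MAX_APIC_ID;
        if (present[apic_id])
            cpu_id[apic_id] = next++;
    }
}
```

With the boot CPU at APIC ID 4 and CPUs present at IDs 0, 1, 4, 5, 6 and 7,
IDs 5 to 7 receive logical numbers 1 to 3 before IDs 0 and 1 receive 4 and 5,
so the boot CPU's neighbours stay adjacent in the logical numbering.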

MFC after: 1 month


191309 20-Apr-2009 rwatson

Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by: bde [1], jhb [2]
MFC after: 2 weeks


191278 19-Apr-2009 rwatson

Add description and cautionary note regarding CACHE_LINE_SIZE.

MFC after: 2 weeks
Suggested by: alc


191276 19-Apr-2009 rwatson

For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant. These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).
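
A minimal sketch of how such constants are typically consumed, assuming the
64-byte default mentioned above. The structure padded_counter and the helper
cache_line_roundup() are hypothetical and not part of this change.

```c
#include <stddef.h>

/* Assumed values mirroring the 64-byte default described above. */
#define CACHE_LINE_SHIFT    6
#define CACHE_LINE_SIZE     (1 << CACHE_LINE_SHIFT)

/* Hypothetical counter padded so that neighbours never share a line. */
struct padded_counter {
    unsigned long   value;
    char            pad[CACHE_LINE_SIZE - sizeof(unsigned long)];
};

/* Round a size up to the next multiple of the cache line size. */
static inline size_t
cache_line_roundup(size_t n)
{
    return ((n + CACHE_LINE_SIZE - 1) & ~((size_t)CACHE_LINE_SIZE - 1));
}
```

Because the padding is sized at compile time, over-estimating the line size
only wastes a little memory, whereas under-estimating it reintroduces false
sharing.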

MFC after: 2 weeks
Discussed on: arch@


191255 19-Apr-2009 kmacy

- Import infrastructure for caching flows as a means of accelerating L3 and L2 lookups
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
runtime with 'sysctl net.inet.flowtable.enable=1'.

- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
their kernel config files.

- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
more review before bringing in changes that require more churn.

Supported by: Bitgravity Inc.


191201 17-Apr-2009 jhb

Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by: avg
Reviewed by: scottl


191130 15-Apr-2009 marcel

Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT). When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.


191111 15-Apr-2009 jkim

A simple rewrite of biossmap.c:

- Do not iterate int 15h, function e820h twice. Instead, we use STAILQ to
store each return buffer and copy all at once.
- Export optional extended attributes defined in ACPI 3.0 as separate
metadata. Currently, there are only two bits defined in the specification.
For example, if the descriptor has extended attributes and it is not
enabled, it has to be ignored by OS. We may implement it in the kernel
later if it is necessary and proven correct in reality.
- Check return buffer size strictly as suggested in ACPI 3.0.

Reviewed by: jhb


191011 13-Apr-2009 kib

bus_dmamap_load_uio(9) shall extract pages from the pmap of the thread
recorded in uio_td, instead of unconditionally using the kernel
pmap.

Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version)
PR: amd64/133592
Reviewed by: scottl (original patch), jhb
MFC after: 2 weeks


190919 11-Apr-2009 ed

Simplify in/out functions (for i386 and AMD64).

Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by: Christoph Mallon <christoph mallon gmx de>


190876 10-Apr-2009 jfv

Add ixgbe to the GENERIC amd64 kernel in place of the
older ixgb driver. I will add to other architectures
after this one proves trouble free.

MFC after: 2 weeks


190854 08-Apr-2009 ed

Also remove the unused __word_swap_int*() macros.

Submitted by: Christoph Mallon <christoph.mallon@gmx.de>


190853 08-Apr-2009 ed

Implement __bswap16() without using inline assembly.

Most compilers nowadays (including GCC) are smart enough to know what's
going on and generate more efficient code anyway.

Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
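
A minimal sketch of a 16-bit byte swap written in plain C, in the spirit of the change above. The name bswap16_portable is hypothetical and this is not the committed code.

```c
#include <stdint.h>

/*
 * Swap the two bytes of a 16-bit value using shifts only.  The operand is
 * promoted to int, so the final cast truncates back to 16 bits.
 */
static inline uint16_t
bswap16_portable(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}
```

Compilers commonly recognise this idiom, which is the point made in the commit message.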


190817 07-Apr-2009 ed

Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE.

Because the "c" input constraint is used, the compiler will already place
the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx") makes
LLVM crash. Even though this is also an LLVM bug, we'd better remove the
unnecessary GCCism as well.

Submitted by: Christoph Mallon <christoph.mallon@gmx.de>


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field only if the flag BI_BRAND_NOTE is set. Old
modules won't have the flag set, so the new brand_note field is
ignored for them.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days
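
The three steps above follow a common pattern for extending a structure without breaking older binaries: append the field and gate every read of it on a flag. The sketch below assumes a simplified layout. Only the names BI_BRAND_NOTE and brand_note come from the entry; the flag value, the other fields and the helper brand_note_of() are hypothetical.

```c
#include <stddef.h>

#define BI_BRAND_NOTE 0x0002	/* hypothetical bit value */

struct brand_note_sketch {
	const char *vendor;	/* illustrative payload */
};

struct brandinfo_sketch {
	const char *compat_name;
	int flags;
	/* New field appended last, so older offsets stay unchanged. */
	const struct brand_note_sketch *brand_note;
};

/* Only trust brand_note when the module declares it valid. */
static const struct brand_note_sketch *
brand_note_of(const struct brandinfo_sketch *bi)
{
	return (bi->flags & BI_BRAND_NOTE) ? bi->brand_note : NULL;
}
```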


190636 02-Apr-2009 jkim

Reduce code duplication from r190620. While I am here, tweak a comment.


190635 02-Apr-2009 jkim

Chase GDT layout changes and unbreak suspend/resume on amd64.


190633 01-Apr-2009 piso

Implement an ipfw action to reassemble ip packets: reass.


190629 01-Apr-2009 jkim

Garbage collect unused MSR_GSBASE since r190620.

The only consumer was exception.S and specialreg.h is directly included now.
Note no md5 changes were observed for all assym.s consumers with this.


190626 01-Apr-2009 jkim

Garbage collect unused stack segment since r190620.


190623 01-Apr-2009 kib

Sync definitions for struct sigcontext for i386 and amd64 architectures
to struct mcontext.


190620 01-Apr-2009 kib

Save and restore segment registers when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporarily?) switched to use gsbase for the thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-dependent ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


190619 01-Apr-2009 kib

Add separate gdt descriptors for %fs and %gs on amd64.
Reorder amd64 gdt descriptors so that user-accessible selectors are the
same as on i386. At least Wine hard-codes this into the binary.

In collaboration with: pho
Reviewed by: jhb


190618 01-Apr-2009 kib

Fully enumerate all i386 sysarch commands in an amd64 include file.

Provides i386/freebsd API-compatible definitions for the argument
structures of the above sysarch commands. struct i386_ioperm_args
definition is ABI-compatible.

In collaboration with: pho
Reviewed by: jhb


190616 01-Apr-2009 kib

Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, halve the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb


190615 01-Apr-2009 kib

Provide convenient definition of the union descriptor, similar to the
i386 one. Fully enumerate system segments and gate types.

In collaboration with: pho
Reviewed by: jhb


190600 31-Mar-2009 jkim

Fix an uninitialized variable from the previous commit.


190599 31-Mar-2009 jkim

Probe size of installed memory modules from loader and display it
as 'real memory' instead of Maxmem if the value is available.
Note amd64 displayed physmem as 'usable memory' since machdep.c r1.640
to unconfuse users. Now it is consistent across amd64 and i386 again.
While I am here, clean up smbios.c a bit and update copyright date.

Reviewed by: jhb


190581 30-Mar-2009 mav

Integrate user/mav/ata branch:

Add ch_suspend/ch_resume methods for PCI controllers and implement them
for AHCI. Refactor AHCI channel initialization according to it.

Fix Port Multipliers operation. It is far from perfect yet, but works now.
Tested with JMicron JMB363 AHCI + SiI 3726 PMP pair.
Previous version was also tested with SiI 4726 PMP.

Hardware sponsored by: Vitsch Electronics / VEHosting.nl


190472 27-Mar-2009 ambrisko

Revert 190445 change to this file restoring:
typedef l_long l_off_t;
Change l_mmap_argv's to l_ulong for pgoff. This restores prior behaviour
to consumers of l_off_t but allows mmap to mmap a 32bit position which a
Linux application requires to access SMBIOS data via /dev/mem.

Reviewed by: dchagin
Prompted by: rdivacky


190447 26-Mar-2009 kib

Convert gdt_segs and ldt_segs initialization to C99 style.

Reviewed by: jhb
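
For readers unfamiliar with the style, the sketch below shows what C99 designated initializers look like for a descriptor table. The structure, field names, slot names and values are all hypothetical simplifications and do not match the real gdt_segs layout.

```c
/* Hypothetical, simplified descriptor; the real kernel structure differs. */
struct seg_desc_sketch {
	unsigned long base;
	unsigned long limit;
	unsigned type;
	unsigned dpl;
};

enum { SEG_NULL, SEG_KCODE, SEG_KDATA, SEG_COUNT };	/* illustrative slots */

/* Each entry names its slot and its fields, so order no longer matters. */
static const struct seg_desc_sketch gdt_sketch[SEG_COUNT] = {
	[SEG_NULL]  = { .base = 0, .limit = 0,       .type = 0,    .dpl = 0 },
	[SEG_KCODE] = { .base = 0, .limit = 0xfffff, .type = 0x1b, .dpl = 0 },
	[SEG_KDATA] = { .base = 0, .limit = 0xfffff, .type = 0x13, .dpl = 0 },
};
```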


190445 26-Mar-2009 ambrisko

Add stuff to support upcoming BMC/IPMI flashing of newer Dell machines
via the Linux tool.
- Add Linux shim to ipmi(4)
- Create a partitions file to linprocfs to make Linux fdisk see
disks. This file is dynamic so we can see disks come and go.
- Convert msdosfs to vfat in mtab since Linux uses that for
msdosfs.
- In the Linux mount path convert vfat passed in to msdosfs
so Linux mount works on FreeBSD. Note that tasting works
so that if da0 is a msdos file system
/compat/linux/bin/mount /dev/da0 /mnt
works.
- Fix a 64-bit bug for l_off_t.
Grabbing sh, mount, fdisk, df from Linux, creating a symlink of mtab to
/compat/linux/etc/mtab and then some careful unpacking of the Linux bmc
update tool and hacking makes it work on newer Dell boxes. Note, probably
if you can't figure out how to do this, then you probably shouldn't be
doing it :-)


190426 25-Mar-2009 jhb

Fix a few nits in the earlier changes to prevent local information leakage
in AMD FPUs:
- Do not clear the affected state in the case that the FPU registers for
the thread that already owns the FPU are changed via fpu_setregs(). The
only local information the thread would see is its own state in that
case.
- Fix a type mismatch for the dummy variable used in a "fld". It accepts
a float, not a double.

Reviewed by: bde
Approved by: so (cperciva)
MFC after: 1 month


190413 25-Mar-2009 jhb

Rename (fpu|npx)_cleanstate to (fpu|npx)_initialstate to better reflect
their purpose.

Inspired by: bde
MFC after: 1 month


190386 24-Mar-2009 jhb

Fall back to using configuration type 1 accesses for PCI config requests if
the requested PCI bus falls outside of the bus range given in the ACPI
MCFG table. Several BIOSes seem to not include all of the PCI busses in
systems in their MCFG tables. It may be that the BIOS is simply buggy and
does support all the busses, but it is more conservative to just fall back
to the old method unless it is certain that memory accesses will work.
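
The fallback rule described above can be sketched as a simple range check. All names here (struct mcfg_window, pick_cfg_method and the enum values) are hypothetical. Only the policy comes from the entry: use memory-mapped access when the bus lies inside the range from the MCFG table, and use type 1 accesses otherwise.

```c
#include <stdbool.h>

enum cfg_method { CFG_TYPE1_IO, CFG_MEMORY_MAPPED };

/* Hypothetical summary of what the ACPI MCFG table advertised. */
struct mcfg_window {
	bool present;	/* a usable MCFG entry was found */
	int minbus;	/* first bus covered by the entry */
	int maxbus;	/* last bus covered by the entry */
};

/* Be conservative: only use memory accesses for buses MCFG covers. */
static enum cfg_method
pick_cfg_method(const struct mcfg_window *w, int bus)
{
	if (w->present && bus >= w->minbus && bus <= w->maxbus)
		return CFG_MEMORY_MAPPED;
	return CFG_TYPE1_IO;
}
```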


190341 23-Mar-2009 jkim

- Clean up suspend/resume code for amd64.
- Call acpi_resync_clock() to reset system time before hardclock is ready
to tick. Note we assume the current timecounter hardware and RTC are
already available for read operation.

Tested by: mav


190272 22-Mar-2009 alc

Update stale comments. The alternate address space mapping was eliminated
when PAE support was added to i386. The direct mapping exists on amd64.


190239 22-Mar-2009 alc

In general, the kernel virtual address of the pml4 page table page that is
stored in the pmap is from the direct map region. The two exceptions have
been the kernel pmap and the swapper's pmap. These pmaps have used a
kernel virtual address established by pmap_bootstrap() for their shared
pml4 page table page. However, there is no reason not to use the direct
map for these pmaps as well.


190237 22-Mar-2009 alc

Eliminate the recomputation of pcb_cr3 from cpu_set_upcall(). The
bcopy()ed value from the old thread is the correct value because the new
thread and the old thread will share a page table.


190100 19-Mar-2009 thompsa

Remove the uscanner(4) driver, this follows the removal of the kernel scanner
driver in Linux 2.6. uscanner was just a simple wrapper around a fifo and
contained no logic, the default interface is now libusb (supported by sane).

Reviewed by: HPS


189926 17-Mar-2009 kib

Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by: kan


189903 17-Mar-2009 jkim

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


189872 16-Mar-2009 dchagin

Chase the k8temp->amdtemp rename in NOTES and loader.conf.

Approved by: kib (mentor)


189785 14-Mar-2009 alc

Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after: 6 weeks


189783 14-Mar-2009 alc

Correct accounting errors in _pmap_allocpte(). Specifically, the pmap's
resident page count and the global wired page count were not correctly
maintained when page table page allocation failed.

MFC after: 6 weeks


189771 13-Mar-2009 dchagin

Implement a new way of branding ELF binaries by looking at the
".note.ABI-tag" section.

The search order for a brand is changed: the ".note.ABI-tag" section
is now checked first.

Move the code that fetches osreldate for an ELF binary to the check_note() handler.

PR: 118473
Approved by: kib (mentor)


189699 11-Mar-2009 dfr

Merge in support for Xen HVM on amd64 architecture.


189698 11-Mar-2009 alc

Optimize the inner loop of pmap_copy().

MFC after: 6 weeks


189610 10-Mar-2009 alc

Eliminate the last use of the recursive mapping to access user-space page
table pages. Now, all accesses to user-space page table pages are
performed through the direct map. (The recursive mapping is only used
to access kernel-space page table pages.)

Eliminate the TLB invalidation on the recursive mapping when a user-space
page table page is removed from the page table and when a user-space
superpage is demoted.


189572 09-Mar-2009 rwatson

Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise. Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).

MFC after: 3 days


189551 09-Mar-2009 alc

Change pmap_enter_quick_locked() so that it uses the kernel's direct map
instead of the pmap's recursive mapping to access the lowest level of the
page table when it maps a user-space virtual address.


189509 08-Mar-2009 sobomax

Small comment nit: "run time" -> "run-time".

Submitted by: rwatson


189497 07-Mar-2009 thompsa

Reenable ndis in the LINT build now that it has been updated for USB. Thanks to
HPS and Weongyo.


189454 06-Mar-2009 alc

If the PDE is known, then use the direct mapping instead of the recursive
mapping to access the PTE.


189423 05-Mar-2009 jhb

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


189415 05-Mar-2009 alc

Make pmap_copy() more TLB friendly. Specifically, make it use the kernel's
direct map instead of the pmap's recursive mapping to access the lowest
level in the page table.

MFC after: 6 weeks


189412 05-Mar-2009 jhb

A few cleanups to the FPU code on amd64:
- fpudna() always returned 1 since amd64 CPUs always have FPUs. Change
the function to return void and adjust the calling code in trap() to
assume the return 1 case is the only case.
- Remove fpu_cleanstate_ready as it is always true when it is tested.
Also, only initialize fpu_cleanstate when fpuinit() is called on the BSP.

Reviewed by: bde


189411 05-Mar-2009 jhb

Move the PCB flag macros up next to the 'pcb_flags' member in the struct.


189404 05-Mar-2009 jhb

At least one BIOS bogusly includes duplicate entries for I/O APICs. The
bogus entries have a starting IRQ that is invalid (> 255, so won't fit
into a PCI intline config register). It had the side effect of breaking
MSI by "claiming" several IRQs in the MSI range. Fix this by ignoring such
I/O APICs.

MFC after: 2 weeks
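
A minimal sketch of the filter described above, assuming the starting IRQ of each I/O APIC entry is available as an unsigned integer. The helper name ioapic_entry_is_bogus is hypothetical. The 255 limit is the one given in the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An intline config register is 8 bits wide, so a starting IRQ above 255
 * cannot be valid and the entry should be ignored.
 */
static bool
ioapic_entry_is_bogus(uint32_t first_irq)
{
	return first_irq > 255;
}
```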


189362 04-Mar-2009 dchagin

Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silences the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week


189282 02-Mar-2009 kib

Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisons
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.


189057 25-Feb-2009 sobomax

Fix typo in comments in r189023.


189055 25-Feb-2009 jkim

Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64. On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by: alc, jhb


189023 25-Feb-2009 sobomax

Make the machdep.hyperthreading_enabled tunable work with SCHED_ULE.
Unlike with SCHED_BSD, however, it can only be set to 0 at boot time,
it's not possible to change it at runtime.

Reviewed by: jhb
MFC after: 1 month


189018 24-Feb-2009 thompsa

These are no longer needed.


188977 24-Feb-2009 thompsa

Exclude ndis from the LINT build as it currently breaks the build, patches to
move to the new usb stack are in progress.


188944 23-Feb-2009 thompsa

Change over the usb kernel options to the new stack (retaining existing
naming). The old usb stack can be compiled in by prefixing the name with 'o'.


188938 23-Feb-2009 jhb

Some whitespace and style fixes.

Submitted by: bde (partly)


188932 23-Feb-2009 alc

Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.

MFC after: 1 week


188904 21-Feb-2009 jeff

- Resolve an issue where we may clear an idt while an interrupt on a
different cpu is still assigned to that vector by never clearing idt
entries. This was only provided as a debugging feature and the bugs
are caught by other means.
- Drop the sched lock when rebinding to reassign an interrupt vector
to a new cpu so that pending interrupts have a chance to be delivered
before removing the old vector.

Discussed with: tegge, jhb


188750 18-Feb-2009 kib

Adapt linux emulation to use cv for vfork wait.

Submitted by: Takahiro Kurosawa <takahiro.kurosawa gmail com>
PR: kern/131506


188665 15-Feb-2009 thompsa

Add uslcom to the build too.

Reminded by: Michael Butler


188660 15-Feb-2009 thompsa

Switch over GENERIC kernels to USB2 by default.

Tested by: make universe


188608 14-Feb-2009 alc

Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup(). (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after: 1 week


188426 10-Feb-2009 marcel

Add option GEOM_PART_EBR by default on amd64 and i386.


188403 09-Feb-2009 cognet

The bounce zone sees its page number increased if multiple dma maps use it in
the same dma tag. However, it can happen multiple dma tags share the same
bounce zone too, so add a per-bounce zone map counter, and check it instead of
the dma tag map counter, to know if we have to alloc more pages.

Reported by: miwi
Reviewed by: scottl


188350 08-Feb-2009 imp

When bouncing pages, allow a new option to preserve the intra-page
offset. This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on. This solution doesn't break anything that doesn't ask for it
directly. The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes. I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this change turns out not to be sufficient.

Submitted by: alfred@ from Hans Peter Selasky's original


188302 08-Feb-2009 imp

Companion for r188301: fix the prototypes.


188301 08-Feb-2009 imp

Correct parameter types for pcib_{read,write}_config by fixing the
prototypes for the legacy_* implementations of these kobj methods.


188254 07-Feb-2009 wkoszek

Tidy NOTES a bit:
- remove misleading nve/nfe comments, which make it hard to
distinguish those two at a first glance
- bring pbio documentation to the block comment together with
other drivers

I also brought commented out line responsible for si(4), since it
seems to compile and already has respective comment in this file.


188249 06-Feb-2009 wkoszek

ural(4) is already present in global NOTES, thus there is no
need to explicitly list it here once again. This removes:

WARNING: duplicate option `DEV_URAL' encountered.
WARNING: duplicate device `ural' encountered.

Warnings when compiling LINT on amd64.


188247 06-Feb-2009 wkoszek

Fix AGP debugging code:
- correct format strings
- fill opt_agp.h if AGP_DEBUG is defined
- bring AGP_DEBUG to LINT by mentioning it in NOTES

This should hopefully fix a warning that was...

Found by: Coverity Prevent(tm)
CID: 3676
Tested on: amd64, i386


188065 03-Feb-2009 jkoshy

Improve robustness of NMI handling, for NMIs recognized in kernel
mode.

- Make the NMI handler run on its own stack (TSS_IST2).
- Store the GSBASE value for each CPU just before the start of
each NMI stack, permitting efficient retrieval using %rsp-relative
addressing.
- For NMIs taken from kernel mode, program MSR_GSBASE explicitly
since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially
invalid. The current contents of MSR_GSBASE are saved and restored
at exit.
- For NMIs handled from user mode, continue to use 'swapgs' to
load the per-CPU GSBASE.

Reviewed by: jeff
Debugging help: jeff
Tested by: gnn, Artem Belevich <artemb at gmail dot com>


187964 31-Jan-2009 obrien

Fix the inconsistent tabbing.

Noticed by: bde


187948 31-Jan-2009 obrien

Change some movl's to mov's. Newer GAS no longer accepts 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


187880 29-Jan-2009 jeff

- Allocate apic vectors on a per-cpu basis. This allows us to allocate
more irqs as we have more cpus. This is principally useful on systems
with msi devices which may want many irqs per-cpu.

Discussed with: jhb
Sponsored by: Nokia


187867 28-Jan-2009 jhb

Use a different value for the initial control word for the FPU state for
32-bit processes. The value matches the initial setting used by
FreeBSD/i386. Otherwise, 32-bit binaries using floating point would use
a slightly different initial state when run on FreeBSD/amd64.

MFC after: 1 week


187598 22-Jan-2009 jkim

VIA Nano processor has a special MSR (CENT_HARDWARECTRL3) bit 32 to determine
whether TSC is P-state invariant or not. In fact, this MSR is writable but
we just leave it at the BIOS default for now.


187470 20-Jan-2009 kib

The context switch to the 32bit binary does not properly restore
the fsbase value. The switch loads the fs segment register, that
invalidates the value in fsbase msr, thus value in %r9 can not be
considered the current value for fsbase anymore.

Unconditionally reload fsbase when switching to 32bit binary.

PR: 130526
MFC after: 3 weeks


187433 19-Jan-2009 sobomax

Take NTFS option out to match i386 GENERIC.

Suggested by: phk, luigi


187430 19-Jan-2009 sobomax

It is asr(4) that is not amd64-clean, not amr(4).

Pointy hat to: myself
Submitted by: scottl


187429 19-Jan-2009 sobomax

Comment amr(4) out - according to scottl it's not 64-bit clean.


187427 19-Jan-2009 sobomax

Whitespace-only: reduce diff to the i386 GENERIC.


187426 19-Jan-2009 sobomax

Add asr(4) and stge(4) from i386 GENERIC. Both drivers compile on amd64 and
there is no particular reason for them to be i386-only.

MFC after: 2 weeks


187221 14-Jan-2009 kib

Disable interrupts, if they were enabled, before doing swapgs.
Otherwise, an interrupt may happen while we run with kernel CS and usermode
gsbase.

Reviewed by: jeff
MFC after: 1 week


187181 13-Jan-2009 thompsa

MFp4: //depot/projects/usb@155990

Add USB scanner support to USB2 config files.

Submitted by: Hans Petter Selasky


187144 13-Jan-2009 luigi

Documentation-only change:

- add a reference to the config(5) manpage;
- hopefully clarify the format of the 'env FILENAME' directive.

I am putting these notes in sys/${arch}/conf/GENERIC and not
in sys/conf/NOTES because:

1. i386/GENERIC already had reference to a similar option (hints..)
and to documentation (handbook)

2. GENERIC is what most users look at when they have to modify or
create a new kernel config, so having the suggestion there is
more effective.

I am only touching i386 and amd64 because the other GENERIC files
are already out of sync, and I am not sure what is the overall plan.

MFC after: 3 days


187109 12-Jan-2009 jkim

Add basic amd64 support for VIA Nano processors.


186797 05-Jan-2009 jkim

Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support.


186776 05-Jan-2009 rwatson

Add commented out options KDTRACE_HOOKS and, for amd64, KDTRACE_FRAME,
to GENERIC configuration files. This brings what's in 8.x in sync
with what is in 7.x, but does not change any current defaults.

Possibly they should now be enabled in head by default?


186610 30-Dec-2008 rpaulo

Disable USB bluetooth (needs netgraph built in) and USB audio (doesn't
compile).


186608 30-Dec-2008 rpaulo

Add a kernel config file so that users have less difficulty testing
USBng.

If it makes sense, it could be done for arm/mips too.


186240 17-Dec-2008 marcel

Make gpart the default partitioning class on all platforms.
Both ia64 and powerpc were using gpart exclusively already
so there's no change for those two.

Discussed on: arch@


186212 17-Dec-2008 imp

AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.

Reviewed by: peter


186211 17-Dec-2008 imp

Remove obsolete AT_DEBUG stuff. It never should have been committed
in the first place, let alone migrated to linux emulation.

Reviewed by: peter, rdivacky


186076 14-Dec-2008 jkoshy

Bug fix: %ebx needs to be preserved in the user callchain capture
path.


186037 13-Dec-2008 jkoshy

- Bug fix: prevent a thread from migrating between CPUs between the
time it is marked for user space callchain capture in the NMI
handler and the time the callchain capture callback runs.

- Improve code and control flow clarity by invoking hwpmc(4)'s user
space callchain capture callback directly from low-level code.

Reviewed by: jhb (kern/subr_trap.c)
Testing (various patch revisions): gnn,
Fabien Thomas <fabien dot thomas at netasq dot com>,
Artem Belevich <artemb at gmail dot com>


186009 12-Dec-2008 jkim

Add more CPUID bits from AMD CPUID Specification Rev. 2.28.


185991 12-Dec-2008 jkoshy

Expose symbol `PMC_FN_USER_CALLCHAIN' to assembler code.


185933 11-Dec-2008 jhb

Add constants for fields in the local APIC error status register and a
routine to read it.


185715 06-Dec-2008 alc

Change the default value for the flag enabling superpage mapping and
promotion to "on".

Reminded by: jhb
Tested by: kris


185634 05-Dec-2008 kib

Improve db_backtrace() for compat ia32 on amd64. 32bit image enters
the kernel via Xint0x80_syscall().

Submitted by: dchagin
MFC after: 1 week


185567 02-Dec-2008 ed

Remove "[KEEP THIS!]" from COMPAT_43TTY. It's not really that important.

Sgtty is a programming interface that has been replaced by termios over
the years. In June we already removed <sgtty.h>, which exposes the
ioctl()'s that are implemented by this interface. The importance of this
flag is overrated right now.


185561 02-Dec-2008 ganbold

Remove unused variable.

Found with: Coverity Prevent(tm)
CID: 3685

Approved by: jhb


185522 01-Dec-2008 sam

Switch to ath hal source code. Note this removes the ath_hal
module; the ath module now brings in the hal support. Kernel
config files are almost backwards compatible; supplying

device ath_hal

gives you the same chip support that the binary hal did but you
must also include

options AH_SUPPORT_AR5416

to enable the extended format descriptors used by 11n parts.
It is now possible to control the chip support included in a
build by specifying exactly which chips are to be supported
in the config file; consult ath_hal(4) for information.


185515 01-Dec-2008 kensmith

Adjustments to make a tags file a bit more suitable to amd64.

Reviewed by: peter


185460 30-Nov-2008 mav

According to "Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide, Part 2", CPUs with family 0x6 and model
0xE or above and CPUs with family 0xF and model 0x3 or above have an invariant
TSC.
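
The family/model rule quoted above can be written as a small predicate. This is only a sketch: the function name intel_tsc_is_invariant is hypothetical, and it assumes the caller has already decoded the family and model from CPUID and confirmed that the vendor is Intel.

```c
#include <stdbool.h>

/*
 * Family 0x6 with model >= 0xE, or family 0xF with model >= 0x3,
 * per the rule stated in the log entry.
 */
static bool
intel_tsc_is_invariant(unsigned family, unsigned model)
{
	return (family == 0x6 && model >= 0xe) ||
	    (family == 0xf && model >= 0x3);
}
```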


185442 29-Nov-2008 kib

Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by: dchagin


185439 29-Nov-2008 kib

Regenerate


185438 29-Nov-2008 kib

Fix iovec32 for linux32/amd64.

Add a custom version of copyiniov() to deal with the 32-bit iovec
pointers from userland (to be used later).

Adjust prototypes for linux_readv() and linux_writev() to use new
l_iovec32 definition and to match actual linux code. In particular,
use ulong for fd (why ?).

Submitted by: dchagin


185363 27-Nov-2008 jkoshy

- Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo
and Core Duo), model 0xF (Core2), model 0x17 (Core2Extreme) and
model 0x1C (Atom).

In these CPUs, the actual numbers, kinds and widths of PMCs present
need to be queried at run time. Support for specific "architectural"
events also needs to be queried at run time.

Model 0xE CPUs support programmable PMCs, subsequent CPUs
additionally support "fixed-function" counters.

- Use event names that are close to vendor documentation, taking into
account that:
- events with identical semantics on two or more CPUs in this family
can have differing names in vendor documentation,
- identical vendor event names may map to differing events across
CPUs,
- each type of CPU supports a different subset of measurable
events.

Fixed-function and programmable counters both use the same vendor
names for events. The use of a class name prefix ("iaf-" or
"iap-" respectively) permits these to be distinguished.

- In libpmc, refactor pmc_name_of_event() into a public interface
and an internal helper function, for use by log handling code.

- Minor code tweaks: staticize a global, freshen a few comments.

Tested by: gnn


185343 26-Nov-2008 jkim

Use the newly introduced cpu_vendor_id to make invariant TSC detection
clearer and merge r185295 to amd64.


185341 26-Nov-2008 jkim

Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").

Reviewed by: jhb, peter (early amd64 version)


185169 22-Nov-2008 kib

Add an sv_flags field to struct sysentvec, intended to describe
the ABI of the currently executing image. Change some places to test
the flags instead of explicitly comparing with the addresses of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


185002 16-Nov-2008 kib

In the robust futexes list head, futex_offset shall be signed,
and glibc actually supplies negative offsets. Change l_ulong to l_long.

Submitted by: dchagin
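
To see why the signedness matters, the sketch below assumes a 32-bit offset that is widened and added to a 64-bit address, as might happen for a 32-bit Linux process on amd64. That widening scenario and both helper names are assumptions made for illustration. The entry itself only states that the offset must be signed because glibc supplies negative values.

```c
#include <stdint.h>

/* Signed offset: sign-extended before the add, so -16 moves backwards. */
static uint64_t
futex_addr_signed(uint64_t entry, int32_t futex_offset)
{
	return entry + (uint64_t)(int64_t)futex_offset;
}

/* Unsigned offset: zero-extended, so the same bits land far past entry. */
static uint64_t
futex_addr_unsigned(uint64_t entry, uint32_t futex_offset)
{
	return entry + futex_offset;
}
```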


184870 12-Nov-2008 yongari

Add ale(4), a driver for Atheros AR8121/AR8113/AR8114 PCIe ethernet
controller. The controller is also known as L1E(AR8121) and
L2E(AR8113/AR8114). Unlike its predecessor Attansic L1,
AR8121/AR8113/AR8114 uses completely different Rx logic such that
it requires separate driver. Datasheet for AR81xx is not available
to open source driver writers but it shares large part of Tx and
PHY logic of L1. I still don't understand some part of register
meaning and some MAC statistics counters but the driver seems to
have no critical issues for performance and stability.

The AR81xx requires a copy operation to pass received frames to the upper
stack, so ale(4) consumes more CPU cycles than other controllers do.
Working around a couple of silicon bugs also costs additional CPU
cycles. However, if you have a fast CPU you can still saturate the
link.
Currently ale(4) supports the following hardware features.
- MSI.
- TCP Segmentation offload.
- Hardware VLAN tag insertion/stripping with checksum offload.
- Tx TCP/UDP checksum offload and Rx IP/TCP/UDP checksum offload.
- Tx/Rx interrupt moderation.
- Hardware statistics counters.
- Jumbo frame.
- WOL.

AR81xx PCIe ethernet controllers are mainly found on ASUS EeePC or
P5Q series of ASUS motherboards. Special thanks to Jeremy Chadwick
who sent the hardware to me. Without his donation writing a driver
for AR81xx would never have been possible. Big thanks to all people
who reported feedback or tested patches.

HW donated by: koitsu
Tested by: bsam, Joao Barros <joao.barros <> gmail DOT com >
Jan Henrik Sylvester <me <> janh DOT de >
Ivan Brawley < ivan <> brawley DOT id DOT au >,
CURRENT ML


184849 11-Nov-2008 ed

Several cleanups related to pipe(2).

- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
convention, pipe(2) raises a segmentation fault when an invalid
address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
to recover td_retval after calling the FreeBSD system call.

Approved by: rdivacky
Discussed on: arch


184802 09-Nov-2008 jkoshy

- Separate PMC class dependent code from other kinds of machine
dependencies. A 'struct pmc_classdep' structure describes operations
on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep'
structures depending on the CPU in question.

Inside PMC class dependent code, row indices are relative to the
PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates
global row indices before invoking class dependent operations.

- Augment the OP_GETCPUINFO request with the number of PMCs present
in a PMC class.

- Move code common to Intel CPUs to file "hwpmc_intel.c".

- Move TSC handling to file "hwpmc_tsc.c".


184790 09-Nov-2008 ed

Regenerate system call tables for r184789.


184789 09-Nov-2008 ed

Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.

Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by: rdivacky, kib


184499 31-Oct-2008 kib

Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by: jhb
Retested by: pho
MFC after: 3 days (+ 176304 + 184136)


184378 27-Oct-2008 sobomax

Fix r184323 - set stathz to be the same as lapic_timer_hz when lapic_timer_hz
is less than 128. Remove extra {} to match existing style.


184293 26-Oct-2008 sobomax

Fix division by zero panic if kern.hz less than 32.

MFC after: 1 day


184170 22-Oct-2008 jkim

Simplify AMD64_CPU_MODEL() and AMD64_CPU_FAMILY() macros as the base family
should be at least 0xf00 for all supported platforms.
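
As a rough sketch of what such a simplification can look like: when the base family is always 0xf, the extended family and extended model fields of the CPUID signature can be combined unconditionally. The function names below are illustrative and are not the kernel macros; the bit positions follow the standard CPUID signature layout.

```c
#include <stdint.h>

/* Family: extended family (bits 27:20) plus base family (bits 11:8). */
static unsigned
demo_cpu_family(uint32_t id)
{
	return (((id & 0x0ff00000) >> 20) + ((id & 0x00000f00) >> 8));
}

/* Model: extended model (bits 19:16) as high nibble, base model (bits 7:4) as low. */
static unsigned
demo_cpu_model(uint32_t id)
{
	return (((id & 0x000f0000) >> 12) | ((id & 0x000000f0) >> 4));
}
```

For a signature of 0x00060fb2 this gives family 0x0f and model 0x6b.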


184169 22-Oct-2008 jkim

Add AMD Family 0Fh, Model 6Bh, Stepping 2 to the list of invariant TSCs
and fix i386 test.


184146 22-Oct-2008 jkim

Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher
even if BIOS does not advertise it.


184102 21-Oct-2008 jkim

Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.


184101 21-Oct-2008 jkim

Detect Advanced Power Management Information for AMD CPUs.


184058 19-Oct-2008 kib

Correctly fill siginfo for the signals delivered by linux tkill/tgkill.
It is required for async cancellation to work.

Fix a PROC_LOCK leak in linux_tgkill() when a signal delivery attempt is
made to a non-Linux process.

Do not call em_find(p, ...) with p unlocked.

Move common code for linux_tkill() and linux_tgkill() into
linux_do_tkill().

Change linux siginfo_t definition to match actual linux one. Extend
uid fields to 4 bytes from 2. The extension does not change structure
layout and is binary compatible with previous definition, because i386
is little endian, and each uid field has 2 byte padding after it.
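
The compatibility argument can be checked with a small sketch. The two structures are hypothetical, reduced stand-ins and not the real siginfo layout; they only model a 2-byte uid followed by 2 bytes of padding being widened to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old layout: 2-byte uid followed by 2 bytes of padding. */
struct demo_old_fields {
	int32_t  pid;
	uint16_t uid;
	uint16_t pad;
};

/* Hypothetical new layout: the uid absorbs the padding. */
struct demo_new_fields {
	int32_t  pid;
	uint32_t uid;
};

_Static_assert(sizeof(struct demo_old_fields) ==
    sizeof(struct demo_new_fields), "size unchanged");
_Static_assert(offsetof(struct demo_old_fields, uid) ==
    offsetof(struct demo_new_fields, uid), "uid offset unchanged");

/* Write through the old layout, read back through the new one. */
static uint32_t
demo_read_as_new(uint16_t uid)
{
	struct demo_old_fields o = { .pid = 1, .uid = uid, .pad = 0 };
	struct demo_new_fields n;

	memcpy(&n, &o, sizeof(n));
	return (n.uid);
}
```

On a little-endian machine the low-order bytes come first, so a value written through the old layout reads back unchanged through the new one as long as the padding was zero.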

Reported by: Nicolas Joly <njoly pasteur fr>
Submitted by: dchagin
MFC after: 1 month


184026 18-Oct-2008 kib

Set PCB_32BIT and clear PCB_GS32BIT for linux32 binaries.

Tested by: dchagin
MFC after: 3 days


183871 14-Oct-2008 kib

Make robust futexes work on linux32/amd64. Use PTRIN to read
user-mode pointers. Change types used in the structures definitions to
properly-sized architecture-specific types.
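
A minimal sketch of the idea, assuming a 64-bit kernel reading a structure written by a 32-bit process. The demo_ names are hypothetical; DEMO_PTRIN only mirrors the general shape of a pointer-widening macro and is not claimed to be the kernel's definition.

```c
#include <stdint.h>

/* A user pointer as a 32-bit process stores it: 4 bytes, not 8. */
typedef uint32_t demo_l_uintptr_t;

/* Widen a 32-bit user pointer value to a native pointer by zero-extension. */
#define DEMO_PTRIN(v)	((void *)(uintptr_t)(v))

/* Hypothetical list head using the properly sized field type. */
struct demo_robust_list {
	demo_l_uintptr_t next;
};

static void *
demo_next_entry(const struct demo_robust_list *l)
{
	return (DEMO_PTRIN(l->next));
}
```

If the field were declared as a native pointer instead, the kernel would read 8 bytes where the 32-bit process wrote only 4.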

Submitted by: dchagin
MFC after: 1 week


183615 05-Oct-2008 davidxu

If the current thread has the trap bit set (i.e. a debugger had
single stepped the process to the system call), we need to clear
the trap flag from the new frame. Otherwise, the new thread will
receive a (likely unexpected) SIGTRAP when it executes the first
instruction after returning to userland.
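
In outline, the fix amounts to masking one bit in the copied frame. The sketch below uses a hypothetical, reduced trap frame; the trap flag is bit 8 of the x86 flags register.

```c
#include <stdint.h>

#define DEMO_PSL_T	0x00000100u	/* x86 trap (single-step) flag, bit 8 */

/* Hypothetical, reduced trap frame holding only the flags register. */
struct demo_trapframe {
	uint64_t tf_rflags;
};

/* Copy the creating thread's frame, then drop the single-step bit. */
static void
demo_init_new_frame(struct demo_trapframe *newf,
    const struct demo_trapframe *curf)
{
	*newf = *curf;
	newf->tf_rflags &= ~(uint64_t)DEMO_PSL_T;
}
```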


183567 03-Oct-2008 stas

- Add driver for Attansic L2 FastEthernet controller found on
Asus EeePC and some Asus mainboards.

Reviewed by: yongari, rpaulo, jhb
Tested by: many
Approved by: kib (mentor)
MFC after: 1 week


183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


183525 01-Oct-2008 jhb

Bump MAXCPU to 32 now that 32 CPU x86 systems exist.

Tested by: rwatson, mdtansca
Approved by: peter


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self(). The former has never been used, and
the latter is only used in ia64 and powerpc code that no longer serves
a real purpose after bring-up and can be removed as well. Note that
architectures like sun4u provide no native means for a CPU to send an
IPI to itself in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitly initialize sysentvec.sv_maxssiz, which was missed in most
sysvecs.
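
For readers unfamiliar with the two styles, here is a small comparison. The structure is a hypothetical, much reduced stand-in for a system call vector; only the initializer syntax is the point.

```c
#include <stddef.h>

/* Hypothetical, much reduced stand-in for a system call vector. */
struct demo_sysentvec {
	int		 sv_size;
	const char	*sv_name;
	unsigned long	*sv_maxssiz;
};

/* Positional style: the reader must know the field order. */
static struct demo_sysentvec demo_positional = { 512, "demo", NULL };

/* C99 style: each value is labeled, so a search for a field finds its users. */
static struct demo_sysentvec demo_designated = {
	.sv_size	= 512,
	.sv_name	= "demo",
	.sv_maxssiz	= NULL,
};
```

In the designated form a member that is left out is still zero-initialized, but writing it out makes the intent visible.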

No objection from: jhb
MFC after: 1 month


183151 18-Sep-2008 stas

- Recognize SAVE and OSXSAVE extended processor features.

Approved by: kib (mentor)
MFC after: 1 month


183033 15-Sep-2008 jkoshy

Correct a callchain capture bug on the i386.

On the i386 architecture, the processor only saves the current value
of `%esp' on stack if a privilege switch is necessary when entering
the interrupt handler. Thus, `frame->tf_esp' is only valid for
an entry from user mode. For interrupts taken in kernel mode, we
need to determine the top-of-stack for the interrupted kernel
procedure by adding the appropriate offset to the current frame
pointer.

Reported by: kris, Fabien Thomas
Tested by: Fabien Thomas <fabien.thomas at netasq dot com>


182947 11-Sep-2008 jhb

Add a 'hw.pci.mcfg' tunable. It can be set to 0 to disable memory-mapped
PCI config access.


182936 11-Sep-2008 jhb

Update the comments above the 0xcf9 register reset attempt to match the
code. We only attempt a single reset using this method (a "hard" reset),
and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to
force a reset.
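
The write sequence can be sketched with a stub in place of the real port write. The demo_ names are hypothetical, and the meaning of bit 1 (selecting a "hard" reset) is an assumption taken from common descriptions of this register, not from this log.

```c
#include <stdint.h>

#define DEMO_RESET_PORT	0xcf9
#define DEMO_RST_HARD	0x02	/* bit 1: select a "hard" reset (assumption) */
#define DEMO_RST_GO	0x04	/* bit 2: reset fires on a 0 -> 1 transition */

static uint8_t demo_writes[2];
static int demo_nwrites;

/* Stub standing in for a port write; it only records the value. */
static void
demo_outb(uint16_t port, uint8_t val)
{
	(void)port;
	if (demo_nwrites < 2)
		demo_writes[demo_nwrites++] = val;
}

static void
demo_reset_via_cf9(void)
{
	/* First write: choose the reset type with bit 2 still clear. */
	demo_outb(DEMO_RESET_PORT, DEMO_RST_HARD);
	/* Second write: raise bit 2, producing the 0 -> 1 transition. */
	demo_outb(DEMO_RESET_PORT, DEMO_RST_HARD | DEMO_RST_GO);
}
```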

MFC after: 1 week


182910 10-Sep-2008 jhb

Some K8 chipsets don't expose all of the PCI devices on bus 0 via PCIe
memory-mapped config access. Add a workaround for these systems by
checking the first function of each slot on bus 0 using both the
memory-mapped config access and the older type 1 I/O port config access.
If we find a slot that is only visible via the type 1 I/O port config
access, we flag that slot. Future PCI config transactions to flagged
slots on bus 0 use type 1 I/O port config access rather than memory mapped
config access.
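
One way to picture the bookkeeping is a per-slot bitmask for bus 0. This is a sketch only: the names are hypothetical and the probe results are passed in as flags rather than read from hardware.

```c
#include <stdint.h>

/* Bit n set: slot n on bus 0 must use type 1 I/O port config access. */
static uint32_t demo_type1_slots;

/* Record a slot (0..31) that is visible via type 1 but not via memory mapping. */
static void
demo_check_slot(int slot, int visible_mmcfg, int visible_type1)
{
	if (!visible_mmcfg && visible_type1)
		demo_type1_slots |= (uint32_t)1 << slot;
}

/* Decide which mechanism a later config transaction should use. */
static int
demo_use_type1(int bus, int slot)
{
	return (bus == 0 && (demo_type1_slots & ((uint32_t)1 << slot)) != 0);
}
```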


182868 08-Sep-2008 kib

The pcb_gs32p should be a per-CPU pointer, not a per-thread one. It is the
location in the GDT where the segment descriptor from pcb_gs32sd is
copied, and that location is in the GDT local to the CPU.

Noted and reviewed by: peter
MFC after: 1 week


182867 08-Sep-2008 kib

Provide private per-CPU GDTs on amd64. This is required at least for the
linux PCB_GS32BIT to work.

Noted by: nox
Reviewed by: peter
MFC after: 1 week


182866 08-Sep-2008 kib

In linux_set_thread_area(), mark pcb as PCB_GS32BIT. This was missed
when r180992 was committed.

Reviewed by: peter
MFC after: 1 week


182865 08-Sep-2008 kib

Fix inconsistencies in the comments.

MFC after: 1 week


182849 07-Sep-2008 kib

Segment registers are stored in the uc_mcontext member of the struct
l_ucontext. To restore the registers content, trampoline needs to
dereference uc_mcontext instead of taking some undefined values from
l_ucontext.

Submitted by: Dmitry Chagin <dchagin@>
MFC after: 1 week


182684 02-Sep-2008 kib

- When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386
processes, clear PCB_32BIT and PCB_GS32BIT bits [1].

- Reread the fs and gs bases from the msr unconditionally, not believing
the values in pcb_fsbase and pcb_gsbase, since usermode may reload
segment registers, invalidating the cache. [2].

Both problems resulted in the wrong fs base, causing the wrong TLS pointer
to be dereferenced in usermode.

Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1]
Reported by: Bernd Walter <ticsoat cicely7 cicely de>,
Artem Belevich <fbsdlist at src cx>[2]
Reviewed by: peter
MFC after: 3 days


182220 26-Aug-2008 jkim

Move empty filter handling to MI source.

MFC after: 3 days


182173 25-Aug-2008 jkim

Fix a typo in copyrights.


182046 23-Aug-2008 jhb

Adjust the handling of the various timer frequencies when using the lapic
timer. Previously, the various divisors were fixed, which meant that while
it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails
with any other hz value. With these changes, we now pick a lapic timer hz
based on the value of hz. If hz is >= 1500, then the lapic timer runs at
hz. If 1500 > hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we
run at hz * 4. We compute a divider at runtime to make stathz run as close
to 128 as we can since stathz really wants to be run at something close to
that frequency. Profiling just runs on every clock tick. So some examples:

With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz
will be 133, and profhz = 400. With hz = 1000 (default), the lapic timer
is still at 2000 (as it is now), stathz is at 133 (as it is now), and
profhz will be 2000 (previously 666).
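
The arithmetic can be reproduced with a short sketch. The function names are hypothetical, and the divider formula (timer frequency divided by 128, rounded down) is an assumption chosen because it reproduces the figures quoted in this message. The guard for small values is likewise an addition for the sketch.

```c
/* Pick the lapic timer frequency from hz, using the thresholds in the log. */
static int
demo_lapic_timer_hz(int hz)
{
	if (hz >= 1500)
		return (hz);
	if (hz >= 750)
		return (hz * 2);
	return (hz * 4);
}

/* Derive stathz by dividing the timer frequency so the result stays near 128. */
static int
demo_stathz(int timer_hz)
{
	int divisor = timer_hz / 128;

	/* Guard added for the sketch: below 128 there is nothing to divide. */
	if (divisor == 0)
		return (timer_hz);
	return (timer_hz / divisor);
}
```

With hz = 100 the timer runs at 400 and the divider is 3, giving a stathz of 133; with hz = 1000 the timer runs at 2000 and the divider is 15, again giving 133. profhz equals the timer frequency in both cases.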

MFC after: 2 weeks


181987 22-Aug-2008 jhb

Extend the support for PCI-e memory mapped configuration space access:
- Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the
rest of the kernel. It now also accepts parameters via function
arguments rather than global variables.
- Add a notion of minimum and maximum bus numbers and reject requests for
an out of range bus.
- Add more range checks on slot/func/reg/bytes parameters to the cfg reg
read/write routines. Don't panic on any invalid parameters, just fail
the request (writes do nothing, reads return -1). This matches the
behavior of the other cfg mechanisms.
- Port the memory mapped configuration space access to amd64. On amd64
we simply use the direct map (via pmap_mapdev()) for the memory mapped
window.
- During acpi_attach() just after loading the ACPI tables, check for a
MCFG table. If it exists, call pcie_cfgregopen() on each subtable
(memory mapped window). For now we only support windows for domain 0
that start with bus 0. This removes the need for more chipset-specific
quirks in the MD code.
- Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets
since these machines should all have MCFG tables via ACPI.
- Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen()
earlier.
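
The range checks and the address computation for memory-mapped config access can be sketched as below. The names are hypothetical; the offset layout (1MB per bus, 32KB per slot, 4KB per function) is the standard PCIe memory-mapped configuration layout, and the exact checks in the kernel may differ.

```c
#include <stdint.h>

/* Hypothetical description of one memory-mapped window. */
struct demo_pcie_window {
	int minbus;
	int maxbus;
};

/*
 * Return the byte offset of a config register inside the window,
 * or -1 if any parameter is out of range.
 */
static int64_t
demo_pcie_cfg_offset(const struct demo_pcie_window *w, int bus, int slot,
    int func, int reg, int bytes)
{
	if (bus < w->minbus || bus > w->maxbus)
		return (-1);
	if (slot < 0 || slot > 31 || func < 0 || func > 7)
		return (-1);
	if (bytes != 1 && bytes != 2 && bytes != 4)
		return (-1);
	if (reg < 0 || reg + bytes > 4096)
		return (-1);
	/* 1MB per bus, 32KB per slot, 4KB per function. */
	return (((int64_t)(bus - w->minbus) << 20) | ((int64_t)slot << 15) |
	    ((int64_t)func << 12) | (int64_t)reg);
}
```

A caller following the behavior described above would turn a -1 result into a read that returns -1 or a write that does nothing, rather than a panic.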

MFC after: 2 weeks


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


181875 19-Aug-2008 jhb

Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after: 1 week


181848 18-Aug-2008 jkim

Correctly check unsignedness of all BPF_LD|BPF_IND instructions.
This is roughly from sys/net/bpf_filter.c r1.12 and r1.14.


181846 18-Aug-2008 jkim

- Make these files compilable on user land.
- Update copyrights and fix style(9).


181823 18-Aug-2008 kib

The doreti_iret_fault code is always called with gs base MSR containing
kernel gs base, because %rip is adjusted only on kernel-mode trap caused
by iretq execution. On the other hand, the stack contains (hardware
part of) trap frame from the usermode. As a consequence, checking for
frame mode and doing swapgs causes the kernel to enter trap() with
usermode gs base.

Remove the check for mode and conditional swapgs, we already have right
gs base in the MSR.

Submitted by: Nate Eldredge <neldredge math ucsd edu>
MFC after: 3 days


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
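
The mechanism can be illustrated in a few lines. The variable and macro names are hypothetical; the point is only that the V_ name expands to the existing global, so the generated code is unchanged.

```c
/* An ordinary global, as it existed before the change (hypothetical name). */
static int demo_ipforwarding;

/* For now the V_ name is just an alias for the global. */
#define V_demo_ipforwarding	demo_ipforwarding

/* Code is converted to use the V_ name; behavior stays the same. */
static int
demo_should_forward(void)
{
	return (V_demo_ipforwarding != 0);
}
```

Once the macro is later redefined to reach per-instance storage, code already written against the V_ name does not have to change again.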

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181700 13-Aug-2008 jkim

Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does.


181697 13-Aug-2008 jkim

- Remove unnecessary jump instruction(s) when offset(s) is/are zero(s).
- Consistently use conditional jumps for unsigned integers.


181648 12-Aug-2008 jkim

Update copyrights and fix style(9).


181644 12-Aug-2008 jkim

Replace all stack usages with registers and remove unused macros.


181606 11-Aug-2008 jhb

Decode some more "exotic" instructions including: fxsave, fxrstor, ldmxcsr,
stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter,
sysexit, pause, monitor, mwait, and swapgs (amd64 only).

MFC after: 1 week


181456 09-Aug-2008 alc

Intel describes the behavior of their processors as "undefined" if two or
more mappings to the same physical page have different memory types, i.e.,
PAT settings. Consequently, if pmap_change_attr() is applied to a virtual
address range within the kernel map, then the corresponding ranges of the
direct map also need to be changed. Enhance pmap_change_attr() to handle
this case automatically.

Add a comment describing what pmap_change_attr() does.

Discussed with: jhb


181430 08-Aug-2008 stas

- Add cpuctl(4) pseudo-device driver to provide access to some low-level
features of CPUs like reading/writing machine-specific registers,
retrieving cpuid data, and updating microcode.
- Add cpucontrol(8) utility, that provides userland access to
the features of cpuctl(4).
- Add subsequent manpages.

The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
is created for each cpu present in the system. The pseudo-device minor
number corresponds to the cpu number in the system. The cpuctl(4)
pseudo-device allows a number of ioctls to be performed, namely
RDMSR/WRMSR/CPUID and UPDATE. The first pair allows the caller to
read/write machine-specific registers from the corresponding CPU. cpuid
data can be retrieved using the CPUID call, and microcode updates are
applied via UPDATE.

The permissions are enforced based on the pseudo-device file permissions.
RDMSR/CPUID will be allowed when the caller has read access to the device
node, while WRMSR/UPDATE will be granted only when the node is opened
for writing. There are also a number of priv(9) checks.

The cpucontrol(8) utility is intended to provide userland access to
the cpuctl(4) device features. The utility also allows one to apply
cpu microcode updates.

Currently only Intel and AMD cpus are supported and were tested.

Approved by: kib
Reviewed by: rpaulo, cokane, Peter Jeremy
MFC after: 1 month


181356 07-Aug-2008 alc

Introduce pmap_change_attr_locked().


181284 04-Aug-2008 alc

Make pmap_kenter_attr() static.


181233 03-Aug-2008 ed

Disconnect drivers that haven't been ported to MPSAFE TTY yet.

As clearly mentioned on the mailing lists, there is a list of drivers
that have not been ported to the MPSAFE TTY layer yet. Remove them from
the kernel configuration files. This means people can now still use
these drivers if they explicitly put them in their kernel configuration
file, which is good.

People should keep in mind that after August 10, these drivers will not
work anymore. Even though owners of the hardware are capable of getting
these drivers working again, I will see if I can at least get them to a
compilable state (if time permits).


181151 02-Aug-2008 alc

Enhance pmap_mapdev_attr(). Take advantage of recent enhancements to
pmap_change_attr() in order to use the direct map for any cache mode, not
just write-back mode.

It is worth noting that this change also eliminates a situation in which we
have two mappings to the same physical memory with different cache modes.

Submitted by: Magesh Dhasayyan (with some changes by me)
Discussed with: jhb


181112 01-Aug-2008 alc

Enhance pmap_change_attr() with the ability to demote 1GB page mappings.


181077 31-Jul-2008 alc

Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache
mode changes, and cache and TLB invalidation when some or all of the
specified range is already mapped with the specified cache mode.

Submitted by: Magesh Dhasayyan


181043 31-Jul-2008 alc

Eliminate recomputation of the PDE by pmap_pde_attr().


181031 30-Jul-2008 jfv

Add igb to the default kernel

MFC after: ASAP


180992 30-Jul-2008 kib

Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by: peter
MFC after: 2 weeks


180872 28-Jul-2008 alc

Don't allow pmap_change_attr() to be applied to the recursive mapping.


180870 28-Jul-2008 alc

Add a check for 1GB page mappings to pmap_change_attr() so that it fails
gracefully. (On K10 family processors the direct map is implemented using
1GB page mappings.)


180846 27-Jul-2008 alc

Style fixes to several function definitions.


180845 27-Jul-2008 alc

Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.

Submitted by: Magesh Dhasayyan


180623 19-Jul-2008 alc

Increase the ceiling on the size of the buffer map.


180601 18-Jul-2008 alc

Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses is mapped. Previously, the loop was testing the
same address every time.

Submitted by: Magesh Dhasayyan


180600 18-Jul-2008 alc

Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().


180533 15-Jul-2008 alc

Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by: scottl


180498 13-Jul-2008 alc

Handle a race between pmap_kextract() and pmap_promote_pde(). This race
caused ZFS to crash when restoring a snapshot with superpage promotion
enabled.

Reported by: kris


180487 13-Jul-2008 ed

Make uart(4) the default serial port driver on i386 and amd64.

The uart(4) driver has the advantage of supporting a wider variety of
hardware on a greater amount of platforms. This driver has already been
the standard on platforms such as ia64, powerpc and sparc64.

I've decided not to change anything on pc98. I'd rather let people from
the pc98 team look at this.

Approved by: philip (mentor), marcel


180485 12-Jul-2008 alc

Refine the changes made in SVN rev 180430. Specifically, instantiate a new
page table page only if the 2MB page mapping has been used. Also, refactor
some assertions.


180483 12-Jul-2008 alc

In order to apply pmap_demote_pde() to a page directory entry (PDE) from the
direct map, the PDE must have PG_M and PG_A preset.

Noticed by: Magesh Dhasayyan


180430 10-Jul-2008 alc

Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.


180393 09-Jul-2008 peter

Band-aid a problem with 32 bit selector setup.

Initialize %ds, %es, and %fs during CPU startup. Otherwise a garbage
value could leak to a 32-bit process if a process migrated to a different
CPU after exec and the new CPU had never exec'd a 32-bit process.

A more complete fix is needed, but this mitigates the most frequent
manifestations.

Obtained from: ups


180378 09-Jul-2008 alc

Fix lines that are too long in pmap_growkernel() by substituting shorter but
equivalent expressions.


180373 08-Jul-2008 alc

Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating
page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the
kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one-
time initialization of kernel_vm_end, not in its main body. (I could not,
however, resist the urge to optimize the main body.)

Reduce the number of preallocated page directory pages to just those needed
to support NKPT page table pages. (In fact, this allows me to revert a
couple of my earlier changes to create_pagetables().)


180362 08-Jul-2008 alc

Rev 180333, ``Change create_pagetables() and pmap_init() so that many fewer
page table pages have to be preallocated ...'', violates an assumption made
by minidumpsys(): kernel_vm_end is the highest virtual address that has ever
been used by the kernel. Now, however, the kernel code, data, and bss may
reside at addresses beyond kernel_vm_end. This revision modifies the upper
bound on minidumpsys()'s two page table traversals to account for this
possibility.


180359 07-Jul-2008 delphij

Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out
of the box.


180352 07-Jul-2008 alc

In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after: 1 week


180333 06-Jul-2008 alc

Change create_pagetables() and pmap_init() so that many fewer page table
pages have to be preallocated by create_pagetables().


180311 05-Jul-2008 alc

Increase the kernel map's size to 7GB, making room for a kmem map of size
greater than 4GB. (Auto-sizing will set the ceiling on the kmem map size
to 4.2GB.)


180255 04-Jul-2008 alc

Eliminate an unused declaration. (In fact, the declaration is bogus
because the variable is defined static to pmap.c on i386.)

Found by: CScout


180210 03-Jul-2008 alc

Increase the ceiling on the kmem map's size to 3.6GB. Also, define the
ceiling as a fraction of the kernel map's size rather than an absolute
quantity. Thus, scaling of the kmem map's size will be automatic with
changes to the kernel map's size.
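
The scaling can be shown with a short calculation. The fraction 3/5 is inferred from the figures in this log (3.6GB of a 6GB kernel map) and the names are hypothetical.

```c
#include <stdint.h>

#define DEMO_GB	(1024ULL * 1024 * 1024)

/* Ceiling on the kmem map as a fraction (3/5, inferred) of the kernel map. */
static uint64_t
demo_kmem_size_max(uint64_t kernel_map_size)
{
	return (kernel_map_size * 3 / 5);
}
```

A 6GB kernel map gives a ceiling of about 3.6GB, and growing the kernel map to 7GB raises it to about 4.2GB without touching the definition.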


180209 03-Jul-2008 peter

Exclude .cvsignore files from $FreeBSD$ checking


180170 02-Jul-2008 alc

Eliminate an unnecessary static variable: nkpt.


180109 30-Jun-2008 alc

Document the layout of the address space, borrowing heavily from
http://lists.freebsd.org/pipermail/freebsd-amd64/2005-July/005578.html


180108 30-Jun-2008 alc

Compute NKPDPE from NKPT. This reduces the number of knobs that must be
turned in order to change the size of the kernel virtual address space.


180101 29-Jun-2008 alc

Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However,
in practice, the error (currently) makes no difference because the computation
performed by KVADDR() hides the error. This revision fixes the error.

Also, eliminate a (now) unused definition.


180100 29-Jun-2008 alc

Increase the size of the kernel virtual address space to 6GB. Until the
maximum size of the kmem map can be greater than 4GB, there is little point
in making the kernel virtual address space larger than 6GB.

Tested by: kris@


179990 25-Jun-2008 ed

Remove the unused major/minor numbers from iodev and memdev.

Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by: philip (mentor)


179977 24-Jun-2008 jkim

Emit opcodes closer to GNU as(1) generated codes and micro-optimize.


179967 23-Jun-2008 jkim

Rehash and clean up BPF JIT compiler macros to match AT&T notations.


179956 23-Jun-2008 alc

Ensure that KERNBASE is no less than the virtual address -2GB.


179917 21-Jun-2008 alc

Prepare for a larger kernel virtual address space. Specifically, once
KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical
memory allocated during bootstrap will be offset from the low-end of the
kernel's page table.


179898 20-Jun-2008 alc

Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
That said, kris@ has tested crash dumps under the full patch that
increases the kernel virtual address space on amd64 to 6GB.

Tested by: kris@


179895 20-Jun-2008 delphij

Add et(4), a port of DragonFly's Agere ET1310 10/100/Gigabit
Ethernet device driver, written by sephe@

Obtained from: DragonFly
Sponsored by: iXsystems
MFC after: 2 weeks


179886 20-Jun-2008 alc

Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).


179777 13-Jun-2008 alc

Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.


179749 12-Jun-2008 alc

Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't
been modified (PG_M). In general, if there are two or more such PTEs to
choose among, it is better to write protect the one nearer the high end of
the page table page rather than the low end. This is because most programs
access memory in an ascending direction. The net result of this change is a
sometimes significant reduction in the number of failed promotion attempts
and the number of pages that are write protected by pmap_promote_pde().


179471 01-Jun-2008 alc

Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space. Specifically,
pmap_promote_pde() is only called when the page table page (PTP) that
is referenced by the given PDE has a full "use count", i.e., its
wire_count is 512. Although this guarantees for a user address space
that all 512 PTEs in the PTP hold valid mappings, the same is not true
of the kernel's address space. A kernel PTP always has a use count of
512 regardless of the state of the PTEs. Therefore,
pmap_promote_pde() should not assume (or assert) that the first PTE in
the PTP is valid.
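
The distinction between a full use count and 512 valid PTEs can be sketched as follows. This is a simplified Python model, not the kernel's pmap_promote_pde(); the function name can_promote and the exact set of checks are assumptions made for illustration, and the handling of PG_A, PG_M and PG_RW is omitted.

```python
PG_V = 0x001                    # valid bit
PG_FRAME = 0x000FFFFFFFFFF000   # physical frame bits of a 4KB PTE
NPTEPG = 512                    # PTEs per page table page
PAGE_SIZE = 4096
NBPDR = 2 * 1024 * 1024         # bytes mapped by one 2MB PDE


def can_promote(ptes):
    """Return True if the 512 PTEs could be replaced by one 2MB mapping."""
    if len(ptes) != NPTEPG:
        return False
    first = ptes[0]
    # Do not assume the first PTE is valid: a kernel page table page
    # always has a full use count, whatever its PTEs contain.
    if not first & PG_V:
        return False
    base = first & PG_FRAME
    if base % NBPDR != 0:
        return False            # physical base must be 2MB aligned
    attrs = first & ~PG_FRAME
    for i, pte in enumerate(ptes):
        if (pte & PG_FRAME) != base + i * PAGE_SIZE:
            return False        # physical pages must be contiguous
        if (pte & ~PG_FRAME) != attrs:
            return False        # attributes must be identical
    return True
```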


179347 27-May-2008 yongari

Add jme(4) to the list of drivers supported by GENERIC kernel.


179315 26-May-2008 bz

Remove ISDN4BSD (I4B) from HEAD as it is not MPSAFE and
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.

This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation

Discussed with: rwatson, re


179279 24-May-2008 jb

Add the DTrace hooks for exception handling (Function boundary trace
-fbt- provider), cyclic clock and syscalls.


179229 23-May-2008 alc

The VM system no longer uses setPQL2(). Remove it and its helpers.


179109 19-May-2008 yongari

Add age(4) to the list of drivers supported by GENERIC kernel.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


179078 17-May-2008 remko

Re-sort the if_ti driver so that it is listed with the PCI network cards
instead of under the mii devices list.

PR: kern/123147
Submitted by: gavin
Approved by: imp (mentor, implicit)
MFC after: 3 days


179049 16-May-2008 attilio

Remove unused assembly offsets for digging into structures.


178977 13-May-2008 rdivacky

Regen.

Approved by: kib (mentor)


178976 13-May-2008 rdivacky

Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
a userspace thing which we cannot alter. Two syscalls maintain
a pointer to a userspace list, and when a process exits a routine
walks this list, waking up processes sleeping on futexes
from that list.

Reviewed by: kib (mentor)
MFC after: 1 month


178947 11-May-2008 alc

Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.


178875 09-May-2008 alc

Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.
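
A minimal sketch of the idea in Python, assuming 2MB superpages and assuming that the goal is to make the virtual address and the backing object offset agree in their low 21 bits. The function name align_superpage and its argument list are illustrative; this is not the committed function.

```python
NBPDR = 1 << 21         # 2MB superpage size
PDRMASK = NBPDR - 1


def align_superpage(offset, addr, size):
    """Return a possibly increased start address for the mapping."""
    if size < NBPDR:
        return addr                       # too small to hold a superpage
    superpage_offset = offset & PDRMASK
    # After skipping ahead to the first superpage boundary in the object,
    # at least one whole superpage must still fit in the mapping.
    if size - ((NBPDR - superpage_offset) & PDRMASK) < NBPDR:
        return addr
    if (addr & PDRMASK) == superpage_offset:
        return addr                       # already aligned favourably
    if (addr & PDRMASK) < superpage_offset:
        return (addr & ~PDRMASK) + superpage_offset
    return ((addr + PDRMASK) & ~PDRMASK) + superpage_offset
```

The address only ever moves upward, which matches the description of increasing the starting virtual address.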


178742 03-May-2008 sam

enable IEEE80211_DEBUG and IEEE80211_AMPDU_AGE by default


178676 29-Apr-2008 sam

Intel 4965 wireless driver (derived from openbsd driver of the same name)


178493 25-Apr-2008 alc

Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.


178471 25-Apr-2008 jeff

- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle(), to wake up idle threads that are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.

Sponsored by: Nokia


178439 23-Apr-2008 rdivacky

Implement linux_truncate64() syscall.

Tested by: Aline de Freitas <aline@riseup.net>
Approved by: kib (mentor)


178429 22-Apr-2008 phk

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of the kernel code do, however, care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>.

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


178354 20-Apr-2008 sam

Multi-bss (aka vap) support for 802.11 devices.

Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.

Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)


178352 20-Apr-2008 sam

move awi to the Attic; it will not make the jump to the new world order

Reviewed by: imp


178314 19-Apr-2008 peter

Put in a real isa_irq_pending() stub in order to remove two lines of dmesg
noise from sio per unit. sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console. I think I'd rather read the pending
bits from the apics, but I'm not sure it's worth the hassle.


178299 18-Apr-2008 jeff

- Add inlines for the monitor and mwait instructions.

Sponsored by: Nokia


178258 16-Apr-2008 jkim

Regenerate.


178257 16-Apr-2008 jkim

Add stubs for syscalls introduced in Linux 2.6.17 kernel.
Some GNU libc versions started using them before 2.6.17 was officially out.

MFC after: 3 days


178210 15-Apr-2008 imp

This file is unused on amd64.


178193 14-Apr-2008 phk

Convert amd64 and i386 to share the atrtc device driver.


178153 12-Apr-2008 rpaulo

Connect k8temp(4) to the build.


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


178070 10-Apr-2008 alc

Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().


177999 08-Apr-2008 kib

Regenerate


177997 08-Apr-2008 kib

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


177967 07-Apr-2008 alc

Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describing what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177917 04-Apr-2008 alc

Eliminate an unnecessary test and its misleading comment from pmap_enter().


177851 02-Apr-2008 alc

Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL. The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.
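
For context, these inline functions index the upper levels of the four-level amd64 page table by slices of the virtual address. The Python sketch below shows only that index arithmetic; the function name pt_indices is illustrative and the sketch says nothing about how the committed code is written.

```python
PAGE_SHIFT = 12     # 4KB pages
PDRSHIFT = 21       # one PDE maps 2MB
PDPSHIFT = 30       # one PDPE maps 1GB
PML4SHIFT = 39      # one PML4E maps 512GB
INDEX_MASK = 511    # 9 index bits per level


def pt_indices(va):
    """Return the (PML4, PDP, PD, PT) indices selected by a virtual address."""
    return (
        (va >> PML4SHIFT) & INDEX_MASK,
        (va >> PDPSHIFT) & INDEX_MASK,
        (va >> PDRSHIFT) & INDEX_MASK,
        (va >> PAGE_SHIFT) & INDEX_MASK,
    )
```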


177680 28-Mar-2008 ps

Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by: alc, ups


177662 27-Mar-2008 dfr

Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.


177661 27-Mar-2008 jb

When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.


177651 26-Mar-2008 phk

Back in the good old days, PC's had random pieces of rock for
frequency generation and what frequency they generated was anyone's
guess.

In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.

The other relevant property of those machines, is that they
typically had no more than 16MB RAM.

These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means of PLLs.

Considering that it takes on average 1.5 seconds to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.

If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.


177643 26-Mar-2008 phk

Eliminate unnecessary #includes


177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
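
The relationship between the two units can be sketched in Python as below. The nominal clock value comes from the text above; the helper names hz_to_divisor and period_to_hz are hypothetical and are not part of the timer_spkr_*() API.

```python
I8254_FREQ = 1193182    # nominal i8254 input clock, in Hz


def hz_to_divisor(hz, freq=I8254_FREQ):
    """Counter value that makes the i8254 emit a tone of roughly hz."""
    if hz <= 0:
        raise ValueError("frequency must be positive")
    return freq // hz


def period_to_hz(period, freq=I8254_FREQ):
    """Convert a period given in 1/freq-second units back to Hz."""
    if period <= 0:
        raise ValueError("period must be positive")
    return freq // period
```

For example, a 440 Hz tone corresponds to a counter value of 2711 at the nominal clock rate.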

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminates the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


177631 26-Mar-2008 phk

Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.


177628 26-Mar-2008 phk

The RTC related pscnt and psdiv variables have no business being public.


177586 24-Mar-2008 jkim

Belatedly add BPF_JITTER in NOTES for supported architectures.


177535 23-Mar-2008 peter

First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futile attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
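
The first item in the list above can be modelled in a few lines of Python. This is a toy model, assuming the saving comes from comparing the cached and incoming base values and skipping the MSR write when they match, which the log implies but does not spell out. The Cpu class and switch_to() are invented for the illustration.

```python
class Cpu:
    """Toy CPU that counts how many (expensive) MSR writes were issued."""

    def __init__(self):
        self.fsbase = 0
        self.gsbase = 0
        self.msr_writes = 0

    def wrmsr(self, name, value):
        setattr(self, name, value)
        self.msr_writes += 1


def switch_to(cpu, new_fsbase, new_gsbase):
    """Load the incoming thread's bases, skipping writes that change nothing."""
    if cpu.fsbase != new_fsbase:
        cpu.wrmsr("fsbase", new_fsbase)
    if cpu.gsbase != new_gsbase:
        cpu.wrmsr("gsbase", new_gsbase)
```

In this model, switching between two processes that share the same base values costs no MSR writes at all, while threaded programs with distinct per-thread bases still pay for each switch.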

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.


177534 23-Mar-2008 alc

Correct an error in pmap_mincore() when applied to a 2MB page mapping:
Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the
2MB physical page from the PDE.
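
Why the choice of mask matters can be seen in a small Python sketch. The mask and flag values below are the usual amd64 ones and are reproduced here from memory as assumptions; the helper pa_from_2mb_pde is illustrative only. In a 2MB PDE, bits 12 through 20 are not address bits (bit 12 holds the PAT attribute), so PG_FRAME can pick up stray bits.

```python
PG_FRAME = 0x000FFFFFFFFFF000       # frame bits of a 4KB PTE
PG_PS_FRAME = 0x000FFFFFFFE00000    # frame bits of a 2MB PDE
PG_V = 0x001
PG_RW = 0x002
PG_PS = 0x080                       # marks a 2MB page mapping
PG_PDE_PAT = 0x1000                 # PAT bit of a 2MB PDE (bit 12)
PDRMASK = (1 << 21) - 1


def pa_from_2mb_pde(pde, va):
    """Physical address that va maps to through a 2MB PDE."""
    return (pde & PG_PS_FRAME) | (va & PDRMASK)


# A 2MB mapping at physical 0x40000000 with the PAT bit set.
pde = 0x40000000 | PG_PDE_PAT | PG_PS | PG_RW | PG_V
wrong_base = pde & PG_FRAME         # 0x40001000: includes the PAT bit
right_base = pde & PG_PS_FRAME      # 0x40000000
```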


177533 23-Mar-2008 peter

Export TDP_KTHREAD to asm files.


177532 23-Mar-2008 peter

Move pcb_flags to make trivially better use of cache lines.


177531 23-Mar-2008 peter

Protect the setting of the fsbase/gsbase MSR registers and the
pcb_[fg]sbase values with a critical section, like the rest of the kernel.


177529 23-Mar-2008 alc

To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set. However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.
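
The rule "ignore PG_M unless PG_RW is set" amounts to a two-bit test, sketched here in Python. The bit values are the standard x86 ones; the function name pte_is_dirty is an assumption for illustration.

```python
PG_RW = 0x002   # writable
PG_M = 0x040    # modified (dirty)


def pte_is_dirty(pte):
    """Treat a PTE as modified only if both PG_M and PG_RW are set."""
    return (pte & (PG_M | PG_RW)) == (PG_M | PG_RW)
```

A PTE with PG_M set but PG_RW clear, which the newer Intel TLBs can produce, is therefore not counted as dirty.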

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm


177525 23-Mar-2008 kib

Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.
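
The wraparound can be reproduced with 64-bit arithmetic in Python. This is a sketch of the kind of check described, assuming a loop that steps from sva toward eva one 2MB page directory entry at a time; the function name next_pde_boundary is illustrative.

```python
NBPDR = 1 << 21
PDRMASK = NBPDR - 1
VA_MASK = (1 << 64) - 1     # emulate 64-bit unsigned wraparound


def next_pde_boundary(sva, eva):
    """Start of the next 2MB region after sva, clamped to eva on overflow."""
    va_next = ((sva + NBPDR) & ~PDRMASK) & VA_MASK
    if va_next < sva:
        # The addition wrapped past the top of the address space.
        va_next = eva
    return va_next
```

Without the comparison, a range ending in the last 2MB of the address space would wrap to address 0 and the loop would continue over almost the whole address space.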

Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week


177468 20-Mar-2008 jhb

Explicitly use spinlock_enter/exit rather than locking the icu_lock spin
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.


177467 20-Mar-2008 jhb

Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU. The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr(). Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by: gallatin


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177276 16-Mar-2008 pjd

Implement atomic_fetchadd_long() for all architectures and document it.

Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)


177258 16-Mar-2008 rdivacky

Regen.


177257 16-Mar-2008 rdivacky

Implement sched_setaffinity and sched_getaffinity using
real cpu affinity setting primitives.

Reviewed by: jeff
Approved by: kib (mentor)


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177160 14-Mar-2008 jhb

Fix a silly bogon which prevented all the CPUs that are tagged as interrupt
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs. In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would workaround the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.

MFC after: 1 week
Pointy hat to: jhb


177157 13-Mar-2008 jhb

Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it
can be overridden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains a new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.

Discussed with: imp
Silence on: arch@


177145 13-Mar-2008 kib

Since version 4.3, gcc has changed its behaviour concerning the i386/amd64
ABI and the direction flag: it now assumes that the direction flag is
cleared on entry to a function and no longer clears it again itself.
This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.
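
In terms of the flags register, the change amounts to masking out one bit before entering the handler. A Python sketch, assuming the standard x86 position of DF (bit 10); the function name signal_handler_flags is made up for this example.

```python
PSL_D = 0x400   # direction flag (DF) in eflags/rflags


def signal_handler_flags(flags):
    """Flags value a signal handler starts with: same as before, DF cleared."""
    return flags & ~PSL_D
```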

Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks


177125 12-Mar-2008 jhb

The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU. The old code assumed 36 bits on i386 and 40 bits on
amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.). The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available. If that
feature is not available, then assume 36 PA bits.
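
The run-time mask construction can be sketched in Python as follows. The function names are illustrative; the sketch relies on the documented layout in which the low 8 bits of EAX from CPUID leaf 0x80000008 give the number of physical address bits, and on the low 12 bits of the MTRR fields not being address bits.

```python
def pa_bits_from_cpuid(eax):
    """Physical address width reported in EAX[7:0] of CPUID 0x80000008."""
    return eax & 0xFF


def mtrr_phys_mask(pa_bits=36):
    """Mask covering the address bits of PhysBase/PhysMask for this CPU."""
    return ((1 << pa_bits) - 1) & ~0xFFF
```

With 36 bits this gives 0x0000000ffffff000 and with 40 bits 0x000000fffffff000. A CPU reporting 38 bits needs a mask between the two, which neither hard-coded assumption provided.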

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after: 1 week
PR: i386/120516
Reported by: Nokia


177123 12-Mar-2008 jhb

Minimize diffs with i686_mem.c:
- A few whitespace changes I missed in the style(9) changes.
- Move M_MEMDESC to mem.c.


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


177070 11-Mar-2008 jhb

Style(9) these files. No changes in the compiled code. (Verified by
diff'ing objdump -d output).


177069 11-Mar-2008 jhb

Add constants for the various fields in MTRR registers.

MFC after: 1 week
Verified by: md5(1)


177041 10-Mar-2008 jhb

Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.

MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua


177006 10-Mar-2008 jeff

- Rather than repeating the same preemption code everywhere call the scheduler
specific sched_preempt() routine.


176829 05-Mar-2008 rink

Import uslcom(4) from OpenBSD - this is a driver for Silicon Laboratories
CP2101/CP2102 based USB serial adapters.

Reviewed by: imp, emaste
Obtained from: OpenBSD
MFC after: 2 weeks


176803 04-Mar-2008 alc

Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. (Expect this to change.)

Reviewed by: ups
Tested by: kris, Peter Holm


176734 02-Mar-2008 jeff

- Replace the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a separate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176406 19-Feb-2008 ru

Eliminate whitespace diffs to the i386 version.


176304 15-Feb-2008 scottl

Teach the dump and minidump code to respect the maxiosize attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.
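
Respecting a per-disk limit essentially means splitting each write into pieces no larger than that limit. A minimal Python sketch, in which the function name dump_chunks and the (offset, length) tuple format are assumptions for illustration:

```python
def dump_chunks(length, maxiosize):
    """Split a write of length bytes into (offset, size) pieces <= maxiosize."""
    if maxiosize <= 0:
        raise ValueError("maxiosize must be positive")
    chunks = []
    offset = 0
    while offset < length:
        size = min(maxiosize, length - offset)
        chunks.append((offset, size))
        offset += size
    return chunks
```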


176206 12-Feb-2008 scottl

If busdma is being used to realign dynamic buffers and the alignment is set to
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages. Adjust to be correct. Review of other architectures is
forthcoming.

Submitted by: Joseph Golio


176193 11-Feb-2008 jkim

Fix Linux mmap with MAP_GROWSDOWN flag.

Reported by: Andriy Gapon (avg at icyb dot net dot ua)
Tested by: Andriy Gapon (avg at icyb dot net dot ua)
Pointyhat: me
MFC after: 3 days


175915 03-Feb-2008 scottl

Remove the rr232x driver. It has been superseded by the hptrr driver.


175905 02-Feb-2008 das

Add a few more CPUID feature bits while here. We don't support these
features yet.


175904 02-Feb-2008 das

SSE4 CPUID bits


175859 31-Jan-2008 jhb

For no good reason I had assumed that ACPI table headers would be page
aligned (or at least not cross a page boundary). However, it turns out
that on at least one machine one table header does cross a page boundary.
This caused problems with the MADT early probe as it uses the crash dump
map to load ACPI tables by loading the RSDT/XSDT into pages 1 ... N and
loading the header of each ACPI table into page 0 looking for the
MADT. However, if a table header crossed a page boundary, then page 1
would get trashed resulting in a panic. Fix this by reserving the first
2 pages for ACPI table headers (headers are less than a page in size,
so 2 pages will be sufficient) and use pages 2 .. N for the RSDT and XSDT.
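
The page-crossing case is easy to check numerically. The Python sketch below assumes 4KB pages and the 36-byte size of the common ACPI table header; the function name pages_spanned is illustrative.

```python
PAGE_SIZE = 4096
ACPI_HEADER_SIZE = 36   # size of the common ACPI table header, in bytes


def pages_spanned(pa, length):
    """Number of physical pages touched by length bytes starting at pa."""
    first = pa // PAGE_SIZE
    last = (pa + length - 1) // PAGE_SIZE
    return last - first + 1
```

A header that starts 16 bytes before a page boundary spans two pages, which is why a single mapping page is not enough.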

Note: amd64 should probably be simplified to just use pmap_mapbios()
for all these tables which will use the direct map and not need the
crash dump hack.

MFC after: 5 days
Tested on: i386
Reported by: Pete French petefrench of ticketswitch.com


175846 31-Jan-2008 mav

Move GET_STACK_USAGE from MI header to i386/amd64 MD ones.
Somebody who can, please feel free to implement it for other archs
or copy this one if it suits.


175768 28-Jan-2008 ru

Add a wrapper function that bound checks writes to the dump device.
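
The kind of check such a wrapper performs can be sketched in Python. The parameter names mediaoffset and mediasize, and the function name dump_write_in_bounds, are hypothetical; the log does not give the wrapper's actual interface.

```python
def dump_write_in_bounds(mediaoffset, mediasize, offset, length):
    """True if [offset, offset + length) lies inside the dump device area."""
    if length == 0:
        return True
    if offset < mediaoffset:
        return False
    return offset - mediaoffset + length <= mediasize
```

A wrapper built on this check would refuse the write, rather than pass it to the device, whenever the function returns False.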


175405 17-Jan-2008 jhb

Use cpu_spinwait() (i.e., "pause") when spinning on rdtsc during DELAY().

MFC after: 1 week


175404 17-Jan-2008 alc

Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.


175398 17-Jan-2008 bde

Translate from the i386. All FP constants and operations are evaluated
in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD
said that they were evaluated in the "interesting" (buggy) i387 methods.
float_t was broken compatibly with FLT_EVAL_METHOD.

These definitions seem to be broken on powerpc and possibly on arm.
float_t is float on powerpc with gcc [-notraditional] according to
glibc, and FLT_EVAL_METHOD is marked with XXX on arm.


175325 14-Jan-2008 alc

Make pmap_is_prefaultable() more TLB friendly. Specifically, make it use
the kernel's direct map instead of the pmap's recursive mapping to access
the lowest level in the page table. The direct map is preferable for two
reasons: (1) The TLB is more likely to hold the required direct mapping
because pmap_enter() has already used the direct map to access a nearby
PTE and (2) loading a direct mapping into the TLB involves walking only 2
or 3 levels of the page table instead of 4.
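
The level counts mentioned above follow from the amd64 paging layout. The
sketch below assumes standard 4-level paging with 4 KB base pages and 9 index
bits per level; a mapping that uses a 2 MB page ends at the third level and
one that uses a 1 GB page ends at the second. The function names are
illustrative only.

```c
#include <stdint.h>

/*
 * Index into the table at the given level for a virtual address
 * (0 = page table, 1 = page directory, 2 = PDP table, 3 = PML4).
 */
unsigned
pt_index(uint64_t va, int level)
{
	return ((unsigned)((va >> (12 + 9 * level)) & 0x1ff));
}

/* Number of table levels walked to resolve a mapping of this page size. */
int
walk_levels(uint64_t page_size)
{
	if (page_size == (1ULL << 30))
		return (2);	/* 1 GB page: PML4 and PDP table */
	if (page_size == (1ULL << 21))
		return (3);	/* 2 MB page: adds the page directory */
	return (4);		/* 4 KB page: full walk */
}
```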


175231 11-Jan-2008 bde

Fix fpset*() to not trap if there is a currently unmasked exception.
Unmasked exceptions (which can be fixed up using fpset*() before they
trap) are very rare, especially on amd64 since SSE exceptions trap
synchronously, but I want to merge the faster amd64 implementations of
fpset*() back to i386 without introducing the bug on i386.

The i386 implementation has always avoided the trap automatically by
changing things using load/store of the FP environment, but this is
very slow. Most changes only affect the control word, so they can
usually be done much more efficiently, and amd64 has always done this,
but loading the control word can trap.

This version uses the fast method only in the usual case where it will
not trap. This only costs a couple of integer instructions (including
one branch which I haven't optimized carefully yet) in the usual case,
but bloats the inlines a lot. The inlines were already a bit too large
to handle both the FPU and SSE.


175228 11-Jan-2008 bde

Fix some style bugs:
- fix a previous style fix: shifts should be in the correct direction even
if they are null.
- restore a comment about namespace pollution from floatingpoint.h 1.12 and
update it.
- remove unused namespace pollution FP_*REG.
- improve some comments.
- sort macro definitions for entry points.
- don't use underscores for macro args.


175180 09-Jan-2008 bde

Simplify the ifdefs:
- fix this to compile with C++ by casting ints to enums in a few places
and by using the correct parameter type for _fpsetprec(). Remove
__cplusplus ifdefs which disabled the buggy code.
- remove __CC_SUPPORTS___INLINE ifdefs. `__inline' vs `inline', and either
of these #defined away, are supposed to be handled by very old ifdefs
in <sys/cdefs.h>. Thus the __CC_SUPPORTS___INLINE macro is not needed
here (or anywhere else that it is used). It is less needed here than in
most places, since this file is userland-only and userland is far from
supporting INTEL_COMPILER. The __CC_SUPPORTS___INLINE__ macro which
was used here is even less needed. It is to support spelling `inline'
as `__inline__' instead of the usual spelling `__inline'.

Fix some style bugs that I missed in the previous commit (remove unused
asms and sort more variables).


175179 09-Jan-2008 bde

Fix some style bugs (mainly, use explicit shifts when accessing bit-fields
even if the shift count happens to be 0, sort declarations, and spell
__inline normally).


175178 09-Jan-2008 bde

Improve some comments.


175155 08-Jan-2008 alc

Convert a PMAP_DIAGNOSTIC to a KASSERT.


175147 07-Jan-2008 jhb

Add COMPAT_FREEBSD7 and enable it in configs that have COMPAT_FREEBSD6.


175119 06-Jan-2008 alc

Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


175056 02-Jan-2008 alc

Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel()
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.

Correct a nearby style error: Pointers should be compared to NULL.


174962 28-Dec-2007 rpaulo

Add asmc(4).

Requested by: njl (mentor)


174938 27-Dec-2007 alc

Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174604 15-Dec-2007 scottl

Add the 'hptrr' driver for supporting the following Highpoint RocketRAID
cards:

o RocketRAID 172x series
o RocketRAID 174x series
o RocketRAID 2210
o RocketRAID 222x series
o RocketRAID 2240
o RocketRAID 230x series
o RocketRAID 231x series
o RocketRAID 232x series
o RocketRAID 2340
o RocketRAID 2522

Many thanks to Highpoint for their continued support of FreeBSD.

Submitted by: Highpoint


174557 12-Dec-2007 rpaulo

Disallow the legacy USB circuit to generate an SMI# via an ICH
register (MacBooks only).
This allows MacBooks to boot in SMP mode without any trick and solves
the timer problems with HZ=1000.

MFC after: 1 week

Reviewed by: njl (mentor), jhb
Approved by: njl (mentor), jhb


174496 09-Dec-2007 alc

Eliminate compilation warnings due to the use of non-static inlines
through the introduction and use of the __gnu89_inline attribute.

Submitted by: bde (i386)
MFC after: 3 days


174454 08-Dec-2007 alc

Use 1GB virtual pages to implement the direct map on architectures that
support this feature.

Wrap a nearby line that is too long.

MFC after: 6 weeks


174452 08-Dec-2007 alc

Recognize architectural support for 1GB virtual pages.

MFC after: 6 weeks
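
For illustration, support for 1GB pages is advertised in bit 26 of EDX from
CPUID leaf 0x80000001. The sketch below only tests that bit in a value the
caller has already obtained; the macro and function names are made up for
this example and do not reproduce the kernel's own identification code.

```c
#include <stdint.h>

/* Bit 26 of CPUID.80000001H:EDX indicates 1GB page support. */
#define SKETCH_AMDID_PAGE1GB	(1u << 26)

/* Return 1 if the given extended-feature EDX value advertises 1GB pages. */
int
has_1gb_pages(uint32_t amd_feature_edx)
{
	return ((amd_feature_edx & SKETCH_AMDID_PAGE1GB) != 0);
}
```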


174395 07-Dec-2007 jkoshy

Kernel and hwpmc(4) support for callchain capture.

Sponsored by: FreeBSD Foundation and Google Inc.


174254 04-Dec-2007 kib

Fix the ABI change of the signal delivered on the access to the page
with insufficient protection mode.

For the i386 and amd64, create the tunable, machdep.prot_fault_translation,
with the following behaviour:
0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE
from vm_fault based on the ELF OSABI note:
no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT
note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR
1 = always SIGBUS/BUS_PAGE_FAULT
2 = always SIGSEGV/SEGV_ACCERR

This would do mostly automatic correction of ABI breakage, with the exception
of the untagged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For
them, the sysctl allows running the binary with manual settings.

Discussed with: portmgr (kris)
PR: kern/118304
MFC after: 3 days
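
The selection rule listed above can be sketched as a small function. This is
an illustration only: the function name and parameters are hypothetical, the
signal codes (BUS_PAGE_FAULT, SEGV_ACCERR) are left out, and treating any
value other than 1 or 2 as autodetect is an assumption.

```c
#include <signal.h>

#define SKETCH_NEW_ABI_OSREL	700004	/* threshold quoted in the log */

/*
 * translation: value of machdep.prot_fault_translation.
 * has_note:    nonzero if the binary carries an ELF OSABI note.
 * osrel:       __FreeBSD_version from that note (ignored without a note).
 */
int
prot_fault_signal(int translation, int has_note, int osrel)
{
	switch (translation) {
	case 1:
		return (SIGBUS);
	case 2:
		return (SIGSEGV);
	default:	/* 0: autodetect from the note */
		if (has_note && osrel >= SKETCH_NEW_ABI_OSREL)
			return (SIGSEGV);
		return (SIGBUS);
	}
}
```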


174249 04-Dec-2007 alc

Style change: Use NULL rather than 0 where appropriate.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


174135 01-Dec-2007 phk

Remove XRPU driver, after asking all the users.


174104 30-Nov-2007 alc

Improve get_pv_entry()'s handling of low-memory conditions. After page
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.

Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.

MFC after: 6 weeks


174067 29-Nov-2007 bde

Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPU's
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles. Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this. I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each. For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.


174066 29-Nov-2007 bde

Remove entry points for -finstrument functions since they are currently
unused except to obfuscate disassemblies. -mprofiler-epilogue is
currently broken with gcc-4 (it does too little), but -finstrument-functions
is broken in a different way (it does too much).

amd64 version: merge whitespace fixes from i386 version.


174056 28-Nov-2007 alc

Account for pv entry pages in the total number of wired pages. (Note: pv
entry pages have always been included in the total number of wired pages
on i386 just not amd64.)

MFC after: 6 weeks


174050 28-Nov-2007 jhb

Adjust the code to probe for the PCI config mechanism to use.
- On amd64, just assume type #1 is always used. PCI 2.0 mandated
deprecated type #2 and required type #1 for all future bridges which
was well before amd64 existed.
- For i386, ignore whatever value was in 0xcf8 before testing for type #1
and instead rely on the other tests to determine if type #1 works. Some
newer machines leave garbage in 0xcf8 during boot and as a result the
kernel doesn't find PCI at all (which greatly confuses ACPI which expects
PCI to exist when PCI busses are in the namespace).

MFC after: 3 days
Discussed with: scottl
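
For context, configuration mechanism #1 works by writing an address word to
I/O port 0xcf8 and then accessing the data at port 0xcfc. The sketch below
only builds that address word; it performs no port I/O, and the function and
macro names are illustrative rather than the kernel's.

```c
#include <stdint.h>

#define SKETCH_CONF1_ENABLE	0x80000000u	/* enable bit, bit 31 */

/*
 * Build the 32-bit value written to port 0xcf8 to select a register:
 * bus in bits 23..16, slot in bits 15..11, function in bits 10..8 and
 * the dword-aligned register offset in bits 7..2.
 */
uint32_t
pci_conf1_addr(unsigned bus, unsigned slot, unsigned func, unsigned reg)
{
	return (SKETCH_CONF1_ENABLE |
	    ((bus & 0xffu) << 16) |
	    ((slot & 0x1fu) << 11) |
	    ((func & 0x7u) << 8) |
	    (reg & 0xfcu));
}
```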


174005 28-Nov-2007 attilio

Make ADAPTIVE_GIANT the default in the kernel and remove the option.
Currently, Giant is not heavily contended, so it is OK to treat it
like any other mutex.

Please don't forget to update your own custom config kernel files.

Approved by: cognet, marcel (maintainers of arches where option is
not enabled at the moment)


173988 27-Nov-2007 jhb

Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after: 1 week
Reviewed by: scottl


173855 23-Nov-2007 jkoshy

MFP4: Add assembly language symbols used by hwpmc(4)'s callchain capture.


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheduler might see it as having such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a scheduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroyed these
wired, managed mappings, but did not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
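
The accounting fix can be illustrated with a toy model. The structures and
functions below (toy_mapping, toy_page, toy_page_wired_mappings(),
toy_page_remove()) are invented for this sketch and greatly simplify the
real pmap and vm_object code.

```c
#include <stddef.h>

/* Toy stand-ins for a mapping and a physical page. */
struct toy_mapping {
	int managed;	/* nonzero for a managed mapping */
	int wired;	/* nonzero for a wired mapping */
};

struct toy_page {
	int wire_count;
	const struct toy_mapping *mappings;
	size_t nmappings;
};

/* Count the mappings of the page that are both managed and wired. */
int
toy_page_wired_mappings(const struct toy_page *m)
{
	int count = 0;

	for (size_t i = 0; i < m->nmappings; i++)
		if (m->mappings[i].managed && m->mappings[i].wired)
			count++;
	return (count);
}

/*
 * Destroy all mappings of the page and drop the wire count by the number
 * of wired, managed mappings that were destroyed.
 */
void
toy_page_remove(struct toy_page *m)
{
	m->wire_count -= toy_page_wired_mappings(m);
	m->mappings = NULL;
	m->nmappings = 0;
}
```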


173659 15-Nov-2007 jhb

Add support for cross double fault frames in stack traces:
- Populate the register values for the trapframe put on the stack by the
double fault handler.
- Teach DDB's trace routine to treat a double fault like other trap frames.

MFC after: 3 days


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173601 14-Nov-2007 julian

A bunch more files that should probably print out a thread name
instead of a process name.


173600 14-Nov-2007 julian

Generally we are interested in what thread did something as
opposed to what process. Since threads by default have the name of the
process unless over-written with more useful information, just print the
thread name instead.


173491 08-Nov-2007 benjsc

Link wpi(4) into the build.

This includes:
o mtree (for legal/intel_wpi)
o manpage for i386/amd64 archs
o module for i386/amd64 archs
o NOTES for i386/amd64 archs

Approved by: mlaier (comentor)


173370 05-Nov-2007 alc

Add comments explaining why all stores updating a non-kernel page table
must be globally performed before calling any of the TLB invalidation
functions.

With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)

Reported and reviewed by: ups
MFC after: 3 days


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing NULL.

As a consequence, vmspace_exec() and vmspace_unshare() return the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


173296 03-Nov-2007 alc

Eliminate spurious "Approaching the limit on PV entries, ..."
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)

Reported by: Jeremy Chadwick

Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.

MFC after: 5 days


173160 29-Oct-2007 peter

Move nvram out of DEFAULTS. There really isn't a lot of justification
for consuming the memory. The module works just fine in the unlikely
case that this is needed. It can still be compiled into a custom kernel.


173118 28-Oct-2007 jhb

- Add constants for the different memory types in the SMAP table.
- Use the SMAP types and constants from <machine/pc/bios.h> in the boot
code rather than duplicating it.


173061 27-Oct-2007 jhb

Don't test the APIC flag in the cpuid features for amd64 to see if a
local APIC is present or not. All amd64 CPUs have a local APIC and some
BIOSen don't set the CPUID_APIC flag.

MFC after: 1 week


172998 26-Oct-2007 peter

Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not
refactored it to be a generic device.
Instead of being part of the standard kernel, there is now a 'nvram' device
for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off
with 'nodevice nvram'. This matches the previous behavior when it was
first committed.


172997 26-Oct-2007 imp

Ooops. Put back Invariants and witness

Submitted by: csjp


172996 26-Oct-2007 imp

Add usb serial devices by default. I'm tired of telling people how to
do this that should know better :-).


172937 24-Oct-2007 jhb

Update copyright attribution.

MFC after: 3 days


172799 19-Oct-2007 kensmith

Switch over to ULE as the default scheduler for amd64 and i386
architectures.


172674 15-Oct-2007 netchild

Backout sensors framework.

Requested by: phk
Discussed on: cvs-all


172632 14-Oct-2007 netchild

Import it(4) and lm(4), supporting most popular Super I/O Hardware Monitors.

Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
Obtained from: OpenBSD (parts)


172394 30-Sep-2007 marius

Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
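
The relationship between the two lookup functions can be sketched with a toy
device table. Everything below (toy_pci_dev, toy_find_dbsf(),
toy_find_bsf()) is hypothetical and only mirrors the behaviour described
above: the domain-aware lookup takes an extra argument, and the older lookup
is limited to domain 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy device record identified by domain, bus, slot and function. */
struct toy_pci_dev {
	uint32_t domain;
	uint8_t bus;
	uint8_t slot;
	uint8_t func;
};

/* Find a device by domain/bus/slot/function, or return NULL. */
const struct toy_pci_dev *
toy_find_dbsf(const struct toy_pci_dev *devs, size_t n, uint32_t domain,
    uint8_t bus, uint8_t slot, uint8_t func)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].domain == domain && devs[i].bus == bus &&
		    devs[i].slot == slot && devs[i].func == func)
			return (&devs[i]);
	return (NULL);
}

/* Legacy lookup: searches domain 0 only. */
const struct toy_pci_dev *
toy_find_bsf(const struct toy_pci_dev *devs, size_t n, uint8_t bus,
    uint8_t slot, uint8_t func)
{
	return (toy_find_dbsf(devs, n, 0, bus, slot, func));
}
```

With two devices that share bus, slot and function numbers but live in
different domains, the legacy lookup returns only the one in domain 0, which
is how the false positives mentioned above are avoided.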


172332 26-Sep-2007 brueffer

Use the correct expanded name for SCTP.

PR: 116496
Submitted by: koitsu
Reviewed by: rrs
Approved by: re (kensmith)


172317 25-Sep-2007 alc

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


172256 20-Sep-2007 attilio

Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
the table, however, has been the source of a false positive LOR reporting
with the dt_lock. However, smp rendezvous lock would have had sched_lock
there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let it be
declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re


172255 20-Sep-2007 kib

Fill in cr2 in the signal context from ksi->ksi_addr.
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.

Submitted by: rdivacky
Suggested by: jhb
Sponsored by: Google Summer of Code 2007
PR: kern/77710
Approved by: re (kensmith)


172220 18-Sep-2007 dwmalone

The kernel version of Linux statfs64 is actually supposed to take
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.

The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.

Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)


172212 17-Sep-2007 peter

Fix an undefined symbol that as/ld neglected to flag as a problem. It
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.

The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.

This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.

Approved by: re (kensmith)


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


172189 15-Sep-2007 alc

It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This change fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)


172163 14-Sep-2007 attilio

Currently the LO_NOPROFILE flag (which is masked on upper level code by
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() are naked with respect to this flag.
The bug leads to locks marked with no-profiling to be profiled as well.
In the case of the clock_lock, used by the timer i8254 this leads to
unpredictable behaviour both on amd64 and ia32 (double faults panic,
sudden reboots, etc.). The amd64 clock_lock is also not marked as
not profilable as it should be.
Fix these bugs adding proper checks in the lock profiling code and at
clock_lock initialization time.

i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re


172144 11-Sep-2007 attilio

This is a follow-up, cleaning-up commit about recent changes involving
topology foo functions.
Working at the patch for topology problems in ia32/amd64 evicted some
problems regarding functions ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new modified to involved functions, a
correct ordering is not semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re


171999 28-Aug-2007 kib

Regenerate.

Approved by: re (kensmith)


171998 28-Aug-2007 kib

Implement fake linux sched_getaffinity() syscall to enable java to work
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.

Submitted by: rdivacky
Reviewed by: jkim
Sponsored by: Google Summer of Code 2007
Approved by: re (kensmith)


171916 22-Aug-2007 jkoshy

Assign sizes to assembly language support functions.

Approved by: re (kensmith)


171914 22-Aug-2007 jkoshy

Define an END() macro for use in i386 and amd64 assembly code, akin
to the one available on the ia64, sparc64, and sun4v architectures.

Approved by: re (kensmith)


171907 21-Aug-2007 alc

In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by: re (kensmith)


171854 15-Aug-2007 des

Add a driver for the on-die digital thermal sensor found on Intel Core
and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature). When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.

Submitted by: Rui Paulo <rpaulo@fnop.net>
Sponsored by: Google Summer of Code 2007
Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by: re (kensmith)
MFC after: 3 weeks


171702 02-Aug-2007 peter

Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.

This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Opteron cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.

Obtained from: jhb
Approved by: re (kensmith)


171597 26-Jul-2007 jhb

If the trap number stored in the trapframe is corrupted into a negative
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault. Make the 'type' variable holding the
trap number unsigned to avoid this.

MFC after: 2 weeks
Approved by: re (rwatson)
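
The effect of making the index unsigned can be shown with a small sketch. The
table contents and the names toy_trap_msg and toy_trap_name() are
placeholders, not the kernel's trap_msg[] array.

```c
#include <stddef.h>

/* Placeholder message table; the real trap_msg[] is larger. */
static const char *const toy_trap_msg[] = {
	"",
	"privileged instruction fault",
	"",
	"breakpoint instruction fault",
};

#define TOY_MAX_TRAP_MSG \
	(sizeof(toy_trap_msg) / sizeof(toy_trap_msg[0]) - 1)

/*
 * With an unsigned type, a corrupted "negative" trap number becomes a very
 * large value and fails the single upper-bound check instead of indexing
 * in front of the array.
 */
const char *
toy_trap_name(unsigned int type)
{
	if (type <= TOY_MAX_TRAP_MSG)
		return (toy_trap_msg[type]);
	return ("unknown/reserved trap");
}
```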


171553 23-Jul-2007 dwmalone

If clock_ct_to_ts fails to convert the time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by: re
MFC after: 3 weeks


171481 17-Jul-2007 jeff

- Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to
require fewer blocking loops.
- Don't use atomic ops with 4BSD or on UP.
- Only use the blocking loop if ULE is compiled in.
- Use the correct memory barrier.

Discussed with: attilio, jhb, ssouhlal
Tested by: current@
Approved by: re


171410 12-Jul-2007 jhb

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processes.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


171216 04-Jul-2007 peter

Don't add the 'pad' argument to the mmap/truncate/etc syscalls.

Submitted by: kensmith
Approved by: re (kensmith)


171196 04-Jul-2007 bz

Temporarily disconnect i4bing, i4bisppp and i4bipr from the build for
the 7.0 timeframe.

This is needed because I4B is not locked and NET_NEEDS_GIANT goes away.

The plan is to lock I4B and bring everything back for 7.1.

Approved by: re (kensmith)


171146 01-Jul-2007 njl

Revert previous commit, retaining cpufreq.

Approved by: re (implicitly)


171145 01-Jul-2007 njl

Add cpufreq(4) to GENERIC. It does not change the frequency by default,
so systems should be relatively unaffected. Users can then simply enable
powerd(8) in rc.conf to take advantage of it.

Approved by: re


171128 01-Jul-2007 alc

Pages that do belong to an object and page queue can now be freed without
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.

Approved by: re (kensmith)


170867 17-Jun-2007 mjacob

Check for pte being NULL on return from pmap_pte_pde: unlikely or
even impossible, but it's better to have a panic and a quiesced
gcc4.2.


170866 17-Jun-2007 mjacob

Initialize lastaddr to zero to make gcc4.2 happy.


170802 15-Jun-2007 peter

Prototype (but functional) Linux-ish /dev/nvram interface to the extra
114 bytes of cmos ram in the PC clock chip. The big difference between
this and the Linux version is that we do not recalculate the checksums
for bytes 16..31.

We use this at work when cloning identical machines - we can copy the
bios settings as well. Reading /dev/nvram gives 114 bytes of data but
you can seek/read/write whichever bytes you like.

Yes, this is a "foot, gun, fire!" type of device.


170731 14-Jun-2007 delphij

Enable SCTP by default for GENERIC kernels in order to give it
more exposure. The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still needs some
work/testing on 64-bit platforms.

Approved by: re (kensmith)
Discussed with: rrs


170594 12-Jun-2007 yongari

Add nfe(4) to the list of drivers supported by GENERIC kernel.
While I'm here comment out nve(4) as nfe(4) will take over.

Approved by: re


170564 11-Jun-2007 mjacob

Check against maxsegsz being zero in bus_dma_tag_create and return EINVAL
if it is.

Reviewed by: scott long


170552 11-Jun-2007 thompsa

Add wlan_scan_ap and wlan_scan_sta to platforms that include wlan.


170520 11-Jun-2007 marcel

Use default options for default partitioning schemes, rather than
making the relevant files standard. This avoids duplication and
makes it easier to override/disable unwanted schemes. Since ARM
doesn't have a DEFAULTS configuration file, leave the source
files for the BSD and MBR partitioning schemes in files.arm for
now.


170517 10-Jun-2007 attilio

Optimize vmmeter locking.
In particular:
- Add an explanatory table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)


170473 09-Jun-2007 marcel

Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.


170440 08-Jun-2007 rwatson

Enable AUDIT by default in the GENERIC kernel, allowing security event
auditing to be turned on without a kernel recompile, just an rc.conf
option.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project


170368 06-Jun-2007 davidxu

Backout experimental adaptive-spin umtx code.


170340 05-Jun-2007 jhb

Move a warning under bootverbose as no machines that trigger it have ended
up being broken.


170310 05-Jun-2007 jeff

- Add a new argument to cpu_switch. This is a pointer to a mutex that
oldthread should point at before we return.
- When cpu_switch() is called the td_lock pointer in the old thread may
point at the blocked lock. This prevents other processors from
switching into this thread while we're still switching out. Wait
until we're done deactivating the vmspace before we release the
thread by assigning to td_lock.
- Before we can activate the new vmspace we must make sure that the new
thread is not assigned to the blocked lock. It may be in the process
of switching out on another cpu. Spin until the new thread is
available.


170309 05-Jun-2007 jeff

- Expose td_lock to assembly so it may be used in cpu_switch().


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170304 04-Jun-2007 jeff

Commit 11/14 of sched_lock decomposition.
- There is no globally visible scheduler lock any longer. For now the
watchdog can only check Giant. This model of checking particular locks
is flawed and should be revisited. Other metrics should be considered.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170303 04-Jun-2007 jeff

Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170289 04-Jun-2007 dwmalone

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
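
The truncation can be pictured with a small user-space sketch. The two
functions below are hypothetical stand-ins for an int-sized and a quad-sized
export path, not the sysctl handlers themselves, and the sketch assumes a
platform where unsigned int is narrower than 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Export a 64-bit counter through an int-sized path.  Only the low-order
 * bits survive the conversion, which is the truncation described above.
 */
unsigned int
demo_export_int(uint64_t value)
{
	/* Assumption of this sketch: int is narrower than 64 bits. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return ((unsigned int)value);
}

/* Export the same counter through a 64-bit path; nothing is lost. */
uint64_t
demo_export_quad(uint64_t value)
{
	return (value);
}
```

With a 32-bit int, a counter of 0x100000001 comes out of the int-sized path as
1, while the quad-sized path returns it unchanged.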


170253 03-Jun-2007 alc

Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by: re


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170150 31-May-2007 des

Add CPUID2_PDCM

Requested by: jkim
MFC after: 3 days


170135 30-May-2007 des

MFi386: PDCM, remove pointless message

MFC after: 3 days


170086 29-May-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
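
A minimal sketch of the idea, assuming a 4096-byte page and using the
hypothetical helper demo_next_segment() rather than the real busdma code: the
size of the next segment is limited by the page boundary, by the tag's
maxsegsz, and by the bytes remaining in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_PAGE_SIZE	((size_t)4096)	/* assumed page size */

/*
 * Return the size of the next DMA segment starting at 'addr'.  The
 * segment never crosses a page boundary, never exceeds maxsegsz, and
 * never exceeds the remaining buffer length.
 */
size_t
demo_next_segment(uintptr_t addr, size_t buflen, size_t maxsegsz)
{
	size_t sgsize;

	assert(maxsegsz > 0);	/* a zero maxsegsz could never make progress */

	/* Bytes left before the next page boundary. */
	sgsize = DEMO_PAGE_SIZE - (size_t)(addr & (DEMO_PAGE_SIZE - 1));
	if (sgsize > maxsegsz)
		sgsize = maxsegsz;	/* honor a sub-page maxsegsz */
	if (sgsize > buflen)
		sgsize = buflen;
	return (sgsize);
}
```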

Reviewed by: scottl


170061 28-May-2007 simokawa

Enable fwip and dcons in GENERIC. They seem fairly stable.

Note on dcons:
To enable dcons in kernel, put the following lines in /boot/loader.conf.
You may also want to enable dcons in /etc/ttys.

boot_multicons="YES"
#Force dcons to be the high-level console if a firewire bus is present.
#hw.firewire.dcons_crom.force_console=1

FireWire/dcons support in loader will come shortly.
(i386/amd64 only)


170028 27-May-2007 rwatson

Remove "XXX Giant" comments before calls to kdb_trap() -- the kernel
debugger is quite capable of handling Giant-free execution at this
point. Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.


169895 23-May-2007 kib

Move futex support code from <arch>/support.s into linux compat directory.
Implement all futex atomic operations in assembler so they do not depend on
fuword(), which cannot distinguish between a stored -1 and a failure return.
Correctly return 0 from atomic operations on success.

In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binaries in the linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169805 20-May-2007 jeff

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


169731 19-May-2007 kan

Remove extern struct pcpu __pcpu[]; from the header file and
move it to the only file where it appears to be used.


169730 19-May-2007 kan

Include machine/pcb.h to turn the extern struct pcb stoppcbs[]; construct
into valid C.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169565 14-May-2007 jhb

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
overwrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.
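
The effect can be sketched in user space as follows. demo_limit_view() and
DEMO_ABI32_MAX are hypothetical names for this illustration, not the
sv_fixlimit() interface: the stored limit is never modified, and the cap is
applied only when a 32-bit process looks at it.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_ABI32_MAX	UINT64_C(0xffffffff)	/* hypothetical 32-bit cap */

/*
 * Return the limit as seen by the querying process.  The stored value
 * stays untouched, so a later 64-bit child still sees the full limit.
 */
uint64_t
demo_limit_view(uint64_t stored, int is_32bit)
{
	if (is_32bit && stored > DEMO_ABI32_MAX)
		return (DEMO_ABI32_MAX);
	return (stored);
}

/* Self-check: same stored limit, two different views. */
void
demo_limit_selfcheck(void)
{
	uint64_t stored = UINT64_C(1) << 40;

	assert(demo_limit_view(stored, 1) == DEMO_ABI32_MAX);
	assert(demo_limit_view(stored, 0) == stored);
}
```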

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


169458 11-May-2007 kan

Do not dereference linux_to_bsd_signal[-1] if userland has
passed zero as exit signal.

GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
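
A sketch of the kind of guard involved, assuming a 1-based signal table like
the one described (the table contents and the demo_ names are made up for the
example): signal 0 means "no signal" and must not be translated, because
index 0 - 1 lies in front of the array.

```c
#include <assert.h>

#define	DEMO_SIGTBLSZ	4

/* Hypothetical translation table; entry i holds the mapping for signal i+1. */
const int demo_linux_to_bsd_signal[DEMO_SIGTBLSZ] = { 1, 2, 3, 20 };

/*
 * Translate an exit signal passed in from userland.  Zero is returned
 * as-is rather than being looked up at index -1.
 */
int
demo_translate_exit_signal(int linux_sig)
{
	assert(linux_sig >= 0);
	if (linux_sig == 0)
		return (0);		/* no exit signal requested */
	if (linux_sig > DEMO_SIGTBLSZ)
		return (linux_sig);	/* outside the table: pass through */
	return (demo_linux_to_bsd_signal[linux_sig - 1]);
}
```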


169434 10-May-2007 kevlo

Add wlan_amrr. ural(4) uses amrr as transmit rate control.


169421 09-May-2007 scottl

It turns out that the hptiop driver isn't portable after all. Confine it to
amd64 and i386 for now.


169412 09-May-2007 scottl

Introduce a driver for the Highpoint RocketRAID 3xxx series of controllers.
The driver relies on CAM.

Many thanks to Highpoint for providing this driver.


169395 08-May-2007 jhb

Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.

MFC after: 1 week


169391 08-May-2007 jhb

Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.


169320 06-May-2007 piso

Bring in the remaining bits to make interrupt filtering work:

o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.

Approved by: re (implicit?)


169291 05-May-2007 alc

Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194


169221 02-May-2007 jhb

Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has to be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
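
The remap-table rules in the last item can be sketched in user space. The
helper below is hypothetical (it is not pci_remap_msix() itself) and only
models the bookkeeping described: entries name allocated vectors starting at
1, 0 means "no IRQ", unused vectors may only sit at the end, and those
trailing vectors are the ones released.

```c
#include <assert.h>

#define	DEMO_MAX_VECTORS	32	/* arbitrary cap for this sketch */

/*
 * Given a remap table of 'count' entries and 'allocated' message vectors,
 * return how many trailing vectors would be released, or -1 if the table
 * names a vector that was not allocated or leaves a hole before the
 * highest vector it uses.
 */
int
demo_remap_released(const unsigned int *table, int count, int allocated)
{
	int seen[DEMO_MAX_VECTORS + 1] = { 0 };
	int highest = 0;
	int i;

	assert(allocated >= 0 && allocated <= DEMO_MAX_VECTORS);
	for (i = 0; i < count; i++) {
		if (table[i] > (unsigned int)allocated)
			return (-1);	/* vector was never allocated */
		seen[table[i]] = 1;
		if ((int)table[i] > highest)
			highest = (int)table[i];
	}
	/* Every vector from 1 up to the highest one used must be used. */
	for (i = 1; i <= highest; i++)
		if (!seen[i])
			return (-1);
	return (allocated - highest);
}
```

For the example in the text, a driver wanting 4 messages could use the table
{ 1, 2, 1, 2 }: with 2 allocated vectors nothing is released, and with 3 the
third one is handed back.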

MFC after: 1 month


169042 25-Apr-2007 ariff

Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep
local APIC timer alive.

Reviewed by: jhb
PR: i386/104678
MFC after: 3 days


169030 24-Apr-2007 jhb

Fix the triple fault used as a last resort during a reboot to actually
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.

The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger a int 3 breakpoint to force an interrupt and
kick off a triple fault.

MFC after: 3 days


169029 24-Apr-2007 jhb

MFi386: Attempt to reset the machine using the Reset Control register and
Fast A20 and Init register if the keyboard reset doesn't work before
resorting to a triple fault.


168930 21-Apr-2007 ups

Modify TLB invalidation handling.

Reviewed by: alc@, peter@
MFC after: 1 week


168920 21-Apr-2007 sepotvin

Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem_size will
be at least 256MB (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc


168848 18-Apr-2007 jkim

Fix style(9) and comments.

Submitted by: Scot Hetzel (swhetzel at gmail dot com)


168844 18-Apr-2007 jkim

style(9) says sizeof's are not to be followed by a space. Fix them.


168843 18-Apr-2007 jkim

Implement settimeofday() for Linuxulator/amd64.

Submitted by: Scot Hetzel (swhetzel at gmail dot com)


168822 17-Apr-2007 jhb

Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after: 1 week
Requested by: ariff
Reviewed by: scottl


168691 13-Apr-2007 alc

Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@


168603 10-Apr-2007 pjd

Remove trailing '.' for consistency!


168594 10-Apr-2007 pjd

Add UFS_GJOURNAL options to the GENERIC kernel.

Approved by: re (kensmith)


168275 02-Apr-2007 jkim

MFP4: Turn emul_lock into a mutex.

Submitted by: rdivacky


168109 31-Mar-2007 jkim

Correct BB-profiling and adjust comments.

Pointed out by: bde
Reviewed by: bde


168102 30-Mar-2007 jkim

Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by: bde
Discussed with: kib
Reviewed by: bde


168088 30-Mar-2007 jkim

Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.

Pointed out by: bde[1]


168078 30-Mar-2007 jkim

Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.

Pointed out by: bde


168063 30-Mar-2007 jkim

MFP4: Fix style(9) nits and grammar in comments.


168056 30-Mar-2007 jkim

MFP4: 114193, 114194

Don't "return" in linux_clone() after we have forked the new process in case
of problems. Move the copyout of p2->p_pid outside the emul_lock coverage.

Submitted by: Roman Divacky


168037 30-Mar-2007 jkim

MFP4: Linux futex support for amd64.

Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by: emulation


168036 30-Mar-2007 jkim

Regen for set_thread_area.


168035 30-Mar-2007 jkim

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


168014 29-Mar-2007 julian

Implement the openat() linux syscall
Submitted by: Roman Divacky (rdivacky@)
MFC after: 2 weeks


167912 26-Mar-2007 kris

Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by: jhb


167905 26-Mar-2007 njl

Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change. cpufreq_post_change provides the results of the
change (success or failure). cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed. Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value. When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq. This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by: bde, phk
MFC after: 1 month


167814 22-Mar-2007 jkim

Catch up with ACPI-CA 20070320 import.


167767 21-Mar-2007 jhb

Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98
drivers.


167747 20-Mar-2007 jhb

Add a new apic0 pseudo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system. Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.


167745 20-Mar-2007 jhb

Add a new ram0 pseudo-device that claims memory resources for physical
addresses corresponding to system RAM. On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions. On i386 ram0 uses the
dump_avail[] array. Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.


167744 20-Mar-2007 jkim

- Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capitalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.


167742 20-Mar-2007 jhb

Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related pseudo-devices are added at an order of 5. acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.


167741 20-Mar-2007 jhb

MFi386 1.173: Display two new Intel feature bits.


167493 12-Mar-2007 jkim

Add another CPUID for AMD CPUs and fix style(9) while I am here.


167429 11-Mar-2007 alc

Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb


167423 10-Mar-2007 alc

Completely eliminate "avail_start". It serves no useful purpose.


167364 09-Mar-2007 jhb

Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf. Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


167277 06-Mar-2007 scottl

Don't increment total_bounced when doing no-op dmamap_sync ops.


167273 06-Mar-2007 jhb

Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.


167250 05-Mar-2007 alc

Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks


167247 05-Mar-2007 jhb

Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().


167240 05-Mar-2007 jhb

Add a simple device driver to "eat" any I/O APICs that show up as PCI
devices.

MFC after: 1 week


167157 02-Mar-2007 jkim

MFP4: 115220, 115222

- Fix style(9) and reduce diff between amd64 and i386.
- Prefix Linuxulator macros with LINUX_ to prevent future collision.


167048 27-Feb-2007 jkim

MFP4: 115094

Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by: Scot Hetzel (swhetzel at gmail dot com)
netchild


166944 24-Feb-2007 netchild

Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by: "Scot Hetzel" <swhetzel@gmail.com>


166922 23-Feb-2007 jhb

Use ih_filter instead of ih_handler in a couple of places. This fixes
most INTR_FAST handlers on i386.

Reviewed by: piso


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166823 19-Feb-2007 kib

MFi386 rev. 1.544 of i386/i386/pmap.c:
Rounding addr upwards to next 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.
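
The wraparound is easy to reproduce with plain unsigned arithmetic. The
sketch below uses hypothetical helpers and a 64-bit address for illustration;
the clamp shown is one common way to guard against the wrap and is an
assumption of this example, not necessarily the exact change committed.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_NBPDR	(UINT64_C(1) << 21)	/* 2M, the span of one PDE */

/* Round up to the next 2M boundary; wraps to 0 near the top of the range. */
uint64_t
demo_round_up_2m(uint64_t addr)
{
	return ((addr + DEMO_NBPDR - 1) & ~(DEMO_NBPDR - 1));
}

/*
 * Same rounding, but clamped to 'max'.  When the rounded value wraps to
 * 0, 'rounded - 1' becomes UINT64_MAX, so the comparison catches the
 * wrapped case as well as an ordinary overshoot.
 */
uint64_t
demo_round_up_2m_clamped(uint64_t addr, uint64_t max)
{
	uint64_t rounded;

	assert(max != 0);
	rounded = demo_round_up_2m(addr);
	if (rounded - 1 >= max)
		rounded = max;
	return (rounded);
}
```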

Reported and tested by: kris
Suggested by: alc
MFC after: 1 week


166810 18-Feb-2007 alc

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


166776 15-Feb-2007 jhb

Add bootverbose printfs to indicate which IDT vectors are assigned to MSI
interrupts.


166731 15-Feb-2007 jkim

Fix accidental removal of an empty line from the previous commit.


166730 15-Feb-2007 jkim

Regen.


166729 15-Feb-2007 jkim

MFP4: 113033

Port iopl(2) from i386. This fixes LTP iopl01 and iopl02 on amd64.


166727 15-Feb-2007 jkim

MFP4: 113025, 113146, 113177, 113203, 113500, 113546, 113570

- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files. This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.


166604 09-Feb-2007 brooks

Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by: pjd


166569 08-Feb-2007 jhb

Don't send interrupts to CPUs disabled via lapic hints.

Reported by: Ludger Bolmerg <lbolmerg ! web.de>
MFC after: 3 days
Pointy hat to: jhb


166551 07-Feb-2007 marcel

Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE and GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.


166540 06-Feb-2007 bde

Fixed some style bugs. Routine except:
- don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard
FreeBSD implementation detail which has nothing to do with GNUC.


166536 06-Feb-2007 bde

Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary
variable to avoid invalid constraints in dead code. Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.

Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only. It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too. This seems to work right even for -O0, but the -O0 case
has not been completely tested.

This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp). For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files. This was only tested for -O.

This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)

gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.


166520 05-Feb-2007 jhb

Change GDB_BUFSZ to be large enough to hold a register dump where each
register takes 16 characters (64-bit register in hex). In practice this
is a slight bit of overkill as 7 of the 56 registers are only 32-bit, but
having the buffer too small results in remote kgdb trashing kernel memory
when it connects.

PR: amd64/108673
Submitted by: Ravi Murty, Nikhil Rao @ Intel
MFC after: 3 days
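
The sizing argument works out as below. The macro and function names are
hypothetical; only the register counts (56 total, 7 of them 32-bit) and the
16-characters-per-register figure come from the commit message, and any
extra room for packet framing or a terminator is left out.

```c
#define NREGS_TOTAL	56	/* registers in the dump */
#define NREGS_32BIT	7	/* of which only 32 bits wide */
#define HEX_PER_BYTE	2	/* two hex digits encode one byte */

/* Size chosen by the commit: every register treated as 64-bit. */
static unsigned
regdump_worst_case(void)
{
	return (NREGS_TOTAL * 8 * HEX_PER_BYTE);
}

/* Characters actually needed when the 32-bit registers are counted. */
static unsigned
regdump_exact(void)
{
	return ((NREGS_TOTAL - NREGS_32BIT) * 8 * HEX_PER_BYTE +
	    NREGS_32BIT * 4 * HEX_PER_BYTE);
}
```

The worst case is 896 characters against 840 actually needed, which is the
"slight bit of overkill" the message mentions.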


166398 01-Feb-2007 kib

Introduce some more SO_ option equivalents from Linux to FreeBSD.

The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky


166395 01-Feb-2007 kib

Fix LOR that occurs because proctree_lock was acquired while holding
emuldata lock by moving the code upwards outside the emul_lock coverage.

Submitted by: rdivacky


166394 01-Feb-2007 kib

MFi386: Use LINUX_SIG_VALID macro.

Submitted by: rdivacky


166283 27-Jan-2007 jkoshy

Use a known good stack at the time of servicing an NMI --- reuse
the space allocated for the double fault handler since this space
is otherwise unused till the time a double fault occurs.

This change should have been committed alongside r1.127 of
"exception.S", but I somehow missed doing so.

Problem reported by: jeff
Pointy hat to: jkoshy


166188 23-Jan-2007 jeff

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


166187 23-Jan-2007 jeff

- Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.


166186 23-Jan-2007 bde

Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.


166176 22-Jan-2007 jhb

Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than the default of using the first N slots (or indices) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.

Tested by: scottl
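
Two points from the description above can be modelled in a few lines: MSI
wants a power-of-2 message count, and after a remap the rid of each
SYS_RES_IRQ resource equals its 1-based message index. The helpers below
are hypothetical stand-ins written for illustration; they do not call or
reproduce the pci(9) functions.

```c
#include <stdbool.h>

/* MSI allocations must be a power of 2; MSI-X has no such rule. */
static bool
msi_count_ok(unsigned count)
{
	return (count != 0 && (count & (count - 1)) == 0);
}

/*
 * Model of a remap: the k-th allocated message (0-based) is assigned to
 * the 1-based slot indices[k], and its rid is that same slot number.
 * Returns -1 if k is not one of the allocated messages.
 */
static int
rid_for_message(const unsigned *indices, unsigned allocated, unsigned k)
{
	if (k >= allocated)
		return (-1);
	return ((int)indices[k]);
}
```

For the example in the text, an index array of { 1, 4 } with two allocated
messages yields rids 1 and 4.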


166150 20-Jan-2007 netchild

MFp4 (113077, 113083, 113103, 113124, 113097):

Don't expose em->shared to the outside world before it's properly
initialized. Might not affect anything but it's at least a better
coding style.

Don't expose em via p->p_emuldata until it's properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are working on a local copy.

In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].

Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)

Change 2 comments (in the new code) to comply to style(9).

Suggested by: jhb


166082 18-Jan-2007 rodrigc

Revert previous change.

Requested by: kan


166078 18-Jan-2007 rodrigc

Forward declare __pcpu as a pointer type instead of an array type to
eliminate GCC 4.1 error: "array type has incomplete element type".


166007 14-Jan-2007 netchild

MFp4 (112893):
Make linux_vfork() actually work. This enables make to work again with 2.6.
It also fixes the LTP vfork tests.

Submitted by: rdivacky


165967 12-Jan-2007 imp

Remove 3rd clause, renumber, ok per email


165947 11-Jan-2007 jhb

Remove magic from rman_activate_resource() that uses the direct map at
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.

MFC after: 2 weeks


165929 11-Jan-2007 jeff

- Use the correct test in the ipi bitmask handler for IPI_PREEMPT so that
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.

Discussed with: ups
Tested on: i386, amd64


165918 09-Jan-2007 jkim

Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors.


165867 07-Jan-2007 netchild

MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by: rdivacky


165832 06-Jan-2007 netchild

MFi386 rev 1.56:
Bring the linux mmap code more into line with how linux (2.4.x) behaves.

Tested by: Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC

Additionally to the i386 version always use PROT_EXEC in the mapping like the
previous version of the amd64 code did. We need to examine this further to
decide what the right thing to do is. For now this fixes several problems in
the LTP test runs and should behave regarding PROT_EXEC like before.


165690 31-Dec-2006 netchild

regen after addition of linux_utimes and linux_rt_sigtimedwait


165689 31-Dec-2006 netchild

MFp4 (111746, 108671, 108945, 112352):
- add linux utimes syscall [1]
- add linux rt_sigtimedwait syscall [2]

Submitted by: "Scot Hetzel" <swhetzel@gmail.com> [1]
Submitted by: Bruce Becker <hostmaster@whois.gts.net> [2]
PR: 93199 [2]


165635 29-Dec-2006 bde

Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').


165633 29-Dec-2006 bde

Fixed some style bugs (whitespace only).


165630 29-Dec-2006 bde

Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors. The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronisms in
sys/i386/i386/support.s.


165610 29-Dec-2006 rwatson

Regenerate.


165609 29-Dec-2006 rwatson

Assign or clean up audit identifiers for a number of additional Linux
system calls on the amd64 architecture.

Some minor white space tweaks for consistency with other syscalls.master
files.

Obtained from: TrustedBSD Project


165578 28-Dec-2006 bde

Removed gratuitous cosmetic differences with the i386 version. This
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs. These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them. The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers. I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).


165572 27-Dec-2006 bde

Avoid an instruction in atomic_cmpset_{int,long}() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%. This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register. We converted to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax. Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes;
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
%al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
for the main output operand, although this is not required. The input
and output operands that use the "a" constraint are now decoupled, and
this makes things clearer except for the reason that the output register
is hard-coded. It is now just a hack to tell gcc that the input "a" has
been clobbered without increasing the number of operands.


165479 23-Dec-2006 davidxu

Fix a panic when rebooting an SMP machine. When option STOP_NMI is used,
the NMI handler is used to stop other processors, and the NMI handler
calls trap(). However, trap() now accepts a pointer rather than a
reference; this was changed by kmacy@.


165408 20-Dec-2006 jkim

MFP4: 109655

- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.


165369 20-Dec-2006 davidxu

Add a lwpid field to the per-cpu structure. The lwpid represents the id
of the thread currently running on each cpu. This allows us to add
in-kernel adaptive spinning for user-level mutexes. While spinning in
user space is possible, it can hardly be implemented efficiently without
wasting cpu cycles unless the kernel exports correct thread running
state, and exporting that state is unlikely to be implemented soon
because the interfaces have to be designed and stabilized. This
implementation is transparent to user space and can be disabled
dynamically. With this change, the performance of a mutex ping-pong
program improves massively on an SMP machine. Performance of the mysql
super-smack select benchmark increases by about 7% on an Intel dual
dual-core2 Xeon machine. This indicates that on systems which have a
bunch of cpus and low system-call overhead (athlon64, opteron, and
core-2 are known to be fast), adaptive spinning does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
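
One reading of how the two sysctls combine is sketched below.
effective_spins() is a hypothetical helper, and the order of the two checks
is an assumption; the commit message states only that the default applies
to a zero m_spincount and that the maximum is an upper limit.

```c
/*
 * Pick the spin count for a mutex: a zero per-mutex count falls back
 * to the default (kern.threads.umtx_dflt_spins), and the result is
 * capped by the maximum (kern.threads.umtx_max_spins).
 */
static unsigned
effective_spins(unsigned m_spincount, unsigned dflt_spins, unsigned max_spins)
{
	unsigned spins = m_spincount;

	if (spins == 0)
		spins = dflt_spins;
	if (spins > max_spins)
		spins = max_spins;
	return (spins);
}
```

Under this reading, setting the default to zero leaves mutexes with a zero
m_spincount at zero spins.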


165310 17-Dec-2006 kmacy

Evidently neither GENERIC nor kan's config had isa in it :-0. As
Doug Barton says, "embrace the LINT".


165303 17-Dec-2006 kmacy

Newer versions of gcc don't support treating structures passed by value
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.

Reviewed by: kan
Tested by: kan
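
The difference between the two calling conventions is visible in portable
C. In this sketch, struct frame is a hypothetical one-field stand-in for
the trap frame, not the real structure.

```c
struct frame {
	long rax;	/* stand-in for a saved register */
};

/* By value: the handler modifies its own copy only. */
static void
handler_by_value(struct frame f)
{
	f.rax = 42;
}

/* By reference: the handler modifies the caller's frame. */
static void
handler_by_ref(struct frame *f)
{
	f->rax = 42;
}

static long
run_by_value(void)
{
	struct frame f = { 0 };

	handler_by_value(f);
	return (f.rax);
}

static long
run_by_ref(void)
{
	struct frame f = { 0 };

	handler_by_ref(&f);
	return (f.rax);
}
```

Since the store in handler_by_value() has no effect visible to the caller,
a compiler is free to discard it, which is the behaviour the commit message
attributes to the GCC 4.1 optimiser.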


165148 13-Dec-2006 yongari

Add msk(4) to the list of drivers supported by GENERIC kernel.


165128 12-Dec-2006 jhb

Give Host-PCI bridge drivers their own pcib_alloc_msi() and
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.


165127 12-Dec-2006 jhb

Sort function prototypes.


165125 12-Dec-2006 jhb

Add a function to return the MD interrupt source cookie associated with
an interrupt event. Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.


164951 06-Dec-2006 sobomax

Allow machdep.cpu_idle_hlt to be set from the loader. This should allow
to workaround the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.

MFC after: 3 days


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164912 05-Dec-2006 ru

Use a different bitmask for superpages' base address so that it
doesn't conflict with the PG_PDE_PAT bit. (We still don't mask
off all the reserved bits but that's okay for now.)

Reviewed by: alc
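
The conflict can be shown with bit masks. The constant names and values
below are assumptions modelled on the amd64 page table layout (4K frames in
bits 12..51, 2M frames in bits 21..51, PAT bit of a 2M PDE at bit 12), and
superpage_base() is a hypothetical helper.

```c
#include <stdint.h>

#define PG_FRAME	UINT64_C(0x000ffffffffff000)	/* 4K frame mask */
#define PG_PS_FRAME	UINT64_C(0x000fffffffe00000)	/* 2M frame mask */
#define PG_PDE_PAT	UINT64_C(0x0000000000001000)	/* PAT bit, 2M PDE */

/* Extract the physical base of a 2M mapping from its PDE. */
static uint64_t
superpage_base(uint64_t pde)
{
	return (pde & PG_PS_FRAME);
}
```

Masking a 2M PDE with the 4K mask would leave bit 12 in the result whenever
the PAT bit is set, producing a base address that is off by 0x1000.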


164860 03-Dec-2006 netchild

MFP4 (110939):

MFi386: return EOPNOTSUPP for unknown module events.

Submitted by: rdivacky


164859 03-Dec-2006 netchild

Sync with i386 (remove the LINUX stuff) now that the module is usable.


164841 03-Dec-2006 bde

Optimized RTC accesses by avoiding null writes to the index register
and by only delaying when an RTC register is written to. The delay
after writing to the data register is now not just a workaround.

This reduces the number of ISA accesses in the usual case from 4 to
1. The usual case is 2 rtcin()'s for each RTC interrupt. The index
register is almost always RTC_INTR for this. The 3 extra ISA accesses
were 1 for writing the index and 2 for delays. Some delays are needed
in theory, but in practice they now just slow down slow accesses some
more since almost everyone including us does them wrong so modern systems
enforce sufficient delays in hardware. I used to have the delays ifdefed
out, but with the index register optimization the delays are rarely
executed so the old magic ones can be kept or even implemented non-
magically without significant cost.

Optimizing RTC interrupt handling is more interesting than it used to
be because RTC interrupts are currently needed to fix the more efficient
apic timer interrupts on some systems. apic_timer_hz is normally 2000
so the RTC interrupt rate needs to be 2048 to keep the apic timer
firing on such systems. Without these changes, each RTC interrupt
normally took 10 ISA accesses (2 PIC accesses and 2 sets of 4 RTC
accesses). Each ISA access takes 1-1.5uS so 10 of them at 2048 Hz
takes 2-3% of a CPU. Now 4 of them take 0.8-1.2% of a CPU.
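
The percentages follow directly from the access counts. The helper below is
a hypothetical calculation aid that reports the cost in parts per million
of one CPU, using integer arithmetic.

```c
/*
 * CPU share, in parts per million, spent on ISA accesses when an
 * interrupt firing hz times per second performs the given number of
 * accesses, each taking ns_per_access nanoseconds.
 */
static unsigned long
isa_cost_ppm(unsigned long accesses, unsigned long ns_per_access,
    unsigned long hz)
{
	/* ns busy per second / 1e9 * 1e6 == ns busy per second / 1000 */
	return (accesses * ns_per_access * hz / 1000);
}
```

Ten accesses at 2048 Hz give 20480 to 30720 ppm (about 2-3%) for 1.0 to
1.5 microseconds per access; four accesses give 8192 to 12288 ppm (about
0.8-1.2%).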


164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


164726 28-Nov-2006 ru

Differentiate between data and instruction fetch in the fatal
page fault trap handler.

Reviewed by: alc


164565 23-Nov-2006 ru

Use a define instead of a "magic" value.


164564 23-Nov-2006 ru

Finish the PG_NX support at the pmap level.

Reviewed by: alc


164505 22-Nov-2006 ru

It's been possible to build linprocfs as a module for some time now.

Submitted by: rdivacky


164413 19-Nov-2006 alc

The global variable avail_end is redundant and only used once. Eliminate
it. Make avail_start static to the pmap on amd64. (It no longer exists
on other architectures.)


164365 17-Nov-2006 jhb

Add support for 8 byte hardware watches in long mode. Kernel hardware
watches support 8 byte watches. For userland, we disallow 8 byte watches
for 32-bit tasks.


164362 17-Nov-2006 jhb

- Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.
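
As an illustration of what named %dr7 fields look like, the sketch below
builds the control bits for one watchpoint. The macro names and dr7_watch()
are hypothetical, not the ones added by this commit; the bit positions
follow the x86 debug register layout (enable bits in the low byte, a 4-bit
access/length field per register starting at bit 16).

```c
#include <stdint.h>

/* Length encodings for the LEN field. */
#define DR7_LEN_1	0x0
#define DR7_LEN_2	0x1
#define DR7_LEN_8	0x2	/* long mode only */
#define DR7_LEN_4	0x3

/* Access encodings for the R/W field. */
#define DR7_RW_EXEC	0x0
#define DR7_RW_WRITE	0x1
#define DR7_RW_RDWR	0x3

/* Build the %dr7 bits that enable watchpoint i (0..3) globally. */
static uint64_t
dr7_watch(unsigned i, unsigned len, unsigned rw)
{
	uint64_t ctl = (uint64_t)((len << 2) | rw) << (16 + 4 * i);
	uint64_t enable = UINT64_C(1) << (2 * i + 1);	/* G<i> bit */

	return (ctl | enable);
}
```

An 8-byte write watch in register 0 is then
dr7_watch(0, DR7_LEN_8, DR7_RW_WRITE) instead of a bare hex constant.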


164358 17-Nov-2006 jhb

Trim some noise from bootverbose:
- Drop the printf in intr_machdep.c when we assign an interrupt source to
a CPU. Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
ioapic pin, just return success without printing anything if the new
value matches the current one.

MFC after: 2 weeks


164357 17-Nov-2006 jhb

A few more style fixes.


164303 15-Nov-2006 jhb

Various whitespace and style fixes.


164301 15-Nov-2006 jhb

Fix a typo that broke MSI (MSI-X worked fine) in the later revisions of
the MSI patches.


164265 13-Nov-2006 jhb

MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
This function is used to allocate IDT vectors for a group of MSI
messages.
- Add MSI and MSI-X PICs. The PIC code here provides methods to manage
edge-triggered MSI messages as x86 interrupt sources. In addition to
the PIC methods, msi.c also includes methods to allocate and release
MSI and MSI-X messages. For x86, we allow for up to 128 different
MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
ask the MSI PIC code to allocate resources and IDT vectors.

MFC after: 2 months
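
The allocation of N contiguous slots on an aligned boundary can be sketched
as a scan over a usage table. This is not the local APIC code:
alloc_vectors() is a hypothetical function, the table is indexed from 0
rather than by real IDT vector number, and the alignment is assumed to be a
power of 2 no smaller than the count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Find count free, contiguous slots starting at a multiple of align,
 * mark them used, and return the first slot, or -1 if none fit.
 */
static int
alloc_vectors(bool used[], int nslots, int count, int align)
{
	assert(count > 0 && align >= count);
	assert((align & (align - 1)) == 0);	/* power of 2 */

	for (int base = 0; base + count <= nslots; base += align) {
		int i;

		for (i = 0; i < count; i++)
			if (used[base + i])
				break;
		if (i < count)
			continue;	/* a slot in this group is taken */
		for (i = 0; i < count; i++)
			used[base + i] = true;
		return (base);
	}
	return (-1);
}
```

Only aligned starting points are tried, so a single busy slot rules out its
whole aligned group for requests of that size.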


164263 13-Nov-2006 jhb

Various fixes:
- Remove an extra entry from the array for 0x0f prefixed instruction groups.
This fixes decoding of instructions where the second opcode >= 0x80.
- Add support for the 64-bit immediate mov instructions.
- When short_addr is enabled, don't parse the modr/m byte for a 16-bit
address, but as a 32-bit address.
- Support %rip relative addressing.
- Don't print a displacement of 0 if there is a base or index register.

MFC after: 3 days


164262 13-Nov-2006 ru

Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (16M kernel would require
additionally 8 page tables).
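
The figure of 8 page tables for a 16M kernel follows from the span of one
page table page. The constants below use the usual amd64 values (4K pages,
512 entries per page table page); page_tables_for() is a hypothetical
helper.

```c
#define PAGE_SIZE	4096UL	/* bytes mapped by one PTE */
#define NPTEPG		512UL	/* PTEs per page table page */

/* Number of page table pages needed to map the given number of bytes. */
static unsigned long
page_tables_for(unsigned long bytes)
{
	unsigned long span = PAGE_SIZE * NPTEPG;	/* 2M per page table */

	return ((bytes + span - 1) / span);
}
```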


164250 13-Nov-2006 ru

Fix a comment.


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


164199 11-Nov-2006 ru

Regen.

Forgotten by: trhodes


164078 07-Nov-2006 ru

Spelling.


164077 07-Nov-2006 ru

Line up memory amount reporting that got broken when s/real/usable/.


164066 07-Nov-2006 jhb

Add a new 'union l_sigval' to use in place of 'union sigval' in the
linux siginfo structure. l_sigval uses a l_uintptr_t for sival_ptr so
that sival_ptr is the right size for linux32 on amd64. Since no code
currently uses 'lsi_ptr' this is just a cosmetic nit rather than a bug
fix.


164064 07-Nov-2006 jhb

Remove duplicate IDTVEC macro definition, it's already defined in
<machine/intr_machdep.h>.


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163987 04-Nov-2006 jb

Remove the KDTRACE option again because of the complaints about having
it as a default.

For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.

By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is the way I intended it to be.

Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.

I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.


163972 04-Nov-2006 jb

Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size is 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up to the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
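
The buffering rule (flush at end of line or when the 128-byte buffer fills)
can be modelled as below. struct linebuf, lb_puts() and lb_flush() are
hypothetical names; the lock that prevents interleaving between CPUs and
the actual write to the console devices are omitted, with a counter
standing in for each flush.

```c
#include <stddef.h>

#define LINEBUF_SIZE	128	/* next power of 2 above 80 columns */

struct linebuf {
	char	 data[LINEBUF_SIZE];
	size_t	 len;		/* bytes currently buffered */
	unsigned flushes;	/* how many times the buffer was written out */
};

/* Stand-in for writing the buffer to every console device. */
static void
lb_flush(struct linebuf *lb)
{
	if (lb->len != 0) {
		lb->flushes++;
		lb->len = 0;
	}
}

/* Append a string, flushing on newline or when the buffer is full. */
static void
lb_puts(struct linebuf *lb, const char *s)
{
	for (; *s != '\0'; s++) {
		lb->data[lb->len++] = *s;
		if (*s == '\n' || lb->len == LINEBUF_SIZE)
			lb_flush(lb);
	}
}
```

A partial line stays in the buffer until a later call completes it, which
is what keeps output from different CPUs from being interleaved mid-line
once a lock is added around the flush.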

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163829 31-Oct-2006 kib

Fix a typo resulting in truncated linux32 signal trampoline code copied
to the usermode. Usually, signal handler segfaulted on return.

Reviewed by: jhb
MFC after: 3 days


163761 29-Oct-2006 netchild

regen after linux_io_* backout


163760 29-Oct-2006 netchild

Backout the linux aio stuff. Several problems were identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.

Requested by: rwatson
Discussed with: rwatson


163756 29-Oct-2006 bde

Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time. Just don't default to using
the TSC if there are multiple active CPUs. Also, don't default to
using the TSC if it is broken. SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU. The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range. Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.


163738 28-Oct-2006 bde

In the userland .mcount():
- Don't use a frame pointer. Our callers need a frame pointer, but we
could only use one to support things that aren't supported. (These
things are:
- profiling of profiling
- debugging of profiling. The core ENTRY() macro doesn't support
forcing a frame pointer for debugging, so don't do more here.)
- Ensure that we are in the text section and have normal alignment.
- Use the normal syntax for `.type'.


163736 28-Oct-2006 netchild

regen (prctl addition)


163735 28-Oct-2006 bde

i386/include/profile.h:
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36. Apparently, this case has never been reached even by lint.

Submitted by: stefanf

{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.


163734 28-Oct-2006 netchild

MFP4:
Implement prctl().

Submitted by: rdivacky
Tested with: LTP


163729 28-Oct-2006 bde

In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were
still using the global label named "profil", and this worked accidentally
because all callers use the same name.
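
The pitfall described above, a macro that accepts a parameter but silently refers to a fixed global name, can be illustrated in plain C. This is only a sketch and not the kernel's MCOUNT_OVERHEAD(): the macro names and the two counters are hypothetical stand-ins for the assembler labels.

```c
/* Hypothetical counters standing in for assembler labels. */
static int profil;
static int other;

/* Buggy form: the parameter is accepted but ignored, so the macro
 * always touches the global named "profil". */
#define BUMP_IGNORING_PARAM(label)  (profil++)

/* Fixed form: the parameter is actually used. */
#define BUMP_USING_PARAM(label)     ((label)++)

/* Returns the value of `other` after bumping it through the buggy macro. */
int bump_other_buggy(void)
{
    other = 0;
    profil = 0;
    BUMP_IGNORING_PARAM(other);   /* silently bumps profil instead */
    return other;
}

/* Returns the value of `other` after bumping it through the fixed macro. */
int bump_other_fixed(void)
{
    other = 0;
    profil = 0;
    BUMP_USING_PARAM(other);
    return other;
}
```

As long as every caller passes the name "profil", both forms behave the same, which is why such a bug can go unnoticed.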


163727 28-Oct-2006 bde

Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.


163726 28-Oct-2006 bde

Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities. It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME(). CNAME() is easy enough to use in pure
asm code, but using it in inline asm requires messy quoting. The
core pure asm code has been hacked on more and all uses of CNAME() in
it have already gone away. Just assume the elf convention here too.
- Removed now-unneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.


163722 27-Oct-2006 bde

Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.


163711 26-Oct-2006 jb

Remove the KSE option now that it's in DEFAULTS on these arches/machines.

The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by: scottl@


163710 26-Oct-2006 jb

Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the complaints that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by: scottl@


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163630 23-Oct-2006 ru

Move "device splash" back to MI NOTES and "files", it's MI.


163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


163567 21-Oct-2006 ru

MFi386: 1.13: Fix booting with ps2 keyboards.


163535 20-Oct-2006 des

Move more MD devices and options out of MI NOTES.


163534 20-Oct-2006 bde

Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among the least
interesting special registers (far behind %cr3). The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").


163531 20-Oct-2006 des

The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.


163494 19-Oct-2006 imp

Remove references to pccard.conf


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistency problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163442 16-Oct-2006 jhb

Add one more include to fix the case of !DDB and !atpic.


163386 15-Oct-2006 hrs

Add a newline to the printf().

Spotted by: Peter Carah <pete@altadena.net>
MFC after: 3 days


163380 15-Oct-2006 netchild

regen (linux AIO stuff)


163379 15-Oct-2006 netchild

MFP4 (with some minor changes):

Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be an ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
"context"), but FreeBSD creates only one single AIO queue per process.
My code maintains a request queue (STAILQ of queue(3)) per "context",
and throws all AIO requests of all contexts owned by a process into
the single FreeBSD per-process AIO queue.

When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
io_cancel(2), my code can pick out requests owned by the specified context
from the single FreeBSD per-process AIO queue according to the per-context
request queues maintained by my code.

2. The request queue maintained by my code stores correspondence information between
Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
(struct aiocb). FreeBSD IO control block actually exists in userland memory
space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
needs to use Linux-specific "struct aio_ring", which is a partial mirror
of context in user space. I would rather take the address of context in
kernel as the context ID, but the io_getevents() of libaio forces me to
take the address of the "ring" in user space as the context ID.

To my surprise, one comment line in the file "io_getevents.c" of
libaio-0.3.105 reads:

Ben will hate me for this

REFERENCE:

1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/
(include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/
(io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html
The design notes: http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
http://rpmfind.net/linux/rpm2html/search.php?query=libaio
Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/
POSIX AIO implementation based on Linux AIO system calls (depending on
libaio).
---snip---

Submitted by: Li, Xiao <intron@intron.ac>
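
Design note 1 above (many Linux contexts sharing one per-process queue) can be sketched as follows. This is a simplified illustration, not the submitted code: it tags each entry with its owning context instead of keeping a separate STAILQ per context, and every name and the table size are hypothetical.

```c
#include <stddef.h>

#define MAX_REQUESTS 16          /* hypothetical capacity of the shared queue */

/* One entry in the single per-process queue, tagged with its Linux context. */
struct request {
    int in_use;                  /* nonzero while the request is pending */
    unsigned long context;       /* Linux AIO context that owns the request */
    int id;                      /* caller-chosen request identifier */
};

/* The single per-process queue shared by every context. */
struct process_queue {
    struct request slots[MAX_REQUESTS];
};

/* Submit a request on behalf of a context; returns 0, or -1 if full. */
int queue_submit(struct process_queue *q, unsigned long context, int id)
{
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (!q->slots[i].in_use) {
            q->slots[i].in_use = 1;
            q->slots[i].context = context;
            q->slots[i].id = id;
            return 0;
        }
    }
    return -1;
}

/* Count pending requests that belong to one context. */
int queue_pending(const struct process_queue *q, unsigned long context)
{
    int n = 0;

    for (size_t i = 0; i < MAX_REQUESTS; i++)
        if (q->slots[i].in_use && q->slots[i].context == context)
            n++;
    return n;
}

/* Drop every request owned by a context, as a destroy operation would
 * need to; returns how many requests were dropped. */
int queue_destroy_context(struct process_queue *q, unsigned long context)
{
    int n = 0;

    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (q->slots[i].in_use && q->slots[i].context == context) {
            q->slots[i].in_use = 0;
            n++;
        }
    }
    return n;
}
```

Destroying one context leaves the requests of every other context in the shared queue untouched, which is the property the per-context bookkeeping exists to provide.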


163374 15-Oct-2006 netchild

MFP4 (106538 + 106541):
Implement CLONE_VFORK. This fixes the clone05 LTP test.

Submitted by: rdivacky


163373 15-Oct-2006 netchild

Revert my previous commit, I mismerged this to the wrong place.

Pointy hat to: netchild


163372 15-Oct-2006 netchild

MFP4 (106541): Fix the clone05 test in the LTP.

Submitted by: rdivacky


163371 15-Oct-2006 netchild

MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.

Submitted by: rdivacky [1]


163317 13-Oct-2006 jhb

Move the 2 additional #includes down into the #ifndef DEV_ATPIC section.


163286 13-Oct-2006 jb

Attempt to fix the GENERIC kernel build which has been failing on
tinderbox for a couple of days.


163267 12-Oct-2006 jhb

Fix nodevice atpic compile.

Pointy hat to: jhb


163219 10-Oct-2006 jhb

Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources. This allows interrupt controllers
with no interrupt pins in use (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
on resume. This gets suspend/resume working with APIC on UP systems.
SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by: anholt (i386)
Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after: 1 week


163212 10-Oct-2006 jhb

Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT.

PR: kern/99870
Submitted by: jkim
MFC after: 3 days


163041 05-Oct-2006 simon

- Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
stronger.

Suggested and reviewed by: dougb
Discussed on: developers
MFC after: 3 days


163018 05-Oct-2006 davidxu

Move some declarations of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.


163016 04-Oct-2006 jb

Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.


162970 02-Oct-2006 phk

Use utc_offset() where applicable, and hide the internals of it
as static variables.


162958 02-Oct-2006 phk

Second part of a little cleanup in the calendar/timezone/RTC handling.

Split subr_clock.c in two parts (by repo-copy):
subr_clock.c contains generic RTC and calendaric stuff. etc.
subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c. They are
not machine dependent and we have generic code that relies on them being
present, so they are not even optional.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.


162713 27-Sep-2006 sobomax

Extend comment explaining why code is conditional at !defined(SCHED_ULE).

Suggested by: ru


162708 27-Sep-2006 sobomax

Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in ULE case. Should be
probably fixed in ULE code instead, but we have no real maintainer for
ULE to do it.

PR: 103697


162658 26-Sep-2006 ru

Added COMPAT_FREEBSD6 option.


162572 23-Sep-2006 davidxu

Stop reloading %fs and %gs, since it causes the base address from
GDT to be loaded into FS.base and GS.base, these values of course
are not the values set by sysarch() with I386_SET_FSBASE and
I386_SET_GSBASE, the change fixed a crash for 32bit libthr after
signal handler returned and normal code is accessing thread pointer,
for example: movl %gs:8, %eax.


162562 22-Sep-2006 jhb

Update the ipmi(4) driver:
- Split out the communication protocols into their own files and use
a couple of function pointers in the softc that the communication
protocols setup in their own attach routine.
- Add support for the SSIF interface (talking to IPMI over SMBus).
- Add an ACPI attachment.
- Add a PCI attachment that attaches to devices with the IPMI interface
subclass.
- Split the ISA attachment out into its own file: ipmi_isa.c.
- Change the code to probe the SMBIOS table for an IPMI entry to just use
pmap_mapbios() to map the table in rather than trying to setup a fake
resource on an isa device and then activating the resource to map in the
table.
- Make bus attachments leaner by adding attach functions for each
communication interface (ipmi_kcs_attach(), ipmi_smic_attach(), etc.)
that setup per-interface data.
- Formalize the model used by the driver to handle requests by adding an
explicit struct ipmi_request object that holds the state of a given
request and reply for the entire lifetime of the request. By bundling
the request into an object, it is easier to add retry logic to the various
communication backends (as well as eventually support BT mode which uses
a slightly different message format than KCS, SMIC, and SSIF).
- Add a per-softc lock and remove D_NEEDGIANT as the driver is now MPSAFE.
- Add 32-bit compatibility ioctl shims so you can use a 32-bit ipmitool
on FreeBSD/amd64.
- Add ipmi(4) to i386 and amd64 NOTES.

Submitted by: ambrisko (large portions of 2 and 3)
Sponsored by: IronPort Systems, Yahoo!
MFC after: 6 days


162487 21-Sep-2006 kan

Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the latter and __builtin_va_start was present in all GCC versions 3.1 and
later.
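
For context, portable code normally reaches this builtin through the standard va_start() macro. The following is a minimal sketch of a variadic function; sum_ints() is a hypothetical helper, and the remark about the macro expansion describes how GCC's <stdarg.h> is typically written rather than anything stated in this commit.

```c
#include <stdarg.h>

/* Sum `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    /* va_start() names the last fixed parameter; GCC's <stdarg.h>
     * typically implements it in terms of __builtin_va_start. */
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```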


162482 20-Sep-2006 wkoszek

Correct 'interrupt interrupt' -> 'interrupt' in the comment.

Requested by: jhb
Approved by: cognet (mentor)


162378 17-Sep-2006 davidxu

Make cpu_set_upcall_kse() and cpu_set_user_tls() work for 32bit process.


162233 11-Sep-2006 jhb

Add a new ddb command 'show lapic' to dump details about the local APIC
registers for the current CPU.

MFC after: 3 days


162232 11-Sep-2006 jhb

Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30. I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.

Patience from: jmg
Pointy hat: jhb


162224 11-Sep-2006 jhb

- Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
This is needed to cope with BIOSen that split up ports for system devices
(like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
x86 nexus drivers to populate the IRQ rman that manually coalesced the
regions.

MFC after: 1 week
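
The two behaviors named above, rejecting overlaps and collapsing adjacent regions, can be sketched on a small sorted table. This is an illustration only and not the rman(9) implementation: region_add(), the structures, and the table size are all hypothetical.

```c
#include <stddef.h>

#define MAX_REGIONS 8            /* hypothetical table size */

struct region {
    unsigned long start;         /* first address in the region */
    unsigned long end;           /* last address in the region (inclusive) */
};

struct region_table {
    struct region r[MAX_REGIONS];    /* kept sorted by start */
    size_t count;
};

/*
 * Add [start, end] to the table. Returns -1 on overlap with an existing
 * region or when the table is full, 0 otherwise. A new region that is
 * directly adjacent to a neighbor is merged into it.
 */
int region_add(struct region_table *t, unsigned long start, unsigned long end)
{
    size_t i, pos = 0;

    if (start > end)
        return -1;

    /* Find the insertion point and reject overlaps. */
    for (i = 0; i < t->count; i++) {
        if (start <= t->r[i].end && end >= t->r[i].start)
            return -1;
        if (t->r[i].start < start)
            pos = i + 1;
    }

    /* Merge with the predecessor if it ends right before `start`. */
    if (pos > 0 && t->r[pos - 1].end + 1 == start) {
        t->r[pos - 1].end = end;
        /* The grown predecessor may now touch its successor too. */
        if (pos < t->count && end + 1 == t->r[pos].start) {
            t->r[pos - 1].end = t->r[pos].end;
            for (i = pos; i + 1 < t->count; i++)
                t->r[i] = t->r[i + 1];
            t->count--;
        }
        return 0;
    }

    /* Merge with the successor if it starts right after `end`. */
    if (pos < t->count && end + 1 == t->r[pos].start) {
        t->r[pos].start = start;
        return 0;
    }

    /* No neighbor to merge with: insert a new entry. */
    if (t->count == MAX_REGIONS)
        return -1;
    for (i = t->count; i > pos; i--)
        t->r[i] = t->r[i - 1];
    t->r[pos].start = start;
    t->r[pos].end = end;
    t->count++;
    return 0;
}
```

With this shape, a BIOS that reports one device's ports as several back-to-back entries ends up with a single contiguous region in the table.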


162182 09-Sep-2006 netchild

Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Suggested by: jhb


162112 07-Sep-2006 jhb

Use a single constant to define the sizes of the physmap[], phys_avail[],
and dump_avail[] arrays so they are in sync (previously it was possible
to store more entries in the physmap[] than we could store in phys_avail[],
which was pointless). While I'm here, bump up the length of these tables
to hold 30 entries on amd64 and 16 on i386. This allows machines with
fairly fragmented memory maps to boot ok (at least one machine would
not boot FreeBSD/i386 but would boot FreeBSD/amd64 because amd64 allowed
for more fragments).

MFC after: 3 days


162087 06-Sep-2006 sobomax

Unbreak in the case when device apic is compiled into non-SMP kernel.

Reported by: jhay
MFC after: 2 weeks


162042 05-Sep-2006 sobomax

FreeBSD by default "disables" hyper-threading cores, by not scheduling
any threads to them. However, it still counts those cores as "active but
permanently idle" when calculating system-wide CPUs statistics. It is
incorrect, since it skews statistics quite a bit and creates real problems
for certain types of applications (monitoring applications for example),
by making them believe that the system does have enough idle CPU resources,
while in fact it does not.

Correct the problem by not calling performance counting routines on "disabled"
cores. The cleaner solution would be to just disable APIC timer interrupts on
those cores completely, but ENOTIME here and it is not clear if the
additional complexity is really worth the minor performance gain.

Reviewed by: ssouhlal
Sponsored by: Sippy Software, Inc.
MFC after: 2 weeks


161696 28-Aug-2006 netchild

MFi386 parts of rev 1.55 (modulo real MD parts):
- implement CLONE_PARENT semantic
- lock proc in the currently disabled part of CLONE_THREAD

Submitted by: rdivacky


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer; thanks to Marcel Moolenaar,
who wrote the IA64 version of casuword32.


161666 27-Aug-2006 netchild

regen


161665 27-Aug-2006 netchild

Add the linux statfs64 call. This allows Tivoli backup to proceed a little
bit further on -current (still not successful, but a step in the right
direction).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: Paul Mather <paul@gromit.dlib.vt.edu>


161611 25-Aug-2006 netchild

Emulate what vfork does instead of using it in linux_vfork. This way
we can do the stuff we need to do with linux processes at fork and
don't panic the kernel at exit of the child.

Submitted by: rdivacky
Tested with: tst-vfork* (glibc regression tests)
Tested by: netchild


161474 20-Aug-2006 netchild

Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: bsam


161461 19-Aug-2006 netchild

Get rid of some nested includes.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb


161419 17-Aug-2006 netchild

Move some stuff into headers where they belong.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


161400 17-Aug-2006 netchild

Initialize the emul sx-lock.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161366 16-Aug-2006 davidxu

Change xorq back to xorl.

Noticed by: bde


161365 16-Aug-2006 netchild

Style fixes to comments.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


161342 15-Aug-2006 davidxu

Backout revision 1.117, xorl and xorq have same result, but xorq needs
longer decoding.


161330 15-Aug-2006 jhb

Regen to propagate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.


161328 15-Aug-2006 jhb

- Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
now. Right now makesyscalls.sh generates a file with a hardcoded
function name, so it wouldn't work for any of the ABIs anyway. Probably
the function name should be configurable via a 'systracename' variable
and the functions should be stored in a function pointer in the sysvec
structure.


161315 15-Aug-2006 netchild

Initialize the eventhandlers, mutexes and sx locks.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161311 15-Aug-2006 netchild

add autogenerated systrace_args stuff for dtrace


161310 15-Aug-2006 netchild

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not built as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesn't work, issue with futexes
- linux-skype - doesn't work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesn't work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


161309 15-Aug-2006 netchild

regen


161308 15-Aug-2006 davidxu

Because fuword on AMD64 returns 64bit long integer -1 on fault, clear
entire %rax to zero instead of only clearing %eax, otherwise it will
leave garbage data in upper 32 bits.


161305 15-Aug-2006 netchild

Add new syscalls in the linuxolator (only used when the sysctl
compat.linux.osrelease is changed to "2.6.16" or similar).

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161292 14-Aug-2006 alc

Eliminate an unnecessary initialization from trap_pfault() that also
happens to contain a style error.


161285 14-Aug-2006 jhb

Don't try to preserve PAT bits in pmap_enter(). We currently only use PAT bits on pages that
aren't mapped via pmap_enter() (KVA). We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by: alc


161272 14-Aug-2006 alc

It's not entirely obvious that PGEX_I must be zero if no-execute is neither
supported nor enabled. Just to be sure, verify that no-execute is enabled
before passing VM_PROT_EXECUTE to vm_fault().

Suggested by: tegge@
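
The check described above can be sketched as a small decision function. The constants below mirror the x86 page-fault error-code bits and the usual VM protection values but are redefined locally as assumptions; fault_type() and its nx_enabled parameter are hypothetical and are not the code in trap_pfault().

```c
/* Local stand-ins (assumptions) for the kernel's constants. */
#define PGEX_W          0x02   /* fault was caused by a write */
#define PGEX_I          0x10   /* fault was caused by an instruction fetch */

#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04

/*
 * Choose the access type to hand to the fault handler. `nx_enabled`
 * is nonzero when no-execute support is turned on; the instruction
 * fetch bit is only trusted in that case.
 */
int fault_type(unsigned int error_code, int nx_enabled)
{
    if (error_code & PGEX_W)
        return VM_PROT_WRITE;
    if ((error_code & PGEX_I) && nx_enabled)
        return VM_PROT_EXECUTE;
    return VM_PROT_READ;
}
```

When no-execute is off, an instruction fetch is treated as an ordinary read, which matches the cautious stance taken in the commit message.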


161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


161204 10-Aug-2006 netchild

Add some more errno mappings (bsd -> linux) and a comment about the status.

Submitted by: "Intron" <mag@intron.ac>


161066 08-Aug-2006 alc

Pass VM_PROT_EXECUTE to vm_fault() instead of VM_PROT_READ if the page
fault was caused by an instruction fetch.


161016 06-Aug-2006 alc

Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().


160928 02-Aug-2006 alc

Define the additional page fault error codes that are implemented by amd64.


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160869 01-Aug-2006 obrien

Correct spelling of 3DNow!.


160813 29-Jul-2006 marcel

Move sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160799 28-Jul-2006 jhb

Regen for MPSAFE flag removal.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160797 28-Jul-2006 jhb

Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160770 27-Jul-2006 jhb

Add KTR_SYSC tracing to the syscall() implementations that didn't have it
yet.

MFC after: 1 week


160764 27-Jul-2006 jhb

Add missing ptrace(2) system-call stops to various syscall()
implementations.

MFC after: 1 week


160763 27-Jul-2006 jhb

Don't allow MAXMEM or hw.physmem to extend the top of memory if our memory
map was obtained from the SMAP. SMAP is trustworthy, and the memory
extending feature is a band-aid for older systems where FreeBSD's methods
of detecting memory were not always trustworthy. This fixes the issue
where using hw.physmem could result in the ACPI tables getting trashed
breaking ACPI.

MFC after: 3 days
Tested on: i386


160617 24-Jul-2006 davidxu

Remove a duplicated line.


160525 20-Jul-2006 alc

Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system have subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@
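
The difference between the two semantics discussed above can be shown with plain bit masks. This is a toy model, not pmap code: the permission bits and both function names are hypothetical, and a single integer stands in for the rights of one mapping.

```c
#define PROT_R 0x01   /* hypothetical permission bits */
#define PROT_W 0x02
#define PROT_X 0x04

/* "Keep only what is listed": rights not named in `prot` are removed. */
unsigned int page_protect(unsigned int current, unsigned int prot)
{
    return current & prot;
}

/* "Remove write only": every other right is left untouched. */
unsigned int clear_write(unsigned int current)
{
    return current & ~(unsigned int)PROT_W;
}
```

Applied to a readable, writable, executable mapping, page_protect() with only the read bit also drops execute, while clear_write() keeps it. That is the unintended side effect a faithful pmap_page_protect() would have when callers pass VM_PROT_READ alone.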


160419 17-Jul-2006 alc

Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked. At
present, the page queues lock provides sufficient synchronization. In the
future, the page queues lock may not always be held when free_pv_entry() is
called.


160329 13-Jul-2006 jkim

Sync specialreg.h changes between amd64 and i386 with few fixes.


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
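
The pattern described above, a command that polls one global quit flag set by the pager, might look roughly like this. It is a user-space sketch under stated assumptions: pager_quit stands in for db_pager_quit, and pager_line_shown() is a hypothetical hook that simulates the user pressing 'q' after a given number of lines.

```c
/* Hypothetical stand-in for the global flag set by the pager. */
static int pager_quit;

/* Hypothetical pager hook: pretend the user pressed 'q' after
 * `quit_after` lines have been shown. */
static void pager_line_shown(int shown, int quit_after)
{
    if (shown >= quit_after)
        pager_quit = 1;
}

/* A command that emits up to `total` lines but stops early once the
 * quit flag is set. Returns the number of lines it emitted. */
int show_table(int total, int quit_after)
{
    int shown = 0;

    pager_quit = 0;
    for (int i = 0; i < total && !pager_quit; i++) {
        /* A real command would print row i here. */
        shown++;
        pager_line_shown(shown, quit_after);
    }
    return shown;
}
```

A command that never checks the flag would keep iterating after the user quits, which is the behavior the 'show idt' and 'show intr' fixes address.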


160286 12-Jul-2006 jkim

Add two new CPUID bits for AMD CPUs, i. e., SVM and extended APIC register.


160277 11-Jul-2006 jhb

Regen.


160276 11-Jul-2006 jhb

- Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
svr4_sys_getdents64() MPSAFE.


160210 09-Jul-2006 mjacob

Make the firmware assist driver resident in
preparation for isp using it.


160144 06-Jul-2006 jhb

Regen.


160143 06-Jul-2006 jhb

- Protect the list of linux ioctl handlers with an sx lock.
- Hold Giant while calling linux ioctl handlers for now as they aren't all
known to be MPSAFE yet.
- Mark linux_ioctl() MPSAFE.


160125 06-Jul-2006 alc

Make two simplifications to pmap_ts_referenced(): Eliminate an unnecessary
test and exit the loop in a shorter way.


160106 05-Jul-2006 alc

pmap_clear_ptes() is already convoluted. This will worsen with the
implementation of superpages. Eliminate it and add pmap_clear_write().

There are no functional changes. Checked by: md5


160103 05-Jul-2006 davidxu

Temporarily remove SCHED_CORE; it seems I have much other work to do now,
one example being POSIX priority mutexes for libthr.


160073 02-Jul-2006 alc

Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.


160069 01-Jul-2006 alc

Tidy up pmap_ts_referenced(): Eliminate excessive white space. Eliminate
an initialized but otherwise unused variable. Explicitly check a pointer
against NULL.

There are no functional changes. Checked by: md5


160059 01-Jul-2006 alc

Eliminate the remaining uses of "register".

Convert the remaining K&R-style function declarations to ANSI-style.


159994 27-Jun-2006 jhb

Regen.


159991 27-Jun-2006 jhb

- Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
memory space the 'buf' pointer of the union points to. This is then used
in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.


159983 27-Jun-2006 jhb

Regen.


159982 27-Jun-2006 jhb

- Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away. mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
(or rather, to handle concurrent unmount races) from going away.
umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.


159970 27-Jun-2006 alc

Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE. Suppose the mapping
is also wired. vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE. I
don't think this is catastrophic. It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.

Reviewed by: tegge@
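
The fix amounts to masking out one software bit while copying a page table entry. A minimal sketch follows, with the caveat that the bit values are local assumptions and copy_pte() is a hypothetical helper rather than the code in pmap_copy().

```c
/* Local stand-ins (assumptions) for PTE bits. */
#define PG_V 0x001UL   /* entry is valid */
#define PG_W 0x200UL   /* software bit: mapping is wired */

/* Copy a PTE for a forked address space, dropping the wired
 * attribute so the child's mapping starts out unwired. */
unsigned long copy_pte(unsigned long src)
{
    return src & ~PG_W;
}
```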


159967 26-Jun-2006 obrien

Add a pure open source nForce Ethernet driver, under BSDL.
This driver was ported from OpenBSD by Shigeaki Tagashira
<shigeaki@se.hiroshima-u.ac.jp> and posted at
http://www.se.hiroshima-u.ac.jp/~shigeaki/software/freebsd-nfe.html
It was additionally cleaned up by me.
It is still a work-in-progress and thus is purposefully not in GENERIC.
And it conflicts with nve(4), so only one should be loaded.


159964 26-Jun-2006 babkin

Backed out the change by request from rwatson.

PR: kern/14584


159961 26-Jun-2006 jhb

Regen.


159959 26-Jun-2006 jhb

linux_brk() is MPSAFE.


159934 25-Jun-2006 alc

Eliminate a comment that became stale after revision 1.540.

Wrap a nearby line.


159927 25-Jun-2006 babkin

The common UID/GID space implementation. It was discussed on -arch
in 1999, and there are changes to the sysctl names compared to the PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to the PR.

PR: kern/14584
Submitted by: babkin


159824 21-Jun-2006 netchild

Commit the DUMMY stuff (printing messages for missing syscalls) for amd64 too.

Submitted by: rdivacky
Sponsored by: Google SoC 2006
Noticed by: jkim
Pointyhat to: netchild


159803 20-Jun-2006 alc

Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE. In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page. This prevents a race between
pmap_enter_object() and the page daemon. Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped. Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry(). Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings. So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object(). Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@


159801 20-Jun-2006 netchild

regen after change to syscalls.master


159799 20-Jun-2006 netchild

Switch to using the DUMMY infrastructure instead of UNIMPL for the new
syscalls. This way there will be a log message printed to the console
(this time for real).

Note: UNIMPL should be used for syscalls we do not implement ever, e.g.
syscalls to load linux kernel modules.

Submitted by: rdivacky
Sponsored by: Google SoC 2006
P4 IDs: 99600, 99602


159790 20-Jun-2006 yar

We no longer need to disable interrupts in MD trap machinery
when we're about to call kdb_trap() because the latter MI
function can disable interrupts by itself now.

Pointed out by: bde
X-MFC remark: depends on kern/subr_kdb.c#1.18
Sponsored by: RiNet (Cronyx Plus LLC)


159783 19-Jun-2006 davidxu

Add variable cpu_mxcsr_mask to save valid bits of mxcsr register.


159782 19-Jun-2006 davidxu

MFi386:
Use the method described in IA-32 Intel Architecture Software
Developer's Manual chapter 11.6.6 to get valid mxcsr bits,
use the mxcsr mask to clear invalid bits passed by user code.
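
A minimal sketch of the masking step described above, assuming the mask
has already been read from the MXCSR_MASK field of an FXSAVE image. The
helper names are hypothetical; the fallback value 0x0000FFBF is the
default Intel documents for CPUs that report a zero mask.

```c
#include <assert.h>
#include <stdint.h>

/* Default mask documented by Intel for a zero MXCSR_MASK field. */
#define MXCSR_DEFAULT_MASK 0x0000FFBFu

/* Hypothetical helper: choose the valid-bit mask from an FXSAVE image field. */
static uint32_t
mxcsr_valid_mask(uint32_t fxsave_mxcsr_mask)
{
	return (fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : MXCSR_DEFAULT_MASK);
}

/* Hypothetical helper: clear invalid bits from a user-supplied MXCSR value. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	assert(mask != 0);	/* a mask must have been chosen first */
	return (user_mxcsr & mask);
}
```

Clearing the invalid bits up front avoids the fault that loading an
MXCSR value with reserved bits set would otherwise raise.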


159651 15-Jun-2006 netchild

Remove COMPAT_43 from GENERIC (and other kernel configs). For amd64 there's
an explicit comment that it's needed for the linuxolator. This is not the
case anymore. For all other architectures there was only a "KEEP THIS".
I'm (and other people too) running a COMPAT_43-less kernel since it's not
necessary anymore for the linuxolator. Roman has been running such a kernel
for a longer time. No problems so far. And I doubt other (newer than ia32
or alpha) architectures really depend on it.

This may result in a small performance increase for some workloads.

If the removal of COMPAT_43 results in a non-working program, please
recompile it and all dependencies and try again before reporting a
problem.

The only place where COMPAT_43 is needed (as in: does not compile without
it) is in the (outdated/not usable since too old) svr4 code.

Note: this does not remove the COMPAT_43TTY option.

Nagging by: rdivacky


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159582 13-Jun-2006 netchild

regen after MFP4 (soc2006/rdivacky_linuxolator) of syscalls.master

P4-Changes: similar to 98673 and 98675 but regenerated locally
Sponsored by: Google SoC 2006
Submitted by: rdivacky


159581 13-Jun-2006 netchild

MFP4 (soc2006/rdivacky_linuxolator)

Update of syscalls.master:
o Adding of several new dummy syscalls (268-310)
o Synchronization of amd64 syscalls.master with i386 one
o Auditing added to amd64 syscalls.master
o Change auditing type for lstat syscall (bugfix). [1]

P4-Changes: 98672, 98674
Noticed by: rwatson [1]
Sponsored by: Google SoC 2006
Submitted by: rdivacky


159570 13-Jun-2006 davidxu

Add the CORE scheduler. I did this work half a year ago and recently
picked it up again. The scheduler is forked from ULE, but the
algorithm for detecting an interactive process is almost completely
different from ULE's. It comes from the Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use the same word,
"score", for the priority boost as the ULE scheduler does.

Briefly, the scheduler has the following characteristics:
1. A timesharing process's nice value is strictly respected; the
timeslice and the interaction-detection algorithm are based
on the nice value.
2. Per-CPU scheduling queues and load balancing.
3. O(1) scheduling.
4. Some CPU affinity code in the wakeup path.
5. Support for POSIX SCHED_FIFO and SCHED_RR.

Unlike the 4BSD and ULE schedulers, which use the fuzzy RQ_PPQ, this
scheduler uses 256 priority queues. Unlike ULE, which uses both pull and
push, this scheduler uses only the pull method, mainly to let a relatively
idle CPU do the work. Currently, however, the whole scheduler is protected
by the big sched_lock, so the benefit is not visible. It can really be worse
than nothing, because all other CPUs are locked out while balancing work
is being done; the 4BSD scheduler does not have this problem.
The scheduler does not support hyperthreading very well. In fact,
it does not distinguish between physical and logical CPUs; this should
be improved in the future. The scheduler has a priority inversion
problem on MP machines. It is not good for realtime scheduling, as it
can cause realtime processes to starve.
As a result, it seems that MySQL super-smack runs better on my
Pentium-D machine when using libthr, whether on a UP or an SMP kernel.


159549 12-Jun-2006 jhb

Enable a few more things in x86 NOTES to get broader LINT coverage:
- Turn on iwi(4), ipw(4), and ndis(4) on amd64 and i386.
- Turn on ral(4) and ural(4) on i386, pc98, and amd64.


159546 12-Jun-2006 alc

Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@


159537 12-Jun-2006 imp

Add the ability to subset the devices that UART pulls in. This allows
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with. This helps us save about 100k.

If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


159130 01-Jun-2006 silby

After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter. On the theory that other drivers will fall into
this same category, we agreed that panicking or making the allocation
fail will cause more hardship than is necessary. The printf should
be sufficient motivation to get the driver glitch fixed.
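
The warn-instead-of-fail policy can be sketched as follows. This is an
illustrative userland example, not the busdma code: the helper names and
the message text are assumptions, and the alignment is assumed to be a
power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: 1 if addr honours a power-of-two alignment. */
static int
is_aligned(uintptr_t addr, uintptr_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((addr & (alignment - 1)) == 0);
}

/*
 * Hypothetical check run after an allocation: report a misaligned
 * buffer with a message, but let the caller keep using it.
 */
static int
check_dma_alignment(uintptr_t addr, uintptr_t alignment)
{
	if (!is_aligned(addr, alignment)) {
		printf("warning: %#jx is not aligned to %#jx\n",
		    (uintmax_t)addr, (uintmax_t)alignment);
		return (0);
	}
	return (1);
}
```

The return value only tells the caller whether a warning was printed; in
this sketch the allocation itself is never rejected.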


159092 31-May-2006 mjacob

Turn the panic on not being able to meet alignment constraints
in bus_dmamem_alloc into the more reasonable EINVAL return.

Also, reclaim memory allocated but then not used if we had
an error return.


159012 28-May-2006 silby

MFi386 rev 1.78:

Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.

Add a panic to detect cases where we've still failed to properly align.


158748 19-May-2006 sobomax

Move clock_lock prototype into <machine/clock.h>, where it is more
appropriate.

Discussed with: jhb


158711 17-May-2006 marius

Add le(4). I could actually only test it on alpha, i386 and sparc64 but
given that this includes the more problematic platforms I see no reason
why it shouldn't also work on amd64 and ia64.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158647 16-May-2006 ru

Kill more references to lnc(4).

Submitted by: grep(1)


158568 14-May-2006 marius

Remove some remnants of lnc(4).


158445 11-May-2006 phk

Clean out sysctl machdep.* related defines.

The cmos clock related stuff should really be in MI code.


158407 10-May-2006 netchild

regen (linux rt_sigpending)


158406 10-May-2006 netchild

Implement rt_sigpending in the linuxolator.

PR: 92671
Submitted by: Markus Niemistö <markus.niemisto@gmx.net>


158381 09-May-2006 ambrisko

Add linsysfs, a Linux 2.6-like sys filesystem, to pacify the Linux
LSI MegaRAID SAS utility.

Sponsored by: IronPort Systems
Man page help from: brueffer


158334 06-May-2006 ambrisko

Forgot the amd64/linux32 part since sys/*/linux didn't match :-(

Pointed out by: Alexander (thanks)


158270 03-May-2006 sam

add ath and wlan crypto support

MFC after: 1 month


158264 03-May-2006 scottl

Allow bus_dmamap_load() to pass ENOMEM back to the caller. This puts it into
conformance with the mbuf and uio load routines. ENOMEM can only happen
when BUS_DMA_NOWAIT is passed in, and thus deferrals are disabled. I don't
like doing this, but fixing this fixes assumptions in other important drivers,
which is a net benefit for now.


158238 01-May-2006 jhb

Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after: 1 month


158236 01-May-2006 jhb

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


158133 29-Apr-2006 alc

Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().


158124 28-Apr-2006 marcel

Rewrite of puc(4). Significant changes are:
o Properly use rman(9) to manage resources. This eliminates the
need for puc-specific hacks to rman. It also allows devinfo(8)
to be used to find out the specific assignment of resources to
serial/parallel ports.
o Compress the PCI device "database" by optimizing for the common
case and to use a procedural interface to handle the exceptions.
The procedural interface also generalizes the need to setup the
hardware (program chipsets, program clock frequencies).
o Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
default and non-serdev devices are handled by the bus.
o Use the serdev I/F to collect interrupt status and to handle
interrupts across ports in priority order.
o Sync the PCI device configuration to include devices found in
NetBSD and not yet merged to FreeBSD.
o Add support for Quatech 2, 4 and 8 port UARTs.
o Add support for a couple dozen Timedia serial cards as found
in Linux.


158101 28-Apr-2006 scottl

Enable the rr232x driver for amd64.


158088 27-Apr-2006 alc

In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().
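
The ambiguity of bit 0x080 can be illustrated with a small sketch. The
level tag and function name are hypothetical; the point is only that the
bit must be interpreted according to the level of the entry it came from.

```c
#include <assert.h>
#include <stdint.h>

#define PG_V    0x001u	/* entry is valid */
#define PG_BIT7 0x080u	/* PS in a PDE, PAT in a PTE */

enum pt_level { LEVEL_PDE, LEVEL_PTE };	/* hypothetical tag */

/* Returns 1 only for a valid PDE that maps a large page. */
static int
entry_is_large_page(uint64_t entry, enum pt_level level)
{
	assert(level == LEVEL_PDE || level == LEVEL_PTE);
	if ((entry & PG_V) == 0)
		return (0);
	/* In a PTE the same bit selects a PAT index and says nothing about size. */
	return (level == LEVEL_PDE && (entry & PG_BIT7) != 0);
}
```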

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb


158059 26-Apr-2006 peter

Move vm.pmap.pv_entry_count out from the PV_STATS ifdefs. It is always
available and is a real counter, not a statistic.


158007 25-Apr-2006 jkim

Check if reported HTT cores are physical cores. This commit does not
affect AMD CPUs at all because HTT bit is disabled earlier. Intel
multicore CPUs and ULE scheduler may be affected.


158004 24-Apr-2006 jkim

Add another Intel CPU feature flag, xTPR (Send Task Priority Messages).


158003 24-Apr-2006 jkim

Check if deterministic cache parameters leaf is valid before use.


158000 24-Apr-2006 cperciva

Adjust dangerous-shared-cache-detection logic from "all shared data
caches are dangerous" to "a shared L1 data cache is dangerous". This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.

As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.

Discussed with: jhb, scottl
Security: See FreeBSD-SA-05:09.htt for background material.


157994 24-Apr-2006 delphij

Move AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT below
their corresponding devices.


157912 21-Apr-2006 peter

Oops. Minidumps were developed on 6.x, without the small pv entry code.
Add some strategic dump_add_page()/dump_drop_page() lines to include pv
chunks in the minidumps - these operate in the direct map region like UMA.


157908 21-Apr-2006 peter

Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory. This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm. These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process. Instead of dumping physical memory in
order to guarantee that all of kvm's backing is dumped, minidumps
dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use. Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump. Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.
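
A minimal sketch of such a page bitmap, written as a userland
illustration. The sketch_ names, the fixed table size, and the query
helper are assumptions made for the example and are not taken from the
kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4K pages */
#define SKETCH_PAGES      1024	/* hypothetical: pages tracked by this sketch */

/* One bit per physical page; a set bit means "include in the dump". */
static uint64_t dump_bitmap[SKETCH_PAGES / 64];

static void
sketch_dump_add_page(uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_PAGE_SHIFT;

	assert(idx < SKETCH_PAGES);
	dump_bitmap[idx / 64] |= (uint64_t)1 << (idx % 64);
}

static void
sketch_dump_drop_page(uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_PAGE_SHIFT;

	assert(idx < SKETCH_PAGES);
	dump_bitmap[idx / 64] &= ~((uint64_t)1 << (idx % 64));
}

static int
sketch_dump_has_page(uint64_t pa)
{
	uint64_t idx = pa >> SKETCH_PAGE_SHIFT;

	assert(idx < SKETCH_PAGES);
	return ((int)((dump_bitmap[idx / 64] >> (idx % 64)) & 1));
}
```

An allocator working in the direct map would call the add routine when
it takes a page and the drop routine when it returns one.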

Dumps are a custom format. At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into 4K page mappings), then the sparse physical pages
according to the bitmap. libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump. While this is a best case, I expect minidumps
to be in the 100MB-500MB range, and obviously never larger than physical
memory.

Minidumps are on by default. It would only be necessary to turn them off
in order to debug corrupt kernel page table management, as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.


157893 20-Apr-2006 imp

Set the rid for a resource allocated with rman_reserve_resource.


157860 19-Apr-2006 cperciva

Correct a local information leakage bug affecting AMD FPUs.

Security: FreeBSD-SA-06:14.fpu


157850 18-Apr-2006 peter

If we're doing a try-alloc of a pv entry and give up early, do not forget
to reduce the pv_entry_count counter. This was found by Tor Egge. In the
same email, Tor also pointed out the pv_stats problem in the previous
commit, but I'd forgotten about it until I went looking for this email
about this allocation problem.


157849 18-Apr-2006 peter

pv_entry_count is more than a statistic. It is used for resource limiting.
Do not compile out its counter updates if pv entry stats are turned off.


157701 13-Apr-2006 alc

Include opt_pmap.h for PMAP_SHPGPERPROC.

PR: 94509


157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


157643 10-Apr-2006 ps

Hook bce up to the build


157541 05-Apr-2006 jhb

Cache the value of the lower half of each I/O APIC redirection table entry
so that we only have to do an ioapic_write() instead of an ioapic_read()
followed by an ioapic_write() every time we mask and unmask level triggered
interrupts. This cuts the execution time for these operations roughly in
half.
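
The saving can be sketched with a simulated register. Everything here is
a hypothetical stand-in (the register variable, the access counters, and
the per-pin structure); only the position of the mask bit, bit 16 of the
low dword of a redirection entry, is taken from the hardware definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOART_INTMASK 0x00010000u	/* mask bit in the low dword */

/* Hypothetical stand-ins for the hardware register and access counters. */
static uint32_t hw_reg;
static unsigned hw_reads, hw_writes;

static uint32_t hw_read(void) { hw_reads++; return (hw_reg); }
static void hw_write(uint32_t v) { hw_writes++; hw_reg = v; }

struct pin { uint32_t cached_lo; };	/* hypothetical per-pin soft state */

static void
pin_init(struct pin *p)
{
	assert(p != NULL);
	p->cached_lo = hw_read();	/* the only read: fill the cache once */
}

static void
pin_mask(struct pin *p)
{
	p->cached_lo |= IOART_INTMASK;
	hw_write(p->cached_lo);		/* write only, no read-modify-write */
}

static void
pin_unmask(struct pin *p)
{
	p->cached_lo &= ~IOART_INTMASK;
	hw_write(p->cached_lo);
}
```

Each mask or unmask now costs one register access instead of two, which
is consistent with the "roughly in half" figure quoted above.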

Profiled by: Paolo Pisati <p.pisati@oltrelinux.com>
MFC after: 1 week


157505 04-Apr-2006 peter

Convert pv_entry_frees and pv_entry_allocs stats counters from int to long,
they wrap way too quickly.


157458 04-Apr-2006 marcel

Sync with i386: Map exceptions to signals in gdb_cpu_signal() so
that kgdb(1) gets a SIGTRAP when it needs to.

Pointed out by: grehan@


157455 04-Apr-2006 marcel

The PC is register 16, not 18.

Pointed out by: grehan@


157448 03-Apr-2006 marcel

Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.


157446 03-Apr-2006 peter

Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses). Each pv entry was linked to a vm_page_t and
a process's address space (pmap). It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system. A page ("pv chunk") is allocated and this provides
168 pv entries for that process. We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process. We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer. After overheads from the pv chunk
bitmap and tailq linkage, this works out that each pv entry has an
effective size of 24.38 bytes.
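
The arithmetic can be checked with a layout sketch. The structures below
are illustrative only: the field names are hypothetical, 8-byte integers
stand in for amd64 pointers, and the two spare words are an assumption
added to round the header up to 64 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096
#define ENTRIES_PER_CHUNK 168

/* Stand-in for a 16-byte tailq linkage (two 8-byte pointers on amd64). */
struct sketch_link { uint64_t next, prev; };

struct sketch_pv_entry {
	uint64_t va;			/* 8 bytes: virtual address */
	struct sketch_link page_link;	/* 16 bytes: per-page list */
};					/* 24 bytes in total */

struct sketch_pv_chunk {
	uint64_t pmap;			/* 8: owner, shared by all entries */
	struct sketch_link chunk_link;	/* 16: list of chunks per process */
	uint64_t free_map[3];		/* 24: 192 bits, 168 of them used */
	uint64_t spare[2];		/* 16: assumed padding */
	struct sketch_pv_entry entries[ENTRIES_PER_CHUNK];
};

/* Effective cost per entry once the chunk header is amortised. */
static double
effective_pv_entry_size(void)
{
	assert(sizeof(struct sketch_pv_chunk) == SKETCH_PAGE_SIZE);
	return ((double)SKETCH_PAGE_SIZE / ENTRIES_PER_CHUNK);
}
```

With 168 entries of 24 bytes (4032 bytes) plus a 64-byte header, the
chunk fills one 4096-byte page, and 4096 / 168 gives the 24.38 figure.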

Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
the chunk pages and free any spares. The stats (vm.pmap.*) show that
this doesn't seem to be that much of a problem, but it can be done if
needed.
* running low on pv entries is now a much bigger problem. The old
get_pv_entry() routine just needed to reclaim one other pv entry.
Now, since they are per-process, we can only use pv entries that are
assigned to our current process, or by stealing an entire page worth
from another process. Under normal circumstances, the pmap_collect()
code should be able to dislodge some pv entries from the current
process. But if needed, it can still reclaim entire pv chunk pages
from other processes.
* This should port to i386 really easily, except there it would reduce
pv entries from 24 bytes to about 12 bytes.

(I have integrated Alan's recent changes.)


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


157394 02-Apr-2006 alc

Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days


157341 31-Mar-2006 emax

Add kbdmux(4) to GENERIC on amd64

Requested by: scottl
Tested by: scottl


157259 29-Mar-2006 scottl

Hook the MFI driver up to the build.


157179 27-Mar-2006 jhb

If the XSDT address in the RSDP for an ACPI 2.0 machine is NULL, then fall
back to using the RSDT instead. ACPI-CA already follows this same strategy
as a workaround for yet another instance of brain-damaged BIOS writers.

PR: i386/93963
Submitted by: Masayuki FUKUI <fukui.FreeBSD@fanet.net>


156963 21-Mar-2006 alc

Eliminate unnecessary invalidations of the entire TLB by pmap_remove().
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.

Reviewed by: tegge


156930 21-Mar-2006 davidxu

Remove stale KSE code.

Reviewed by: alc


156920 20-Mar-2006 jhb

Drop some unneeded casts since we program the kernel in C rather than C++.


156919 20-Mar-2006 netchild

regen: fix of linuxolator with testing in a cross-build


156918 20-Mar-2006 netchild

Fix the linuxolator on amd64 (cross-build).


156875 19-Mar-2006 ru

Regen.


156874 19-Mar-2006 ru

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


156851 18-Mar-2006 netchild

regen


156847 18-Mar-2006 ups

Enable global pages TLB extension on Application Processors.

MFC after: 3 days


156843 18-Mar-2006 netchild

regen after COMPAT_43 removal


156842 18-Mar-2006 netchild

Get rid of the need of COMPAT_43 in the linuxolator.

Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from: DragonFly (some parts)


156706 14-Mar-2006 jhb

Don't allow userland to set hardware watch points on kernel memory at all.
Previously, we tried to allow this only for root. However, we were calling
suser() on the *target* process rather than the current process. This
means that if you can ptrace() a process running as root you can set a
hardware watch point in the kernel. In practice I think you probably have
to be root in order to pass the p_candebug() checks in ptrace() to attach
to a process running as root anyway. Rather than fix the suser(), I just
axed the entire idea, as I can't think of any good reason _at all_ for
userland to set hardware watch points for KVM.
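
The resulting policy amounts to a range check that no privilege level
can bypass. The sketch below is an assumption-laden illustration: the
function name is hypothetical and the address limit is an illustrative
user/kernel boundary, not a value taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boundary between user and kernel virtual addresses. */
#define SKETCH_USER_LIMIT 0x0000800000000000ULL

/* Returns 1 only if [addr, addr + len) lies entirely in user space. */
static int
watch_range_allowed(uint64_t addr, uint64_t len)
{
	/* x86 debug registers watch 1, 2, 4 or 8 bytes. */
	assert(len == 1 || len == 2 || len == 4 || len == 8);
	if (addr >= SKETCH_USER_LIMIT)
		return (0);
	return (len <= SKETCH_USER_LIMIT - addr);
}
```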

MFC after: 3 days
Also thinks hardware watch points on KVM from userland are bad: bde, rwatson


156699 14-Mar-2006 peter

Merge/sync with i386: various cosmetic tweaks


156698 14-Mar-2006 peter

MFi386: The SIGFPE macros were moved to signal.h (FPE_INTOVF etc)


156696 13-Mar-2006 peter

MFi386: rename pcib_devclass to hostb_devclass (cosmetic here)


156695 13-Mar-2006 peter

MFi386: add a TRAP_INTERRUPT case


156694 13-Mar-2006 peter

Cosmetic sync with i386


156674 13-Mar-2006 ps

Fix the format/display descriptor of vm.kmem_size and vm.kmem_free
to be 'long' instead of 'int' so that sysctl(8) correctly displays
the 8 returned bytes as a single 'long' instead of two 'int' values.

Submitted by: peter


156504 09-Mar-2006 jhb

Flip the switch and don't route interrupts to hyperthreads in a HT system.
In at least one benchmark this showed around a 20% performance increase.
If other workloads do benefit from having hyperthreads service interrupts,
we can always make this a loader tunable.

MFC after: 3 days
Tested by: ps


156440 08-Mar-2006 ups

Fix exec_map resource leaks.

Tested by: kris@


156354 06-Mar-2006 yar

MFi386 revision 1.1220: options TDFX_LINUX --> device tdfx_linux


156130 01-Mar-2006 sam

guard function decls with _KERNEL so user code can include this file


156124 28-Feb-2006 jhb

Rework how we wire up interrupt sources to CPUs:
- Throw out all of the logical APIC ID stuff. The Intel docs are somewhat
ambiguous, but it seems that the "flat" cluster model we are currently
using is only supported on Pentium and P6 family CPUs. The other
"hierarchy" cluster model that is supported on all Intel CPUs with
local APICs is severely underdocumented. For example, it's not clear
if the OS needs to glean the topology of the APIC hierarchy from
somewhere (neither ACPI nor MP Table include it) and setup the logical
clusters based on the physical hierarchy or not. Not only that, but on
certain Intel chipsets, even though there were 4 CPUs in a logical
cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
the local APIC IDs. This code has also moved out of the ioapic PIC
driver and into the common interrupt source code so that it can be
shared with MSI interrupt sources since MSI is addressed to APICs the
same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are available to
handle interrupts in a simpler and more intuitive manner. For one thing,
it means we could now choose to not route interrupts to HT cores if we
wanted to (this code is currently in place in fact, but under an #if 0
for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
interrupt handler is added, just as before, with the change that IRQs are now
bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
be easier to manage this mapping from higher levels. For example, we
could allow drivers to specify a CPU affinity map for their interrupts,
or we could allow a userland tool to bind IRQs to specific CPUs.

The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).

MFC after: 1 week


155720 15-Feb-2006 dwmalone

It seems bit 5 of cpu_feature2 is the VMX (Virtual Machine Extensions)
bit. While I'm here, delete a comment that was cut and pasted from the
cpu_features code and doesn't belong here.


155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu cputicks) and the statistics counts
in the thread, and summarize into struct proc at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->td_pticks in userret() and eliminate third argument.


155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.
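
A sketch of the idea, assuming a fixed tick rate: raw tick deltas are
accumulated cheaply, and the conversion to microseconds happens only when
the total is read. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_cpu_account { uint64_t ticks; };	/* raw, unscaled total */

/* Hot path: one subtraction and one addition, no scaling. */
static void
account_add(struct sketch_cpu_account *a, uint64_t start, uint64_t end)
{
	a->ticks += end - start;
}

/* Cold path: scale to microseconds only when somebody asks. */
static uint64_t
ticks_to_usec(uint64_t ticks, uint64_t tick_rate)
{
	assert(tick_rate != 0);
	/* Split the division so the multiplication stays small. */
	return ((ticks / tick_rate) * 1000000 +
	    (ticks % tick_rate) * 1000000 / tick_rate);
}
```

Keeping the multiplication out of the context-switch path is what the
sketch is meant to show; the split division is a design choice of the
example, not something stated in the commit.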

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, avoiding the multiplications that normalized timestamps
at every context switch yields a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


155402 06-Feb-2006 jhb

- Always call exec_free_args() in kern_execve() instead of doing it in all
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.


155313 04-Feb-2006 wsalamon

Call the audit syscall enter/exit functions for the amd64 architecture,
both 32-bit and 64-bit paths. System calls will now be audited.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


155239 03-Feb-2006 davidxu

MFi386:
Clear carry flag in get_mcontext so that setcontext does not
return a bogus error.


155234 03-Feb-2006 peter

Make PV entries dynamic on amd64. i386 has a pre-reserved block of kva
dedicated to storing pv entries, originally so that kva didn't have to be
allocated at inconvenient times. For amd64, we can get the same effect by
using the direct map area. Allocating pages is the same as with the object
backed method, but now we can just look up the page in the direct map area.
Thus, no more pageable kva is reserved. This is the single largest
consumer of kva on our work machines and this change should help conserve
the fixed size 2GB pageable kva on the amd64 kernel.

A pair of sysctl nodes is introduced, named the same as their
tunable counterparts: vm.pmap.shpgperproc and vm.pmap.pv_entry_max.
They work just like the tunables of the same path, except the values are
linked. The pv entry cap is now dynamically changeable.

I didn't make them totally unlimited because we need some sort of safety
limit still. One could consume all physical memory without a cap.


154935 27-Jan-2006 jhb

Call WITNESS_CHECK() in the page fault handler and immediately assume it
is a fatal fault if we are holding any non-sleepable locks. This should
cut down on the number of bogus LORs we currently get when the kernel
panics due to a NULL (or bogus) pointer dereference that goes wandering
off into the VM system which tries to acquire locks and then kicks off
the spurious LORs. This should probably be ported to all the archs at
some point.

Tested on: i386


154367 14-Jan-2006 scottl

Free the newtag if we exit with a failure from alloc_bounce_zone().

Found by: Coverity Prevent(tm)


154243 12-Jan-2006 obrien

Move linux support to the linux section.


154170 10-Jan-2006 phk

Move the old BSD4.3 tty compatibility from (!BURN_BRIDGES && COMPAT_43)
to COMPAT_43TTY.

Add COMPAT_43TTY to NOTES and */conf/GENERIC

Compile tty_compat.c only under the new option.

Spit out
#warning "Old BSD tty API used, please upgrade."
if ioctl_compat.h gets #included from userland.


154128 09-Jan-2006 imp

By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h. Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef. I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.


154079 06-Jan-2006 jhb

- Make pcib_devclass private to sys/dev/pci/pci_pci.c and change all the
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.


154074 06-Jan-2006 jhb

Fix various places that were testing td_critnest to see if interrupts
should remain disabled during a trap or not to check
td_md.md_spinlock_count instead.


153995 03-Jan-2006 jkim

- Explicitly validate an empty filter to match bpf_filter() comment[1].
- Do not use BPF JIT compiler for an empty filter.

[1] Pointed out by: darrenr


153955 01-Jan-2006 imp

Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for
each platform. These will be used in the pci code in preference to
the complicated #ifdefs we have there now.


153947 01-Jan-2006 netchild

Unbreak kernel build.

A happy new year to all.

Submitted by: Goran Gajic <ggajic@afrodita.rcub.bg.ac.yu>, bz
Pointy hat to: netchild
Apologies to: all


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows trying different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contain
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPUs (like we do on AMD and Transmeta
CPUs)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153766 27-Dec-2005 pjd

Fix watch address truncation. The address was truncated when it was passed to
amd64_set_watch() as 'unsigned int' and 'unsigned int' is 32bit long on amd64.

Even with that fix, hardware watchpoints don't work for me on amd64, i.e. when
I set the watchpoint and write a byte there, nothing happens.
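
The truncation described in this entry can be reproduced in userland. The
sketch below is only an illustration: ex_watch_addr_narrow() and
ex_watch_addr_wide() are hypothetical names, not the kernel's
amd64_set_watch(), and the example assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* The example assumes unsigned int is 32 bits wide, as it is on amd64. */
static_assert(sizeof(unsigned int) == 4, "example assumes 32-bit unsigned int");

/*
 * Hypothetical stand-in for the old prototype: the watch address is
 * squeezed through an unsigned int, so the upper 32 bits are lost.
 */
uint64_t
ex_watch_addr_narrow(unsigned int addr)
{
	return ((uint64_t)addr);
}

/* Hypothetical fixed prototype: the parameter holds a full 64-bit address. */
uint64_t
ex_watch_addr_wide(uint64_t addr)
{
	return (addr);
}
```

A kernel address such as 0xffffffff80001234 loses its upper half in the
narrow variant and survives unchanged in the wide one.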


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead, extend the Brandinfo
structure with a flags bitfield and set the BI_CAN_EXEC_DYN flag for all
brands that usually allow executing ELF dynamic binaries (aka shared
libraries). When asked to execute an ET_DYN ELF image, check whether this
flag is set once the ELF brand is known, and allow execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>
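
A minimal sketch of the per-brand flag check this entry describes. The
structure, the function name and the flag value are assumptions made for
illustration (prefixed ex_/EX_); only the ELF e_type values 2 (ET_EXEC)
and 3 (ET_DYN) come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

#define EX_ET_EXEC		2	/* ELF e_type: executable file */
#define EX_ET_DYN		3	/* ELF e_type: shared object */
#define EX_BI_CAN_EXEC_DYN	0x0001	/* illustrative flag value */

/* Hypothetical, cut-down brand description. */
struct ex_brandinfo {
	const char *name;
	int flags;
};

/* Return 1 if an image of type e_type may run under the given brand. */
int
ex_brand_allows(const struct ex_brandinfo *bi, int e_type)
{
	assert(bi != NULL);
	if (e_type == EX_ET_EXEC)
		return (1);
	if (e_type == EX_ET_DYN)
		return ((bi->flags & EX_BI_CAN_EXEC_DYN) != 0);
	return (0);
}
```

The decision is made per brand, after the brand is known, rather than
through one global switch.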


153694 23-Dec-2005 jeff

- Improve the INKERNEL macro such that it can no longer give false positives.
This fixes the stack(9) functionality.

Submitted by: Antoine Brodin <antoine.brodin@laposte.net>


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that effect and remove a condition
to slightly optimize the non-lapic case.
- Change prototype of arm_handler_execute() so that its first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153570 20-Dec-2005 jhb

Move the hostb driver out of the i386 and amd64 PCI code (where it was
duplicated anyways) and into a single MI driver. Extend the driver a bit
to implement the bus and PCI kobj interfaces such that other drivers can
attach to it and transparently act as if their parent device is the PCI
bus (for the most part).


153504 18-Dec-2005 marcel

Make our ELF64 type definitions match standards. In particular this
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.

MFC after: 2 weeks
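
For reference, the widths listed in this entry can be written down with
fixed-width integer types. This is a sketch only: the ex_ prefix marks
these as illustrative typedefs, not the definitions in the system headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative typedefs matching the widths named in the entry. */
typedef uint16_t ex_Elf64_Half;		/* 16-bit */
typedef uint32_t ex_Elf64_Word;		/* 32-bit */
typedef uint64_t ex_Elf64_Xword;	/* 64-bit, unsigned */
typedef int64_t  ex_Elf64_Sxword;	/* 64-bit, signed */

/* Compile-time checks that the widths are what the entry states. */
static_assert(sizeof(ex_Elf64_Half) == 2, "Half must be 16 bits");
static_assert(sizeof(ex_Elf64_Word) == 4, "Word must be 32 bits");
static_assert(sizeof(ex_Elf64_Xword) == 8, "Xword must be 64 bits");
static_assert(sizeof(ex_Elf64_Sxword) == 8, "Sxword must be 64 bits");
```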


153469 16-Dec-2005 scottl

Don peril sensitive sunglasses and jack up the MAX_BPAGES limit to 8192
on amd64. If you're going to stuff >4GB into your box, reserving 32MB for
bounce pages amounts to a rounding error in the overall scheme of things.
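
The 32MB figure follows from the page count, assuming the 4096-byte base
page size used on amd64. A small worked check, with hypothetical EX_ names:

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE	4096u	/* assumed base page size in bytes */
#define EX_MAX_BPAGES	8192u	/* limit named in the entry */

static_assert(EX_PAGE_SIZE != 0, "page size must be nonzero");

/* Bytes reserved when every bounce page up to the limit is allocated. */
uint64_t
ex_bounce_reserve_bytes(void)
{
	return ((uint64_t)EX_MAX_BPAGES * EX_PAGE_SIZE);
}
```

8192 pages of 4096 bytes each come to 33554432 bytes, i.e. 32MB.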


153448 15-Dec-2005 jhb

Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to clean up the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu


153426 14-Dec-2005 jhb

Fix stale comment.


153383 13-Dec-2005 jhb

Revert previous commit. The BIOS braindamage is even worse than I
originally thought. The BIOS that cleared CPUID_APIC actually managed
to disable the local APIC entirely and even Windows 64 doesn't boot on
it.

Reported by: bz


153377 13-Dec-2005 jhb

Don't check the CPUID_APIC bit in the cpu_features flags field to determine
if the boot CPU has a local APIC because some BIOS vendors are not
competent enough to set this bit. Instead, just assume that we always have
a local APIC on amd64. For i386 the check is a bit more subtle. FreeBSD
requires either an MP Table or an ACPI MADT table to enumerate APICs. The
only systems that have one of those tables that don't have local APICs are
some presumably rare (and old) SMP 486 systems using external APICs. Thus,
instead of checking the CPUID_APIC flag, check the CPU class and abort if
we are running on a 486.

MFC after: 1 week
Reported by: bz


153364 12-Dec-2005 peter

For the amd64 platform, we can depend on the TSC being present. This patch
changes DELAY to use the TSC once it has been calibrated. This does NOT
use the TSC for long-term timekeeping. It only uses it to bound the
DELAY() spinloop. This should not be affected by the Athlon64 X2 TSC
quirks because the cpu is not halted while we use DELAY().
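
A minimal sketch of bounding a spin loop with a free-running counter, as
this entry describes. The helper names are hypothetical and the counter
values are passed in as plain integers, so nothing here reads the real TSC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of counter ticks spanning usec microseconds, given a calibrated
 * counter frequency in Hz. The product is assumed to fit in 64 bits.
 */
uint64_t
ex_delay_ticks(uint64_t freq_hz, uint64_t usec)
{
	assert(freq_hz != 0);
	return (usec * freq_hz / 1000000);
}

/*
 * Return 1 once the counter has advanced by at least ticks since start.
 * Unsigned subtraction keeps the comparison correct across wraparound.
 */
int
ex_delay_done(uint64_t start, uint64_t now, uint64_t ticks)
{
	return (now - start >= ticks);
}
```

A delay loop would read the counter once for start and then poll until
ex_delay_done() returns 1. The counter only bounds the loop; it is not
used for timekeeping.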


153268 09-Dec-2005 davidxu

Sync with i386, fix compiling for non-SMP.


153241 08-Dec-2005 jhb

MFi386:
- Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in
atpic entry points.
- Move PCPU_* asm macros out of the middle of the asm profiling macros.
- Pass IRQ vector argument as an int rather than void * to reduce diffs
with i386.
- EOI the lapic in C for the lapic timer handler.
- GC unused Xcpuast function.
- Split IPI_STOP handling code of ipi_nmi_handler() out into a
cpustop_handler() function and call it from Xcpustop rather than
duplicating all the logic in assembly.
- Fixup the list of symbols with interrupt frames in ddb traces.
Xatpic_fastintr* have never existed on amd64, and the lapic timer
handler and various IPI handlers were missing.
- Use trapframe instead of intrframe for interrupt entry points (on amd64
the interrupt vector was already a separate argument, so the two frames
were already identical) and GC intrframe.

Submitted by: peter (3)


153180 06-Dec-2005 peter

Catch up to the system siginfo changes. Use a union for the ia32 layout
of siginfo just like the system one. There are now two fields to copy
instead of one.


153179 06-Dec-2005 jhb

- Cleanup whitespace and extra ()s in vtophys() macros.
- Move vtophys() macros next to vtopte() where vtopte() exists to match
comments above vtopte().
- Remove references to the alternate address space in the comment above
vtopte(). amd64 never had the alternate address space, and i386 lost it
prior to PAE support being added.
- s/entires/entries/ in comments.

Reviewed by: alc


153177 06-Dec-2005 jkim

Fix ZERO_EDX() macro from the previous commit. It was emitting
`xor %ecx, %ecx', not `xor %edx, %edx'.


153168 06-Dec-2005 ru

Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE). Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.


153157 06-Dec-2005 jkim

s/M_WAITOK/M_NOWAIT/ while mutex is held.

Pointed out by: csjp


153156 06-Dec-2005 jkim

- Micro-optimize `mov $0, %edx' -> `xor %edx, %edx'.
- Correct amd64 macro style (no functional change).


153151 06-Dec-2005 jkim

Add experimental BPF Just-In-Time compiler for amd64 and i386.

Use the following kernel configuration option to enable:

options BPF_JITTER

If you want to use bpf_filter() instead (e.g., for debugging), do:

sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) there is no need, and 2) it avoids
expensive m_copydata(9) calls.

Obtained from: WinPcap 3.1 (for i386)


153136 05-Dec-2005 jhb

Really slam the door on mixed mode now that we don't depend on it for a
working IRQ0 with APIC anymore. Previously, it was possible to have
some other ATPIC IRQs "leak" through in a few edge cases. For example, on
my x86 test machine, ACPI re-routes the SCI (IRQ 9) to intpin 13 on the
first I/O APIC. This leaves a hole for IRQ 13 (since the APIC doesn't
provide a source for IRQ 13 in that case) with the result that the ATPIC
IRQ13 source was registered instead. This changes the 8259A drivers to
only register their interrupt sources if none of the 16 ISA IRQs have an
interrupt source already installed.

MFC after: 1 week


153033 03-Dec-2005 anholt

Merge DRM CVS as of 2005-12-02, adding i915 DRM support thanks to Alexey Popov,
and a new r300 PCI ID.


152909 28-Nov-2005 anholt

Update DRM to CVS snapshot as of 2005-11-28. Notable changes:
- S3 Savage driver ported.
- Added support for ATI_fragment_shader registers for r200.
- Improved r300 support, needed for latest r300 DRI driver.
- (possibly) r300 PCIE support, needs X.Org server from CVS.
- Added support for PCI Matrox cards.
- Software fallbacks fixed for Rage 128, which used to render badly or hang.
- Some issues reported by WITNESS are fixed.
- i915 module Makefile added, as the driver may now be working, but is untested.
- Added scripts for copying and preprocessing DRM CVS for inclusion in the
kernel. Thanks to Daniel Stone for getting me started on that.


152906 28-Nov-2005 jhb

If we get a stray interrupt, return after logging it. In the extremely
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.

Submitted by: Mark Tinguely tinguely at casselton dot net


152865 27-Nov-2005 ru

- Allow duplicate "machine" directives with the same arguments.
- Move existing "machine" directives to DEFAULTS.


152775 24-Nov-2005 le

Fix typo.


152753 24-Nov-2005 ru

Add missing "struct" in i386/i386/machdep.c,v 1.497 by deischen@.


152662 21-Nov-2005 jhb

Don't enable PUC_FASTINTR by default in the source. Instead, enable it
via the DEFAULTS kernel configs. This allows folks to turn that option
off in the kernel configs if desired without having to hack the source.
This is especially useful since PUC_FASTINTR hangs the kernel boot on my
ultra60 which has two uart(4) devices hung off of a puc(4) device.

I did not enable PUC_FASTINTR by default on powerpc since powerpc does not
currently allow sharing of INTR_FAST with non-INTR_FAST like the other
archs.


152651 21-Nov-2005 jhb

Expand the hack to mask the atpics if 'device atpic' is not in the kernel
during boot up. Now we do a full reset of the 8259As and setup a simple
interrupt handler (we actually borrow the apic one that just does an
immediate iret) to handle any spurious interrupts triggered by either chip.
This should fix some folks that were getting a Trap 30 during bootup of
certain SMP AMD systems. This might get pushed into the 6.0 branch as an
errata. For now a suitable workaround is to add 'device atpic' to your
kernel config.

Tested by: scottl
Helpful info from: dillon
MFC after: 1 week


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152588 18-Nov-2005 jhb

- Always print the trap number so that we have something to start with for
mystery traps. If we don't have a message for a given trap, just use
UNKNOWN for the message.
- Add trap messages for T_XMMFLT and T_RESERVED.

MFC after: 1 week


152537 17-Nov-2005 obrien

Fix spelling mistake.

Submitted by: kris


152531 16-Nov-2005 jhb

Revert a part of the previous commits to these files that made the NMI
IPI_STOP handling code use atomic_readandclear() to execute the restart
function on the first CPU to resume and restore the behavior of always
executing the restart function on the BSP since this is in fact what the
non-NMI IPI_STOP handler does. I did add back in a statement to clear
the restart function pointer after it is executed to match the behavior
of the non-NMI IPI_STOP handler.


152529 16-Nov-2005 jhb

Revert previous commit to these files. There isn't a race necessitating
an xchg instruction as we only try to execute the startup function if
the CPU ID is 0 (i.e. the BSP). I missed this earlier.


152528 16-Nov-2005 jhb

Fix a typo in the check for an invalid APIC. If we are told about an
I/O APIC that doesn't exist, then a read of the version register is going
to return -1 which is 0xffffffff not 0xffffff.

Tested on: i386
Tested by: Nikos Ntarmos ntarmos at ceid dot upatras dot gr
MFC after: 1 week
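
The typo fixed in this entry is easy to see in isolation. A sketch with a
hypothetical helper name, assuming a 32-bit register read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 32-bit read from a nonexistent device comes back as all ones. */
#define EX_ALL_ONES	0xffffffffu

static_assert(sizeof(uint32_t) == 4, "register reads are 32 bits wide");

/* Return true if the version register value looks like a real device. */
bool
ex_ioapic_version_valid(uint32_t value)
{
	return (value != EX_ALL_ONES);
}
```

Comparing against 0xffffff (six digits) never matches a 32-bit -1, so the
nonexistent APIC would have been accepted.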


152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


152306 11-Nov-2005 ru

Add /dev/speaker support to amd64.

The following repo-copies were made (by Mark Murray):

sys/i386/isa/spkr.c -> sys/dev/speaker/spkr.c
sys/i386/include/speaker.h -> sys/dev/speaker/speaker.h
share/man/man4/man4.i386/spkr.4 -> share/man/man4/spkr.4


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
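
The "at most once per minute" behaviour can be sketched with a small rate
limiter. This is an assumption-laden illustration, not the mechanism the
kernel uses: the structure and function names are hypothetical and time is
passed in as plain seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rate-limit state: whether we printed before, and when. */
struct ex_rate {
	int primed;
	int64_t last;
};

/* Return 1 if a message may be printed at time now (in seconds). */
int
ex_rate_ok(struct ex_rate *r, int64_t now, int64_t interval)
{
	assert(r != NULL);
	if (r->primed && now - r->last < interval)
		return (0);
	r->primed = 1;
	r->last = now;
	return (1);
}
```

Unlike a "print five times, then go quiet" counter, this keeps reporting
for as long as the condition persists.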


152102 05-Nov-2005 marcel

Add uart(4). When both sio(4) and uart(4) can handle a serial port,
sio(4) will claim it. This change therefore only affects how ports
are handled when they are not claimed by sio(4), and in principle
will improve hardware support.

MFC after: 2 months


152076 04-Nov-2005 peter

Define M_IOAPIC the same as i386


152062 04-Nov-2005 ru

Catch up with the recent <sys/signal.h> change and make this compile.


152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages were exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


151980 02-Nov-2005 ps

Calling setrlimit from 32-bit apps could potentially increase certain
limits beyond what should be possible in a 32-bit process, so we
must fix up the limits.

Reviewed by: jhb


151979 02-Nov-2005 jhb

Change the x86 code to allocate IDT vectors on-demand when an interrupt
source is first enabled similar to how intr_event's now allocate ithreads
on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 191 I/O APIC
intpins. On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors. Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.

Tested on: amd64, i386
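
A toy model of on-demand vector allocation, to contrast with a fixed 1:1
IRQ-to-vector mapping. The ex_ functions are hypothetical and are not the
apic_alloc_vector()/apic_free_vector() API named above. The table size is
taken from the entry's figure of 191 I/O vectors.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NUM_VECTORS	191	/* I/O vectors available, per the entry */
#define EX_NO_IRQ	(-1)

/* vector index -> IRQ using it, or EX_NO_IRQ when the vector is free */
int ex_vector_irq[EX_NUM_VECTORS];

void
ex_vectors_init(void)
{
	for (size_t i = 0; i < EX_NUM_VECTORS; i++)
		ex_vector_irq[i] = EX_NO_IRQ;
}

/* Return the vector for irq, allocating one on first use; -1 if full. */
int
ex_alloc_vector(int irq)
{
	int free_slot = -1;

	assert(irq >= 0);
	for (int i = 0; i < EX_NUM_VECTORS; i++) {
		if (ex_vector_irq[i] == irq)
			return (i);
		if (ex_vector_irq[i] == EX_NO_IRQ && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0)
		ex_vector_irq[free_slot] = irq;
	return (free_slot);
}

/* Release a vector so that another interrupt source can use it. */
void
ex_free_vector(int vector)
{
	if (vector >= 0 && vector < EX_NUM_VECTORS)
		ex_vector_irq[vector] = EX_NO_IRQ;
}
```

With this scheme an IRQ number above 190 is no obstacle: only sources
that are actually enabled consume a vector.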


151950 01-Nov-2005 jhb

Throw the switch and turn STOP_NMI on in GENERIC for amd64 and i386.

Requested by: kris
Ok'd by: scottl


151948 01-Nov-2005 jkim

Catch up with ACPI-CA 20051021 import


151910 31-Oct-2005 alc

Instead of panic()ing in pmap_insert_entry() if get_pv_entry()
fails, reclaim a pv entry by destroying a mapping to an inactive
page.

Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.


151907 31-Oct-2005 jhb

Hook nve(4) up in i386 and amd64 NOTES.

MFC after: 1 week


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151889 30-Oct-2005 alc

Replace diagnostic printf()s by assertions. Use consistent style for
similar assertions.


151759 27-Oct-2005 peter

MFi386: bring over DEFAULTS (repocopy) and adapt. While there isn't a
4.x->6.x amd64 upgrade path, the config files are kept in approximate sync.


151758 27-Oct-2005 obrien

Remove atpic as we've changed to using the lapic timer vs. using irq0


151748 27-Oct-2005 jhb

Create a default kernel config for i386 and move 'device isa' and
'device npx' (both of which aren't really optional right now) and
'device io' and 'device mem' (to preserve POLA for 4.x users upgrading
to 6.0) from GENERIC into DEFAULTS.

Requested by: scottl
Reviewed by: scottl


151724 26-Oct-2005 peter

MFi386: Various apic fixes and tweaks
* Don't recursively panic if we've already panicked and the local apic is
now stuck.
* Add hw.apic.* tunables/sysctls for extint controls
* Change "lapic%d timer" to "cpu%d timer" intname to match i386


151719 26-Oct-2005 peter

Change PHYSMAP_SIZE to allow for more memory segments. The old value was
too low for certain Dell amd64 machines.


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
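
The sharing rule described in this entry (run every INTR_FAST handler
first, then schedule the interrupt thread only if threaded handlers
exist) can be sketched as follows. All names are hypothetical and the
structures are far simpler than the kernel's intr_event and intr_handler.

```c
#include <assert.h>
#include <stddef.h>

#define EX_INTR_FAST	0x1	/* illustrative flag value */

/* Hypothetical, cut-down handler record. */
struct ex_handler {
	void (*func)(void *);
	void *arg;
	int flags;
};

/* Example handler: bump the int counter that arg points to. */
void
ex_bump(void *arg)
{
	(*(int *)arg)++;
}

/*
 * Run all fast handlers immediately. Return 1 if at least one threaded
 * handler is attached, meaning the interrupt thread should be scheduled.
 */
int
ex_dispatch(const struct ex_handler *h, size_t n)
{
	int need_thread = 0;

	assert(h != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (h[i].flags & EX_INTR_FAST)
			h[i].func(h[i].arg);
		else
			need_thread = 1;
	}
	return (need_thread);
}
```

An event with only fast handlers never asks for a thread, which matches
the entry's point that such events no longer have one.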


151643 25-Oct-2005 wpaul

Modify the pci_cfgdisable() routine to bring it more in line with
other OSes (Solaris, Linux, VxWorks). It's not necessary to write a 0
to the config address register when using config mechanism 1 to turn
off config access. In fact, it can be downright troublesome, since it
seems to confuse the PCI-PCI bridge in the AMD8111 chipset and cause
it to sporadically botch reads from some devices. This is the cause
of the missing USB ports problem I was experiencing with my Sun Opteron
system.

Also correct the case for mechanism 2: it's only necessary to write
a 0 to the ENABLE port.


151634 24-Oct-2005 jhb

Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.


151633 24-Oct-2005 jhb

When restarting the BSP during cpu_reset() use a membar to ensure that
the updated cpustop_restartfunc is seen when the BSP resumes execution.
This matches the membar already present in restart_cpus().


151632 24-Oct-2005 jhb

Use xchg in Xcpustop to close a race and make cpustop_restartfunc truly
one-shot in the SMP case (before using the simple mov / cmp / mov sequence
could allow multiple CPUs to execute the restart function on resume).
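
The one-shot idea in this entry, swapping the function pointer out
atomically so that only one caller can obtain it, can be sketched in
portable C11. This is not the kernel's assembly; the names are
hypothetical and atomic_exchange() from <stdatomic.h> stands in for the
xchg instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*ex_restart_fn)(void);

/* Pending restart function, or a null pointer when none is pending. */
_Atomic(ex_restart_fn) ex_restartfunc;

/* Number of times the example restart function has run. */
int ex_restart_calls;

/* Example restart function that only counts its invocations. */
void
ex_count_restart(void)
{
	ex_restart_calls++;
}

void
ex_set_restart(ex_restart_fn fn)
{
	atomic_store(&ex_restartfunc, fn);
}

/*
 * Atomically take the pending function, leaving a null pointer behind.
 * Only the caller that receives a non-null pointer runs it.
 * Returns 1 if this caller ran the function, 0 otherwise.
 */
int
ex_run_restart_once(void)
{
	ex_restart_fn fn = atomic_exchange(&ex_restartfunc, (ex_restart_fn)0);

	if (fn == NULL)
		return (0);
	fn();
	return (1);
}
```

A plain load, compare and store sequence leaves a window in which two
CPUs can both see the non-null pointer. The exchange closes that window.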


151631 24-Oct-2005 jhb

- Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
places.


151598 24-Oct-2005 ps

include opt_compat.h to unbreak the build


151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


151431 17-Oct-2005 jkim

Redo physical/logical CPU count.

Suggested by: jhb


151429 17-Oct-2005 davidxu

Micro-optimization for context switch. Eliminate the code for saving
gs.base and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase
when the user wants to set them, so in the context switch routine we only
need to write them into registers; we never have to read them back out of
the registers when a thread is switched away. Since rdmsr is a serializing
instruction, a micro-benchmark shows this is worth doing.

Reviewed by: peter, jhb


151424 17-Oct-2005 jhb

Another bit of sx(4) removal.


151418 17-Oct-2005 jkim

Split displaying number of physical and logical cores.


151375 16-Oct-2005 obrien

For AMD processors, nullify CPUID.HTT. FreeBSD has no need for the
information it conveys, and it is only confusing people.
This fixes incorrect output in the previous commit.


151353 15-Oct-2005 jkim

Correct a few MSR addresses.

PR: amd64/85852
Submitted by: Nate Eldredge <nge at cs dot hmc dot edu>


151348 14-Oct-2005 jkim

- Print number of physical/logical cores and more CPUID info.
- Add newer CPUID definitions for future use.

Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.

Approved by: anholt (mentor)


151343 14-Oct-2005 jhb

The signal code is now an int rather than a long, so update debug printfs.


151333 14-Oct-2005 ru

Sort ath_rate_* entries. Mark ath_rate_sample as the desired algorithm.

Discussed with: sam


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *. Most
changes in MD code are trivial. Before this change, trapsignal and
sendsig used discrete parameters; now they use member fields of the
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, the pending signal set is kept as
before; an extra siginfo list holds additional siginfo_t data for signals.
Kernel code that uses psignal() still behaves as before and won't fail
even under memory pressure. The only exception is when deleting a signal:
we should call sigqueue_delete to remove the signal from the sigqueue
rather than SIGDELSET. Currently no kernel code delivers a signal
with additional data, so the kernel should be as stable as before.
A ksiginfo can carry more information, for example, to allow a signal to
be delivered but throw away the siginfo data if there is not enough memory.
SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they cannot
be caught or masked.
The sigqueue() syscall allows user code to queue a signal to a target
process; if resources are unavailable, EAGAIN is returned as the
specification says.
Just before a thread exits, signal queue memory is freed by
sigqueue_flush.
Currently, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


151051 07-Oct-2005 glebius

Polling is now configured with help of ifconfig(8), not sysctl.

Prodded by: maxim


150952 04-Oct-2005 peter

Don't set segment registers via ptrace yet. It's not ready.


150789 01-Oct-2005 glebius

Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, instead of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. They are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enables/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If the flag is not set, the driver unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns. This is important to protect from spurious
interrupts.

Reviewed by: ru, sam, jhb


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer needs to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150648 27-Sep-2005 peter

I believe the stack underflows during early development that caused me to
add spare padding at the beginning of the pcb are long gone. Remove the
padding fields.


150647 27-Sep-2005 peter

Kill pcb_rflags. It served no purpose.

Reported by: bde


150640 27-Sep-2005 peter

Fix a minor nit that has been bugging me for a while. Fix the obvious
cases of using a 64 bit operation to zero a register. 32 bit opcodes are
smaller and supposedly faster, and clear the upper 32 bits for free.


150639 27-Sep-2005 peter

Add a bare minimum (but wrong) R_X86_64_JMP_SLOT relocation type for
kernel modules. We actually need to include any addends and the symbol
offset value, but gcc/binutils didn't set it anywhere I've found on
'cc -fpic -shared' kernel modules.


150638 27-Sep-2005 peter

Don't report Maxmem as 'real memory'. It is really the highest address
available and can give the wrong impression when there are memory holes.
Report the total amount of usable memory that we detected instead of the
highest address.
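
Illustration (not part of the commit): a minimal sketch of why the highest
address and the amount of usable memory differ when there are holes. The
table layout (start/end pairs terminated by a zero pair) is modeled loosely
on a phys_avail[]-style array, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a phys_avail[]-style table: pairs of
 * (start, end) physical addresses, terminated by a (0, 0) pair.
 */

/* Highest end address in the table; roughly what "Maxmem" conveys. */
static uint64_t
highest_address(const uint64_t *avail)
{
    uint64_t top = 0;

    for (size_t i = 0; avail[i + 1] != 0; i += 2)
        if (avail[i + 1] > top)
            top = avail[i + 1];
    return (top);
}

/* Sum of the sizes of all ranges; holes between ranges are not counted. */
static uint64_t
usable_bytes(const uint64_t *avail)
{
    uint64_t total = 0;

    for (size_t i = 0; avail[i + 1] != 0; i += 2)
        total += avail[i + 1] - avail[i];
    return (total);
}
```

With a hole below 4GB, usable_bytes() comes out smaller than
highest_address(), which is the wrong impression the commit avoids.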


150637 27-Sep-2005 peter

MFi386: If we take a trap with interrupts disabled while in a critical
section, don't enable them if we're servicing an NMI.


150635 27-Sep-2005 peter

Don't let the upper bits of %dr6/%dr7 get set.

Submitted by: Nate Eldredge <neldredge@math.ucsd.edu>


150631 27-Sep-2005 peter

Implement 32 bit getcontext/setcontext/swapcontext on amd64. I've added
stubs for ia64 to keep it compiling. These are used by 32 bit apps such
as gdb.


150627 27-Sep-2005 jhb

Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
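
Illustration (not part of the commit): the semantics described above, sketched
in userland with C11 <stdatomic.h> rather than the kernel's own primitive.
The name fetchadd_sketch is hypothetical; only the "add and return the
previous value" behaviour is taken from the commit message.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically add v to *p and return the value *p held before the add.
 * This uses the C11 library call, not the kernel implementation.
 */
static uint32_t
fetchadd_sketch(_Atomic uint32_t *p, uint32_t v)
{
    return (atomic_fetch_add(p, v));
}
```

A typical use is handing out unique tickets or indices: each caller gets the
old value, and no two callers can observe the same one.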


150546 25-Sep-2005 phk

__RMAN_RESOURCE_VISIBLE is not actually needed.


150473 22-Sep-2005 ups

Fix the "fpudna: fpcurthread == curthread XXX times" problem.

Tested by: kris@
Reviewed by: peter@
MFC after: 3 days


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having
non-MPSAFE tty code.

MFC after: 1 week


150270 18-Sep-2005 csjp

Introduce a kernel config for the Mandatory Access Control framework.
This kernel config briefly describes some of the major MAC policies
available on FreeBSD. The hope is that this will raise the awareness
about MAC and get more people interested.

Discussed with: scottl


150266 18-Sep-2005 imp

MFi386: pci attribute allocation fixes.


150182 15-Sep-2005 jhb

Stop using the '+' constraint modifier with inline assembly. The '+'
constraint is actually only allowed for register operands. Instead, use
separate input and output memory constraints.

Education from: alc
Reviewed by: alc
Tested on: i386, alpha
MFC after: 1 week


150003 11-Sep-2005 obrien

Canonize the include of acpi.h.


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149873 08-Sep-2005 scottl

Hook up the hptmv driver for amd64.

MFC After: 3 days


149786 04-Sep-2005 alc

Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.

Reviewed by: tegge


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine
whether the mapping should permit execute access.


149526 27-Aug-2005 jkoshy

- Special-case NMI handling on the AMD64.

On entry or exit from the kernel the 'alltraps' and 'doreti' code
used by normal traps disables interrupts to protect the
critical sections where it is setting up %gs.

This protection is insufficient in the presence of NMIs since NMIs
can be taken even when the processor has disabled normal interrupts.
Thus the NMI handler needs to actually read MSR_GBASE on entry to
the kernel to determine whether a swap of %gs using 'swapgs' is
needed. However, reads of MSRs are expensive and integrating this
check into the 'alltraps'/'doreti' path would penalize normal
interrupts.

- Teach DDB about the 'nmi_calltrap' symbol.

Reviewed by: bde, peter (older versions of this change)


149484 26-Aug-2005 alc

Remedy the following three problems:

1. The amd64 pmap, unlike the i386 pmap, maintains a reference count
for each page directory (PD) page. However, in the transformation
of the i386 pmap into the amd64 pmap, operations, such as
pmap_copy() and pmap_object_init_pt(), that create 2MB "superpage"
mappings by setting the PG_PS bit in a PD entry were not modified
to adjust the underlying PD page's reference count. Consequently,
superpage mappings could disappear prematurely.

2. pmap_object_init_pt() could crash or corrupt memory if either the
virtual address range being mapped crosses a 1GB boundary in the
virtual address space or nothing is mapped in the 1GB area.

3. When pmap_allocpte() destroys a 2MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should. (This
bug is inherited from i386.)

Discussed with: peter
Reviewed by: tegge


149475 25-Aug-2005 ups

NMI handler should not enable interrupts.

Tested by: kris@
MFC after: 3 weeks


149377 22-Aug-2005 alc

Pass the PDE from pmap_remove() to pmap_remove_page() so that the latter
procedure doesn't have to recompute it.


149364 22-Aug-2005 alc

Change pmap_extract() and pmap_extract_and_hold() to use PG_FRAME rather
than ~PDRMASK to extract the physical address of a superpage from a PDE.
The use of ~PDRMASK is problematic if the PDE has PG_NX set. Specifically,
the PG_NX bit will be included in the physical address if ~PDRMASK is used.

Reviewed by: peter
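
Illustration (not part of the commit): a small sketch of the masking problem
described above. The constant values are assumptions modeled on the amd64
page-table layout (NX in bit 63, 2MB superpages, a 52-bit physical frame
mask); the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, modeled on the amd64 page-table entry layout. */
#define PG_NX    (1ULL << 63)            /* no-execute bit */
#define PG_FRAME 0x000ffffffffff000ULL   /* physical frame bits 12..51 */
#define NBPDR    (1ULL << 21)            /* bytes mapped by one PDE (2MB) */
#define PDRMASK  (NBPDR - 1)

/* Old approach: ~PDRMASK keeps every bit from 21 up, including PG_NX. */
static uint64_t
superpage_pa_old(uint64_t pde, uint64_t va)
{
    return ((pde & ~PDRMASK) | (va & PDRMASK));
}

/* New approach: PG_FRAME keeps only frame bits, so PG_NX is dropped. */
static uint64_t
superpage_pa_new(uint64_t pde, uint64_t va)
{
    return ((pde & PG_FRAME) | (va & PDRMASK));
}
```

For a PDE with PG_NX set, the old helper returns an address with bit 63 set,
while the new one returns the plain physical address.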


149341 20-Aug-2005 alc

Introduce pmap_pml4e_to_pdpe() and pmap_pdpe_to_pde() and use them to avoid
recomputation of the pml4e and pdpe in pmap_copy(), pmap_protect(), and
pmap_remove().


149337 20-Aug-2005 stefanf

Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.

This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.

Discussed with: bde


149300 19-Aug-2005 pjd

Avoid code duplication and implement bitcount32() function in systm.h only.

Reviewed by: cperciva
MFC after: 3 days
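
Illustration (not part of the commit): one common way to count set bits in a
32-bit word, using parallel partial sums. This is a sketch of the technique,
not necessarily the exact code that went into systm.h; the name
bitcount32_sketch is hypothetical.

```c
#include <stdint.h>

/* Count the number of 1 bits in x by summing adjacent fields in parallel. */
static uint32_t
bitcount32_sketch(uint32_t x)
{
    x = (x & 0x55555555) + ((x & 0xaaaaaaaa) >> 1);  /* 2-bit sums */
    x = (x & 0x33333333) + ((x & 0xcccccccc) >> 2);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0f;                 /* 8-bit sums */
    x = (x + (x >> 8));                              /* 16-bit sums */
    x = (x + (x >> 16)) & 0x000000ff;                /* final total */
    return (x);
}
```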


149272 19-Aug-2005 alc

Correct a performance bug in revision 1.462. The effect of the bug is to
execute the outer loop in procedures such as pmap_protect() many more times
than necessary.

Reviewed by: tegge


149233 18-Aug-2005 jhb

Add aliases for atomic operations on 64-bit integers just like other
64-bit platforms.

MFC after: 1 week


149058 14-Aug-2005 alc

Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.

Eliminate a stale comment from pmap_enter().

Reviewed by: tegge


148976 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)


148971 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Reviewed by: tegge


148952 11-Aug-2005 alc

Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.

Reviewed by: tegge


148835 07-Aug-2005 alc

When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.

The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.


148667 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


148664 03-Aug-2005 jeff

- Improve the definition of INKERNEL() to include the DMAP area and the
proper start of the kernel area.

Discussed with: peter


148540 29-Jul-2005 jhb

Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.


148367 24-Jul-2005 mux

Add back ed(4) in amd64 GENERIC. It now works nicely and since those
chips are commonly found, it makes sense to have it in GENERIC. This
is a candidate for a RELENG_6 MFC.

Approved by: peter
Requested by: pav
Tested by: pav


148286 22-Jul-2005 ru

Fallout from the previous revision: lnc isn't quite ready for amd64 yet.


148275 22-Jul-2005 obrien

Fix $FreeBSD$.


148267 21-Jul-2005 peter

Like on i386, bypass lock prefix for atomic ops on !SMP kernels.


148264 21-Jul-2005 peter

MFi386: add vpd driver (vital product data.. model & serial numbers etc)


148263 21-Jul-2005 peter

Add the ed driver for lint building. The PCI instances are still useful.
In theory, there are no isa slots on any amd64/em64t systems, but it
doesn't hurt to keep these tiny fragments compiling.


148262 21-Jul-2005 peter

Actually create the double fault stack page for AP cpus so that we have a
chance of getting a working double fault instead of converting it to an
instant triple fault reset.


148231 21-Jul-2005 phk

Make the facility for recognizing BIOS-signatures more general
and return a printable representation.

This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).

Also, add watchdog support for PCM-582x platforms.

Submitted by: Adrian Steinmann <ast@marabu.ch>
Slightly changed by: phk
PR: 81360


148217 21-Jul-2005 jkim

Fix smbios(4) and add support for amd64

Approved by: anholt (mentor)


148211 20-Jul-2005 anholt

Add the latest r300 code from r300.sf.net. This is based on the patch supplied
by Vladimir Dergachev for inclusion in DRM CVS, with minor modifications for
FreeBSD CVS and the appropriate license from Nicolai Haehnle on r300_reg.h.
Fixes hangs when using r300.sf.net userland, tested on a Radeon 9600 on amd64.


148067 15-Jul-2005 jhb

Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm


147991 14-Jul-2005 kensmith

Add recently invented COMPAT_FREEBSD5 option.

MFC after: 3 days


147975 13-Jul-2005 jhb

Regen.


147974 13-Jul-2005 jhb

Make a pass through all the compat ABIs synchronizing the MP safe flags
with the master syscall table as well as marking several ABI wrapper
functions safe.

MFC after: 1 week


147969 13-Jul-2005 jhb

Fixup some more fallout from the lapic/i8254 changes:
- Make sure timer0_max_count is set to a correct value in the lapic case.
- Revert i8254_restore() to explicitly reprogram timer 0 rather than
calling set_timer_freq() to do it. set_timer_freq() only reprograms
the counter if the max count changes which it never does on resume. This
unbreaks suspend/resume for several people.

Tested by: marks, others
Reviewed by: bde
MFC after: 3 days


147889 10-Jul-2005 davidxu

Validate that the value written into {FS,GS}.base is a canonical
address. Writing a non-canonical address can cause a kernel panic,
so restrict base values to 0..VM_MAXUSER_ADDRESS, ensuring that
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
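
Illustration (not part of the commit): a sketch of why restricting the base to
the user range is sufficient. It assumes 48-bit virtual addresses and a user
limit of 1 << 47; both the constant SKETCH_VM_MAXUSER_ADDRESS and the helper
names are hypothetical, and the real kernel check may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed user address limit for a 48-bit virtual address space. */
#define SKETCH_VM_MAXUSER_ADDRESS (1ULL << 47)

/* Canonical means bits 63..47 are all zeros or all ones. */
static bool
is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return (top == 0 || top == 0x1ffff);
}

/* Accept only bases below the user limit; all of these are canonical. */
static bool
user_base_ok(uint64_t base)
{
    return (base < SKETCH_VM_MAXUSER_ADDRESS);
}
```

Every value accepted by user_base_ok() has bits 63..47 clear, so it is
canonical by construction; canonical kernel addresses are rejected as well.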


147855 09-Jul-2005 jhb

Some cleanups and tweaks to some of the atomic.h files in preparation for
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
instead, consolidate those two cases.
- Some whitespace fixes.

Approved by: re (scottl)


147783 05-Jul-2005 jhb

Remove a || 1 that crept into the i8254 commit and was subsequently
copied and pasted. I had actually tested without this change in my
trees as had the other testers.

Reported by: bde, Rostislav Krasny rosti dot bsd at gmail dot com
Approved by: re (scottl)
Pointy hat to: jhb


147744 02-Jul-2005 thompsa

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment. They are compile-time
dependent on __NO_STRICT_ALIGNMENT, which is set for i386 and amd64 where
alignment isn't needed, so the cost is avoided.

IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)
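
Illustration (not part of the commit): a sketch of how such a compile-time
dependent alignment check can look. The macro bodies and the 32-bit alignment
requirement are assumptions, not copied from the tree; ptr_is_32bit_aligned
and SKETCH_IP_HDR_ALIGNED_P are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p sits on a 4-byte boundary. */
static bool
ptr_is_32bit_aligned(const void *p)
{
    return (((uintptr_t)p & 3) == 0);
}

/*
 * On platforms that tolerate unaligned access the check collapses to a
 * constant, so the compiler can drop the test and the fix-up path.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define SKETCH_IP_HDR_ALIGNED_P(ip) 1
#else
#define SKETCH_IP_HDR_ALIGNED_P(ip) ptr_is_32bit_aligned(ip)
#endif
```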


147740 02-Jul-2005 marcel

Fix a buglet that was present in the ia64 code and that got inherited
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0, so it would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by calling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid core file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.

Approved by: re (scottl)
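
Illustration (not part of the commit): a userland sketch of the buffering
scheme described above. The structure, the 512-byte block size and the
function names are hypothetical; a counter stands in for the dump device.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BSIZE 512    /* stand-in for DEV_BSIZE */

struct dumpbuf {
    char   buf[SKETCH_BSIZE];
    size_t fragsz;          /* bytes currently collected in buf */
    size_t blocks_written;  /* stands in for writes to the dump device */
};

/* Collect data and emit one full block at a time. */
static void
buf_write(struct dumpbuf *d, const char *p, size_t len)
{
    while (len > 0) {
        size_t n = SKETCH_BSIZE - d->fragsz;

        if (n > len)
            n = len;
        memcpy(d->buf + d->fragsz, p, n);
        d->fragsz += n;
        p += n;
        len -= n;
        if (d->fragsz == SKETCH_BSIZE) {
            d->blocks_written++;
            d->fragsz = 0;  /* reset right after writing a block */
        }
    }
}

/* Write out the last, possibly partial block. */
static void
buf_flush(struct dumpbuf *d)
{
    if (d->fragsz == 0)
        return;
    d->blocks_written++;
    d->fragsz = 0;          /* the reset this kind of fix adds */
}
```

Without the reset in buf_flush(), a second dump would start with a stale
fragsz and place its data at the wrong offset in the buffer.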


147733 01-Jul-2005 peter

MFi386: r1.221: use simple timecounter that is aware of irq0 being off.

Approved by: re


147692 30-Jun-2005 peter

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb to pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


147687 30-Jun-2005 peter

Sync i386->amd64.
* Add ichwd (The Intel EM64T folks have an ICH)
* Cosmetic comment syncs
* Merge cpufreq change over to NOTES
* add pbio (it compiles, but isn't useful since no boxes have ISA slots)
* copy ath settings (note: wlan disabled here since it's in global NOTES)
* copy profiling, including fixing a previous i386->amd64 merge typo.

Approved by: re (blanket i386 <-> amd64 sync/convergence)


147677 30-Jun-2005 peter

Add a special-case handler for general protection faults. It appears to
be possible to get the swapgs state reversed if doreti traps during
the iretq. Attempt to handle this. load_gs() might need special
handling too. Running the kernel with the user's TLS and the
kernel's PCPU space interchanged would be bad(TM).

Discovered as a result of a conversation with: bde
Approved by: re


147674 29-Jun-2005 peter

Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h

Approved by: re


147671 29-Jun-2005 peter

Switch AMD64 and i386 platforms to using ELF as their kernel crash
dump format. The key reason to do this is so that we can dump sparse
address space. For example, we need to be able to skip the PCI hole
just below the 4GB boundary. Trying to destructively dump MMIO device
registers is Really Bad(TM). The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).

This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.

Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped. We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump. eg: the vm page array
and the dmesg buffer. dump_avail[] is pretty much an unmolested version
of phys_avail[]. It does have Maxmem correction.

Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this. amd64 was actually using the i386 dump version number.

libkvm support to follow.

Approved by: re


147653 29-Jun-2005 jhb

Increase MAXCPU to 16 in SMP kernels so that APIC IDs from 0 to 15 are
allowed for CPUs.

Tested by: amd64 at cybernetwork dot org
Approved by: re (scottl)
MFC after: 1 week


147604 25-Jun-2005 ups

Disable the interrupts in trap_fatal before calling kdb_trap.
(required now that critical sections no longer block interrupts)

Reviewed by: jhb@
Approved by: re (scottl)
Tested by: kris@,glebius@


147588 24-Jun-2005 jhb

Correct the amount of data to allocate in these local copies of
exec_copyin_strings() to catch up to rev 1.266 of kern_exec.c. This fixes
panics on amd64 with compat binaries since exec_free_args() was freeing
more memory than these functions were allocating and the mismatch could
cause memory to be freed out from under other concurrent execs.

Approved by: re (scottl)


147569 24-Jun-2005 peter

Various trivial comment fixes

Approved by: re


147568 24-Jun-2005 peter

Eliminate a source of 'trap xx with interrupts disabled'. I was jumping to
the wrong backend code and neglecting to re-enable interrupts after the
stack prep.

Approved by: re


147566 24-Jun-2005 peter

MFi386: 1.258: Minor cleanups

Approved by: re (blanket i386<->amd64 sync)


147565 24-Jun-2005 peter

Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by: re


147504 20-Jun-2005 obrien

Add .cvsignore files just like in sys/<arch>/compiled, this keeps CVS from
questioning kernel config files not in CVS.

Approved by: re(kensmith)


147378 14-Jun-2005 ups

Move IPI_PREEMPTION option from global NOTES file to i386+amd64 specific
NOTES files.

Approved by: re (scottl)


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


147191 09-Jun-2005 jkoshy

MFP4:

- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.

- bug fixes & documentation.


147181 09-Jun-2005 ups

Add IPI support for preempting a thread on another CPU.

MFC after: 3 weeks


147142 08-Jun-2005 sobomax

Regen after addition of linux_getpriority wrapper.

PR: kern/81951
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 1 week


147141 08-Jun-2005 sobomax

Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR: kern/81951
Submitted by: Andriy Gapon <avg@icyb.net.ua>
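
Illustration (not part of the commit): the Linux getpriority system call
reports priorities as 20 minus the nice value, so that the result is never
negative, while the native call returns the nice value itself. A minimal
sketch of that conversion, with a hypothetical function name:

```c
/*
 * Convert a native nice value (lower is more favourable) into the form the
 * Linux system call returns (higher is more favourable, always positive
 * for nice values up to 19).
 */
static int
linux_priority_from_nice(int nice)
{
    return (20 - nice);
}
```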


146807 30-May-2005 rwatson

Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146806 30-May-2005 rwatson

Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by: wsalamon
Obtained from: TrustedBSD Project


146799 30-May-2005 jkoshy

Kernel hooks to support PMC sampling modes.

Reviewed by: alc


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146767 29-May-2005 schweikh

Chop a '>' in a feature name (RSVD2>) that snuck in;
this now balances the <> flags displayed at boot, e.g. without this
Features2=0x41d<SSE3,RSVD2>,MON,DS_CPL,CNTX-ID>

MFC after: 1 week


146734 29-May-2005 nyan

Remove bus_{mem,p}io.h and related code for a micro-optimization on i386
and amd64. The optimization is trivial on recent machines.

Reviewed by: -arch (imp, marcel, dfr)


146721 28-May-2005 nyan

Change the spkr_set_pitch() function to a macro to fix low level profiling.


146582 24-May-2005 damien

Add new ral(4) and ural(4) drivers.

Approved by: silby (mentor)


146551 23-May-2005 obrien

Sync the style of these two files.


146507 22-May-2005 peter

Fix some of the problems Bruce observed with this code.


146492 22-May-2005 peter

MFi386: set PMC vector


146491 22-May-2005 peter

MFi386: remove comment


146461 21-May-2005 peter

For non-profiling kernels, there were two symbols assigned to the same
address. One was alltraps_with_regs_pushed, the other was calltrap.

When the stack tracer walks up, it looks for magic symbol names to
determine how to parse non-standard stack frames, such as a trapframe.
It was looking for "calltrap". Which of the two symbols you got depended
on things like Phase of moon, etc. If you were unlucky, you got a
garbage stack trace for things like 'debug.trace_on_panic', which would
completely hide the actual source of the problem.


146457 20-May-2005 obrien

Adjust the start_ap delay to match i386.


146456 20-May-2005 obrien

Fix mismerge in rev 1.226: wait 5 seconds as the comment documents,
not .5 seconds.


146214 14-May-2005 nyan

- Move bus dependent defines to {isa,cbus}_dmareg.h.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98


146211 14-May-2005 nyan

- Move timerreg.h to <arch>/include and split i8253 specific defines into
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98


146172 13-May-2005 nectar

Default hyperthreading on in -CURRENT. No seatbelts in CURRENT (^_^)

Requested by: peter, jhb


146170 13-May-2005 nectar

Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed".
Default off due to information disclosure on multi-user systems.

Submitted by: cperciva
Reviewed by: jhb


146135 12-May-2005 nyan

Remove unused IO_NPX* defines.


145911 05-May-2005 peter

Remove unused (besides being initialized) variable.


145889 04-May-2005 davidxu

Turn on PCB_FULLCTX in set_regs to fully restore context
set by debugger.


145727 30-Apr-2005 dwhite

Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by: ups


145531 25-Apr-2005 scottl

Remove the ACPI_MAX_THREADS option.


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to a more generic style, so we can reuse it
in other code. Add cpu_set_user_tls, and use it to tweak the user register
and set up user TLS. I once wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads, which may
not need this feature, I wrote a separate cpu_set_user_tls.


145345 20-Apr-2005 marcel

Revert previous commit: The hwpmc(4) driver compiles on all platforms.


145343 20-Apr-2005 ps

Don't enter the debugger if KDB_UNATTENDED is set or if
debug.debugger_on_panic=0.

MFC after: 2 weeks


145337 20-Apr-2005 marcel

o Reverse the inclusion chain from MD->MI to MI->MD by removing the
inclusion of <sys/pmc.h> and depending on being included from
that header file.
o Include any MD specific header files that otherwise need to be
included from MI files.

Ok'd: jkoshy@


145307 19-Apr-2005 imp

Move this to the specific architectures that are supported. #ifdef foo
in sys/pmc.h precludes it from working on !i386, !amd64. When that changes,
it can be moved back into conf/NOTES.


145256 19-Apr-2005 jkoshy

Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by: alc, jhb (kernel changes)


145253 18-Apr-2005 imp

Break out the definition of bus_space_{tag,handle}_t and a few other types
into _bus.h to help with name space pollution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of this
separation (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).

Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb


145132 16-Apr-2005 anholt

Update to DRM CVS as of 2005-04-12, bringing many changes:
- Split core DRM routines back into their own module, rather than using the
nasty templated system like before.
- Development-class R300 support in radeon driver (requires userland pieces, of
course).
- Mach64 driver (haven't tested in a while -- my mach64s no longer fit in the
testbox). Covers Rage Pros, Rage Mobility P/M, Rage XL, and some others.
- i915 driver files, which just need to get drm_drv.c fixed to allow attachment
to the drmsub device. Covers i830 through i915 integrated graphics.
- savage driver files, which should require minimal changes to work. Covers the
Savage3D, Savage IX/MX, Savage 4, ProSavage.
- Support for color and texture tiling and HyperZ features of Radeon.

Thanks to: scottl (much p4 handholding)
Jung-uk Kim (helpful prodding)
PR: [1] kern/76879, [2] kern/72548
Submitted by: [1] Alex, lesha at intercaf dot ru
[2] Shaun Jurrens, shaun at shamz dot net


145124 15-Apr-2005 peter

MFi386: sync rtc code - don't setup an interrupt handler for irq0 when
the lapic timer is active. Don't enable periodic interrupts unless we are
using them. Replace spl protection with a spinlock.


145123 15-Apr-2005 peter

MFi386: remove NO_MIXED_MODE


145122 15-Apr-2005 peter

MFi386: use the lapic timer for UP systems that are using the apic so that
IRQ0 and mixed mode isn't a problem anymore. This removes mixed mode
support because nothing is left that uses it.


145121 15-Apr-2005 peter

MFi386: use c99 types


145120 15-Apr-2005 peter

Show that I can actually count.


145119 15-Apr-2005 peter

MFi386: track bus.h changes (unsplit bus_${machine}.h)


145077 14-Apr-2005 peter

Implement 32-bit compatible fsbase/gsbase methods so that we can run
(newer) unmodified static i386 binaries again.


145055 14-Apr-2005 jhb

Always use the local APIC timer, even on UP machines.


144994 13-Apr-2005 anholt

Follow i386's suit and include AGP support in the generic kernel.


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144968 12-Apr-2005 jhb

The memory operands to fldcw and ldmxcsr are inputs, not outputs.


144884 10-Apr-2005 alc

Align the entry point to assembly language functions to a 16-byte boundary.
(The Opteron's instruction fetcher reads instructions from the L1 cache in
16-byte, aligned packets.)


144868 10-Apr-2005 alc

Eliminate a conditional branch and as a side-effect eliminate a branch to
a return instruction. (The latter is discouraged by the Opteron
optimization manual because it disables branch prediction for the return
instruction.)

Reviewed by: bde


144813 08-Apr-2005 obrien

'apic' isn't optional on amd64, so don't speak as if it is.


144696 06-Apr-2005 cperciva

Fully initialize the required TSS fields so that the io permission
bitmap is set correctly.

Patch from: peter
Security: FreeBSD-SA-05:03.amd64


144670 05-Apr-2005 jhb

Fix a change in a debug printf I missed in an earlier commit.


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any effect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
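
The "unlocked integer increments and decrements" described above can be
modeled in userland roughly as follows. This is only a sketch: the struct and
function names are hypothetical, and the real critical_enter()/critical_exit()
operate on the current thread and perform the deferred preemption themselves
rather than returning a flag.

```c
#include <assert.h>

/* Hypothetical per-thread state: nesting depth plus a "preemption owed" flag. */
struct thread_model {
    int critnest;
    int owepreempt;
};

/* Entering a critical section is just an increment; interrupts stay enabled. */
void
crit_enter_model(struct thread_model *td)
{
    td->critnest++;
}

/*
 * Leaving the outermost section reports whether a preemption was deferred
 * while the thread was inside it. Returns 1 if the caller should now
 * switch, 0 otherwise.
 */
int
crit_exit_model(struct thread_model *td)
{
    assert(td->critnest > 0);
    td->critnest--;
    if (td->critnest == 0 && td->owepreempt) {
        td->owepreempt = 0;
        return (1);
    }
    return (0);
}
```

In this model a preemption request that arrives inside a nested section only
sets owepreempt; it takes effect when the nesting count returns to zero.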


144544 02-Apr-2005 netchild

The file machine/ieeefp.h needs sys/cdefs.h on amd64 and i386 after my
compiler features tests. This is ok, since machine/ieeefp.h is an internal
interface. But floatingpoint.h is a public interface and some ports use it,
so include sys/cdefs.h in the amd64 and i386 version of floatingpoint.h.

Note: some architectures don't provide recursive inclusion protection in
floatingpoint.h, namely alpha and ia64. Except for this part and now the
include of sys/cdefs.h, all those files are equal (from a compiler POV),
so they could be moved to only one version in src/include/.

Approved by: joerg


144449 31-Mar-2005 jhb

- Use a custom version of copyinuio() to implement readv/writev using
kern_readv/writev.
- Use kern_sched_rr_get_interval() rather than the stackgap.


144441 31-Mar-2005 jhb

- Fix some sign extension problems with implicit 32 to 64 bit conversions.
- Fix the mmap2() wrapper to not truncate high addresses.

Submitted by: Christian Zander
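
The class of bug named in the first item can be illustrated with a small
userland sketch. The helper names are hypothetical and this is not the linux32
code touched by the commit; it only shows how a 32-bit value widened through a
signed type differs from one widened through an unsigned type.

```c
#include <stdint.h>

/*
 * Widening through a signed 32-bit type copies bit 31 into the upper
 * 32 bits. (The uint32_t -> int32_t conversion is implementation-defined
 * for large values; mainstream compilers wrap in two's complement.)
 */
uint64_t
widen_signed_model(uint32_t v)
{
    return ((uint64_t)(int64_t)(int32_t)v);
}

/* Widening through an unsigned type zero-extends, keeping the value. */
uint64_t
widen_unsigned_model(uint32_t v)
{
    return ((uint64_t)v);
}
```

A 32-bit user address such as 0x80000000 therefore becomes
0xffffffff80000000 when it passes through a signed intermediate, which is not
the address the 32-bit process meant.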


144427 31-Mar-2005 obrien

MFR5: rev 1.421.2.6: Enable support for 32-bit Linux binaries by default.
There are so many questions in freebsd-amd64@ about how to enable Linux
support that it seems a required piece of functionality. Thus we should
just have it on by default.


144423 31-Mar-2005 scottl

Glue the arcmsr driver into the tree.


144354 30-Mar-2005 peter

Checkpoint today's tidy-up of the WIP disassembler. It now agrees with
objdump --disassemble when disassembling itself in userland. I've added
the cmovCC instruction group and tweaked a bunch of size sensitive array
indexes to either fix my mistakes and/or force it to work by any means
necessary.

I'm committing this because it is usable enough to see what is going on
when single stepping via ddb.

It might still tell lies, but its lies will be far more subtle now. I'm
not sure that this is a good thing or not.


144353 30-Mar-2005 peter

Commit my checkpoint of db_disasm.c that I hacked to understand some amd64
instructions as it was when I dropped it back on May 31, 2003. I'm
committing this as an intermediate stage because back then I thought I
understood what I was doing with this file.


144011 23-Mar-2005 das

Make ps_nargvstr and ps_nenvstr unsigned. This fixes an input
validation error in procfs/linprocfs that can be exploited by local
users to cause a kernel panic. All versions of FreeBSD with the patch
referenced in SA-04:17.procfs have this bug, but versions without that
patch have a more serious bug instead. This problem only affects
systems on which procfs or linprocfs is mounted.

Found by: Coverity Prevent analysis tool
Security: Local DOS
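
One common way a signed count defeats input validation is sketched below. The
limit and function names are hypothetical, and the actual procfs/linprocfs
check may look different; the sketch only shows why making the counters
unsigned closes this kind of hole.

```c
#include <stdbool.h>

#define MAX_STRINGS_MODEL 1024  /* hypothetical upper bound */

/* With a signed count, a negative value slips past an upper-bound check. */
bool
count_ok_signed_model(int n)
{
    return (n <= MAX_STRINGS_MODEL);
}

/* With an unsigned count, the same bit pattern is a huge value and fails. */
bool
count_ok_unsigned_model(unsigned int n)
{
    return (n <= MAX_STRINGS_MODEL);
}
```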


143985 22-Mar-2005 sobomax

Add USB Communication Device Class Ethernet driver. Originally written for
FreeBSD and based on aue(4), it was picked up by OpenBSD, then ported from
OpenBSD to NetBSD; the NetBSD version, merged with the original, now goes
into FreeBSD.

Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD


143809 18-Mar-2005 murray

Add a comment to note that pseudo-device bpf is required for DHCP.
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.

PR: conf/40855
MFC after: 1 week


143801 18-Mar-2005 phk

s/SLIST/STAILQ/
/imp/a\
pointy hat
.


143721 16-Mar-2005 imp

Remove comments relevant only to pc98 as there are no amd64 pc98 machines.


143711 16-Mar-2005 obrien

Make it clear nve needs mii, and shorten long comment line.


143674 16-Mar-2005 iedowse

Enable ehci by default on i386 and amd64. It had got to the stage
where having this disabled was actually hurting us, since so many
BIOSes include legacy USB emulation that takes control of all usb
ports and only the ehci driver knows how to disable it.


143658 15-Mar-2005 das

Remove fpsetsticky(). This was added for SysV compatibility, but due
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors. In particular, given argument M:

- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
and *sets* the new flag word to M. (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.

There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break. I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.

By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it. This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated. ;-)

Note that fpresetsticky() has been retained on FreeBSD/i386. As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.

PR: 75862
Suggested by: bde
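
The two incompatible behaviours listed above can be written down as pure
functions on a flag word. This is a sketch with hypothetical names and a
hypothetical flag type; it models only the effect on the sticky flags, not the
FPU access or the return value of the real routine.

```c
typedef unsigned int fp_flags_model_t;  /* hypothetical flag word */

/* Solaris and FreeBSD/{alpha,sparc64}: the new flag word is exactly M. */
fp_flags_model_t
sticky_set_model(fp_flags_model_t old, fp_flags_model_t m)
{
    (void)old;
    return (m);
}

/* FreeBSD/{amd64,i386}: flags named in M are cleared, the rest are kept. */
fp_flags_model_t
sticky_clear_model(fp_flags_model_t old, fp_flags_model_t m)
{
    return (old & ~m);
}
```

With old flags 0x5 and M = 0x1, the first model yields 0x1 and the second
yields 0x4, so no single implementation can satisfy callers written for both.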


143598 14-Mar-2005 scottl

Refactor the bus_dma header files so that the interface is described in
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for every arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implementation.


143450 12-Mar-2005 scottl

MFCi386: Prevent integer underflow that could result in all memory being
consumed.


143442 12-Mar-2005 obrien

FreeBSD consumer bits of the nForce MCP NIC binary blob.

Demanded by: DES
Encouraged by: scottl
Obtained from: q@onthenet.com.au (partially)
KNF'ed by: obrien


143434 11-Mar-2005 peter

Remove diffs to i386 version that came in via the compiler support ifdefs.
This changes things like whitespace, inconsistent use of #ifndef vs
#if !defined(), different macro argument orders, mismatched comments, etc.


143433 11-Mar-2005 peter

MFi386: reduce apic clock interrupt rate


143430 11-Mar-2005 peter

Fix a mismerge of i386 rev 1.209


143429 11-Mar-2005 peter

Match i386 rev 1.38 with __cplusplus support


143294 08-Mar-2005 mux

Fixup KTR traces.


143284 08-Mar-2005 mux

Use __func__ in the KTR_BUSDMA traces. This avoids copy and paste
errors like in the bus_dmamap_load_mbuf_sg() case where we were wrongly
displaying the function name as bus_dmamap_load_mbuf.
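
A minimal userland sketch of the idea, assuming a hypothetical TRACE_MODEL
macro that formats into a buffer (the kernel's KTR macros have a different
interface): because the prefix comes from __func__, a trace line copied into
another function cannot carry a stale name.

```c
#include <stdio.h>

/* Hypothetical trace macro: prefixes the message with the enclosing function. */
#define TRACE_MODEL(buf, len, fmt, ...) \
    snprintf((buf), (len), "%s: " fmt, __func__, __VA_ARGS__)

/* Stand-in for a function whose trace was previously mislabeled. */
int
load_mbuf_sg_model(char *buf, size_t len, int nsegs)
{
    return (TRACE_MODEL(buf, len, "nsegs %d", nsegs));
}
```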


143202 07-Mar-2005 scottl

Remove dead code.


143198 07-Mar-2005 sobomax

Regen after addition of linux_nosys handler.


143197 07-Mar-2005 sobomax

Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR: kern/74302
Submitted by: Travis Poppe <tlp@LiquidX.org>


143162 05-Mar-2005 des

MFi386: use TUNABLE_ULONG_FETCH to retrieve hw.physmem.


143159 05-Mar-2005 des

Replace goto with continue.


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compilers' features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they until now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

So far, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
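
The pattern can be sketched as follows. All macro names here are hypothetical
stand-ins (the commit message itself only uses the placeholder
__COMPILER_FEATURE_FOO); the point is that the compiler test appears once and
everything else keys off the resulting feature macro.

```c
/* Central place (the role sys/cdefs.h plays): one compiler test per feature. */
#if defined(__GNUC__) || defined(__clang__)
#define MODEL_HAVE_ATTR_UNUSED 1
#endif

/* Consumers test the feature macro, never the compiler version. */
#ifdef MODEL_HAVE_ATTR_UNUSED
#define MODEL_UNUSED __attribute__((__unused__))
#else
#define MODEL_UNUSED
#endif

/* Example consumer: the second parameter is deliberately ignored. */
int
model_first(int a, int b MODEL_UNUSED)
{
    return (a);
}
```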


142866 01-Mar-2005 obrien

Catch up with the "physical memory" sysctl change.
(MFi386: rev 1.608)


142841 28-Feb-2005 peter

MFi386: Sync whitespace and an abbreviation


142840 28-Feb-2005 peter

MFi386: Update alc's copyright notice


142839 28-Feb-2005 peter

MFi386: Bring over John's local apic timer code


142765 28-Feb-2005 pjd

Typo.


142733 28-Feb-2005 obrien

Spell "options" correctly as "options ".


142732 28-Feb-2005 obrien

Connect "options MP_WATCHDOG" to the LINT builds.


142722 27-Feb-2005 obrien

MFi386: rev 1.3:
- Add debug.watchdog tunable, so we can specify watchdog CPU from loader
which will help to debug hangs on boot.
- Remove 'U' from debug.watchdog sysctl definition, so if we set it to '-1'
it really shows '-1'.
- Fix comment.
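
The effect of the second item can be reproduced in userland: formatting the
same int as unsigned hides the -1 sentinel. The function name is hypothetical
and the sketch assumes a 32-bit unsigned int; it is not the sysctl code itself.

```c
#include <stdio.h>

/*
 * Format a watchdog CPU value either as unsigned (what a 'U' format
 * implies) or as signed. Returns the number of characters written.
 */
int
fmt_watchdog_model(char *buf, size_t len, int value, int as_unsigned)
{
    if (as_unsigned)
        return (snprintf(buf, len, "%u", (unsigned int)value));
    return (snprintf(buf, len, "%d", value));
}
```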


142517 25-Feb-2005 trhodes

Remove recently added note about DEVICE_POLLING not working with SMP.
Remove warning from kern_poll.c to allow DEVICE_POLLING to be built with SMP.

Discussed with: ru, glebius


142436 25-Feb-2005 delphij

Remove acpi_perf from {ARCH}/conf/NOTES, to make tinderbox happy.

Reported by: tinderbox
Inspired by: acpi_perf build structure removal commit


142280 23-Feb-2005 trhodes

According to kern_poll.c, you cannot use DEVICE_POLLING with SMP. Add a
comment about this in every NOTES file which lists DEVICE_POLLING.

PR: 46793
MFC: 1 day


142261 22-Feb-2005 jhb

MFi386: r1.17: Treat pin 0 as IRQ 0 rather than ExtINT if mixed mode is not
enabled by the enumerator.


142257 22-Feb-2005 jhb

- Add a new quirk to indicate that pin 0 of the first I/O APIC is really
IRQ 0 and not an ExtINT pin. The MADT enumerators ignore the PC-AT flag
and ignore overrides that map IRQ 0 to pin 2 when this quirk is present.
- Add a block comment above the quirks to document each quirk so that we
can use more verbose quirk descriptions.

MFC after: 2 weeks


142107 19-Feb-2005 ru

Use a common multi-inclusion protection, and add such a
protection to alpha/include/exec.h.


142057 18-Feb-2005 jhb

- Add a custom version of exec_copyin_args() to deal with the 32-bit
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement linux_execve() for the
amd64/linux32 ABI without using the stackgap.
- Implement linux_nanosleep() using the recently added kern_nanosleep().
- Use linux_emul_convpath() instead of linux_emul_find() in
exec_linux_imgact_try().

Tested by: cokane
Silence on: amd64


141944 15-Feb-2005 njl

MFi386 rev 1.61: Fix a few bugs in the legacy cpu attachment ivars.


141491 08-Feb-2005 peter

MFi386: read from RTC_INTR after writing to RTC_STATUSB


141391 06-Feb-2005 phk

Since we are quite unlikely to ever face another platform which
uses the i8237 without trying to emulate the PC architecture, move
the register definitions for the i8237 chip into the central include
file for the chip, except for the PC98 case which is magic.

Add new isa_dmatc() function which tells us as cheaply as possible
if the terminal count has been reached for a given channel.


141380 06-Feb-2005 njl

Staticize the legacy cpu devclasses and revert the name for the acpi_cpu
devclass. As pointed out by dfr@, devclasses don't have to share the same
linkage if multiple drivers have the same name. Newbus should match the
devclasses based on name and allocate non-conflicting unit numbers.


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141374 05-Feb-2005 njl

Make cpu_est_clockrate() more accurate by disabling interrupts for the
millisecond it is calibrating. Suggested by jhb@ and bde@. Don't clobber
the tsc_freq with the new value since it isn't accurate enough for
timecounters and the timecounter system as a whole needs support for
changing rates before we do this. Subtract 0.5% from our measurement
to account for overhead in DELAY. Note that this interface is for
estimating the clockrate and needs to work well at runtime so doing a full
calibration including disabling interrupts for a second is not feasible.
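
The arithmetic described above can be sketched as below, assuming two TSC
readings taken one millisecond apart. The function name and parameters are
hypothetical, and the real routine also handles interrupt disabling and the
DELAY call, which are omitted here.

```c
#include <stdint.h>

/*
 * Estimate a clock rate in Hz from two counter samples taken 1 ms apart,
 * then subtract 0.5% to account for overhead in the delay itself.
 */
uint64_t
est_clockrate_model(uint64_t tsc_start, uint64_t tsc_end)
{
    uint64_t hz;

    hz = (tsc_end - tsc_start) * 1000;  /* ticks per ms -> ticks per s */
    return (hz - hz / 200);             /* 1/200 = 0.5% */
}
```

For example, 2,000,000 ticks in the millisecond gives a raw 2.0 GHz and an
adjusted estimate of 1.99 GHz.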


141369 05-Feb-2005 njl

Build cpufreq and acpi_perf on platforms that are likely to be able to
use them.


141367 05-Feb-2005 alc

Implement proper handling of PG_G mappings in pmap_protect(). (I don't
believe that this omission mattered before the introduction of MemGuard.)

Reviewed by: tegge@
MFC after: 1 week


141244 04-Feb-2005 njl

MFi386: Merge updates to the cpu pseudo-driver. Compile, not runtime
tested.


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


140992 29-Jan-2005 sobomax

o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.

Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks


140555 21-Jan-2005 peter

JumboMFi386: use bitmapped IPI handler. Update elcr and default mptable
config handler. Tidy up various local apic initialization.


140554 21-Jan-2005 peter

MFi386: handle PSL_T properly across fork. Typo fix.


140553 21-Jan-2005 peter

MFi386: whitespace, copyright header, etc updates


140552 21-Jan-2005 peter

MFi386: use %rip - 1 for the symbol search address (for noreturn funcs)


140257 14-Jan-2005 jhb

Remove redundant code to drop per-thread debug register state from
cpu_exit() as this is already performed in cpu_thread_exit() and the
debug state is per-thread rather than per-process.


140032 11-Jan-2005 imp

There are no PC98 amd64 machines, so gc a few stray ifdefs.


139840 07-Jan-2005 scottl

Introduce bus_dmamap_load_mbuf_sg(). Instead of taking a callback arg, this
cuts to the chase and fills in a provided s/g list. This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred. This is just for amd64 and
i386 at the moment, other arches will be coming shortly.
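
The difference from the callback style can be sketched with a toy buffer
chain. Everything here is hypothetical (types, names, the merge rule), and the
real function works on mbufs, physical addresses and tag constraints; the
sketch only shows the shape of "fill the caller's s/g array and return".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for an mbuf chain and a DMA segment. */
struct buf_model { uintptr_t addr; size_t len; struct buf_model *next; };
struct seg_model { uintptr_t addr; size_t len; };

/*
 * Walk the chain and write segments straight into segs[], merging
 * buffers that happen to be contiguous. Returns 0 and sets *nsegs on
 * success, or -1 if the chain needs more than maxsegs segments.
 */
int
load_chain_sg_model(const struct buf_model *m, struct seg_model *segs,
    int maxsegs, int *nsegs)
{
    int n = 0;

    for (; m != NULL; m = m->next) {
        if (m->len == 0)
            continue;
        if (n > 0 && segs[n - 1].addr + segs[n - 1].len == m->addr) {
            segs[n - 1].len += m->len;  /* extend previous segment */
            continue;
        }
        if (n == maxsegs)
            return (-1);
        segs[n].addr = m->addr;
        segs[n].len = m->len;
        n++;
    }
    *nsegs = n;
    return (0);
}
```

Since the result is available as soon as the call returns, the caller can
program its descriptors inline instead of from a callback.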


139817 07-Jan-2005 imp

These are no longer relevant. They are scripts for extracting hints
from 4.x kernel config files. Users wishing to upgrade from 4.x to 6
will need to go through 5.x, or grab this script from there. These
scripts will remain in RELENG_5...


139731 05-Jan-2005 imp

Begin all license/copyright comments with /*-


139730 05-Jan-2005 imp

PC98 will never be defined for amd64


139699 05-Jan-2005 kuriyama

o Use tab instead of spaces for puc(4) line.
o Use capitalized "Ethernet" for consistency.


139447 30-Dec-2004 jhb

Minor sync to i386 GENERIC in the form of comments and whitespace.


139345 27-Dec-2004 njl

MFi386: Restore cpu_reset proxy code to enable reset from ddb on an AP.


139344 27-Dec-2004 njl

Reduce diffs to i386.


139279 24-Dec-2004 imp

Get rid of #ifdef for legacy system. Move that into the MD code.
Export minimal symbols to allow this to happen.


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


139143 21-Dec-2004 alc

Use vtopde() instead of pmap_pde() in pmap_kextract(); vtopde() is smaller
and faster in cases, such as pmap_kextract(), where the pde is known to
exist.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138500 06-Dec-2004 peter

MFi386: rev 1.12: re-allow fast interrupts to cause preemption


138375 04-Dec-2004 alc

Replace (inlined) pmap_pte() calls with smaller, faster code where
possible, such as the inner loop of pmap_copy().

Remove two comments that apply to i386 but not amd64.


138304 02-Dec-2004 alc

For efficiency eliminate the call to pmap_pte() from pmap_protect()'s and
pmap_remove()'s inner loop. Instead, call pmap_pde_to_pte(), a new
function, prior to the inner loop.

Reviewed by: peter@, tegge@


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been removed.

Tested on: alpha, amd64, i386, ia64, sparc64
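
Point 2 can be illustrated with a small decoder that turns the hex text of a
packet into the in-memory byte array, which is then handed over by pointer.
The function names are hypothetical and this is not the gdb stub's own
parser.

```c
#include <stddef.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int
hexval_model(char c)
{
    if (c >= '0' && c <= '9')
        return (c - '0');
    if (c >= 'a' && c <= 'f')
        return (c - 'a' + 10);
    if (c >= 'A' && c <= 'F')
        return (c - 'A' + 10);
    return (-1);
}

/*
 * Decode outlen bytes from a string of hex digit pairs, preserving the
 * byte order used on the wire. Returns outlen, or -1 on a bad or short
 * input.
 */
int
decode_hex_bytes_model(const char *hex, unsigned char *out, size_t outlen)
{
    size_t i;
    int hi, lo;

    for (i = 0; i < outlen; i++) {
        hi = hexval_model(hex[2 * i]);
        if (hi < 0)
            return (-1);
        lo = hexval_model(hex[2 * i + 1]);
        if (lo < 0)
            return (-1);
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return ((int)outlen);
}
```

On a little-endian target the 32-bit value 0x12345678 travels as "78563412".
Parsing that text as one hexadecimal number would give 0x78563412, whereas
decoding it byte by byte reproduces the correct memory image without any
byte swap.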


138237 30-Nov-2004 peter

Remove unused cnt variable for the SMP case. Trim some excessive blank
lines while here.


138212 30-Nov-2004 peter

Update the gdb register extraction support to use the pcb wherever
possible, like on i386. Registers are handled differently for caller
vs callee saved registers.


138208 29-Nov-2004 peter

MFi386: join the %cr0 setup line now that i386 has lost the I386 ifdefs.


138207 29-Nov-2004 peter

Take advantage of the shutdown processing being wired to the BSP and
eliminate the evil cpu_reset_proxy code now that it will never be
activated. i386 should pick this up as well.


138194 29-Nov-2004 scottl

Don't flag alignment constraints as a reason for bouncing. This fixes the
trigger for other misbehaviour in the sym driver that was causing freezes at
boot. Thanks to phk@ for reporting and testing this.


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137966 21-Nov-2004 scottl

Remove an extra #include


137964 21-Nov-2004 scottl

Consolidate all of the bounce tests into the BUS_DMA_COULD_BOUNCE flag.
Allocate the bounce zone at either tag creation or map creation to help
avoid null-pointer derefs later on. Track total pages per zone so that
each zone can get a minimum allocation at tag creation time instead of
being defeated by mis-behaving tags that suck up the max amount.


137917 20-Nov-2004 das

Remove references to U area and garbage collect includes.

Reviewed by: arch@


137914 20-Nov-2004 das

Remove UAREA_PAGES.

Reviewed by: arch@


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137893 19-Nov-2004 scottl

Revert part of rev 1.56. Tag boundaries are handled by splitting segments,
not through bouncing.


137499 10-Nov-2004 scottl

MFi386 rev 1.63-1.64:
Use tag-specific pools of bounce pages instead of a single global pool.


137262 05-Nov-2004 peter

MFi386 1.238 (jhb): Allow hints to disable cpus


137261 05-Nov-2004 peter

MFi386:
rev 1.61 (scottl): Add KTR tracing
rev 1.62 (scottl): Optimize (td->pmap, inlines, etc)


137165 03-Nov-2004 scottl

Don't use atomic ops to increment interrupt stats. This was only done on
amd64 and i386 anyways. The stats are only kept for informational purposes.


137137 02-Nov-2004 andre

Reduce annoying SCSI probing delay from 15 to 5 seconds in all GENERIC kernels.

Discussed on: -current


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


137099 31-Oct-2004 des

Add TUNABLE_LONG and TUNABLE_ULONG, and use the latter for the
hw.pci.host_mem_start tunable. Add comments to TUNABLE_INT and
TUNABLE_QUAD recommending against their use.

MFC after: 3 weeks


137098 31-Oct-2004 des

Whitespace cleanup


137012 28-Oct-2004 simokawa

MFi386: preserve dcons buffer passed by loader.


136995 27-Oct-2004 peter

Raise MAXDSIZ from 8G to 32G. The old limit was just an arbitrary choice
that was greater than 4G. I originally used the same values as i386 in
order to save opening a new PML4 page slot, but in the day of gigabytes
of memory, worrying about a 4K page seems futile. Moving from 8 to 32G
moves the page to a different index, it doesn't increase the number of
pages used.


136521 14-Oct-2004 njl

Print flags in the nexus for child devices.


136401 11-Oct-2004 peter

MFi386: sync with latest updates


136366 11-Oct-2004 njl

Move the code for halting the CPU (acpi_cpu_c1) into machdep files.
This removes the last MD portion of acpi_cpu.c.

MFC after: 2 weeks


136252 08-Oct-2004 alc

Make pte_load_store() an atomic operation in all cases, not just i386 PAE.

Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852
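
Why atomicity matters here can be sketched with C11 atomics, assuming a
64-bit PTE and the x86 dirty bit at 0x40. The names are hypothetical and the
kernel uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PG_M_MODEL 0x040ULL  /* x86 "modified" (dirty) bit */

/*
 * Replace a PTE and return its previous contents in one atomic step.
 * Any modified bit set by another processor before the swap is part of
 * the returned value, so it cannot be lost between a separate load and
 * store.
 */
uint64_t
pte_load_store_model(_Atomic uint64_t *ptep, uint64_t newpte)
{
    return (atomic_exchange(ptep, newpte));
}
```

A caller would test the returned value for PG_M_MODEL and mark the page dirty
before discarding the old mapping.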


136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month
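
Storing raw values and deriving timevals on demand can be sketched as below.
The structure and function are hypothetical simplifications of rusage_ext and
calcru1(), and the proportional split by tick counts is an assumption about
how the derived times are computed; microseconds stand in for the bintime.

```c
#include <stdint.h>

/* Hypothetical raw statistics: tick counts plus total runtime. */
struct rux_model {
    uint64_t uticks;      /* user-mode ticks */
    uint64_t sticks;      /* system-mode ticks */
    uint64_t iticks;      /* interrupt ticks */
    uint64_t runtime_us;  /* total runtime, microseconds */
};

/*
 * Derive user and system time by splitting the total runtime in
 * proportion to the tick counts. Nothing derived is stored back.
 */
void
calcru_model(const struct rux_model *rux, uint64_t *user_us, uint64_t *sys_us)
{
    uint64_t tt;

    tt = rux->uticks + rux->sticks + rux->iticks;
    if (tt == 0) {
        *user_us = 0;
        *sys_us = 0;
        return;
    }
    *user_us = rux->runtime_us * rux->uticks / tt;
    *sys_us = rux->runtime_us * rux->sticks / tt;
}
```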


136101 03-Oct-2004 alc

Undo revision 1.251. This change was a performance pessimizing work-around
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).

Note: Even with this change, the handling of a copy-on-write fault is
inefficient; in such cases pmap_enter() calls pmap_invalidate_page() twice.

Discussed with: bde@
PR: kern/16568


136070 03-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


136049 02-Oct-2004 alc

Remove an unused declaration. (I should have included this change in
revision 1.486.)


135939 29-Sep-2004 alc

Prevent the unexpected deallocation of a page table page while performing
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)

Reviewed by: tegge@


135914 29-Sep-2004 peter

MFi386: rev 1.239 - invalidate tlb after pte update


135913 29-Sep-2004 peter

MFi386: rev 1.236 - improve panic message for a busted mptable


135691 24-Sep-2004 peter

Like on i386, use the definition of struct bios_smap from machine/pc/bios.h
again.


135690 24-Sep-2004 peter

Converge towards i386. I originally resisted creating <machine/pc/bios.h>
because it was mostly irrelevant - except for the silly BIOS_PADDRTOVADDR
etc macros. Along the way of working around this, I missed a few things.

* Make syscons properly inherit the bios capslock/shiftlock/etc state like
i386 does. Note that we cannot inherit the bios key repeat rate because
that requires a bios call (which is impossible for us).
* Give syscons the ability to beep on amd64. Oops.

While here, make bios.c compile and add it to files.amd64.


135689 24-Sep-2004 peter

Severely strip down the repocopied i386/bios.c and bios.h files. It turns
out that bios_sigsearch() etc is useful for finding tables in roms.


135565 22-Sep-2004 alc

Correct a long-standing error in _pmap_unwire_pte_hold() affecting
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.

See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.

Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.

MT5 Candidate


135561 22-Sep-2004 peter

MFi386: adapt rev 1.19 (debugger fixes)


135559 22-Sep-2004 peter

Minor sync-up with i386. Catch up on de-quoting and de-counting after
config changes.


135558 22-Sep-2004 peter

MFi386: add ispfw (except using correct device<tab><tab>ispfw format,
<space><tab> is for the options line)


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64
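
The paging behaviour described in the first item above can be modelled
roughly as follows. This is an illustrative Python sketch, not the ddb code:
the function name paged_output, the keys iterator, and the treatment of
unknown keys are assumptions. Only the 18-line page, the '--More--' prompt,
and the Enter / space / 'q' / 'x' keys come from the log message.

```python
PAGE_LINES = 18  # page length quoted in the commit message


def paged_output(lines, keys, page_lines=PAGE_LINES):
    """Return what would be displayed, given the keys typed at each prompt.

    lines: the stack trace lines to print.
    keys:  iterable of key strings; one is consumed per '--More--' prompt.
    """
    shown = []
    keys = iter(keys)
    budget = page_lines              # lines allowed before the next prompt
    for line in lines:
        if budget == 0:
            shown.append("--More--")
            key = next(keys, "q")    # assumption: no more input means abort
            if key == "\n":
                budget = 1           # Enter: print one more line
            elif key == " ":
                budget = page_lines  # space: print another page
            else:
                break                # 'q' or 'x' abort (assumption: so does anything else)
        shown.append(line)
        budget -= 1
    return shown
```

For a 40-line trace and the key sequence Enter, space, 'q', the sketch
prompts three times and stops before the 38th line is printed.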


135479 19-Sep-2004 alc

Simplify the reference counting of page table pages. Specifically, use
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:

1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.

2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.

3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.

Reviewed by: tegge@


135452 19-Sep-2004 alc

Remove an outdated assertion from _pmap_allocpte(). (When vm_page_alloc()
succeeds, the page's queue field is unconditionally set to PQ_NONE by
vm_pageq_remove_nowakeup().)


135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


135262 15-Sep-2004 phk

Add a new function isa_dma_init() which returns an errno when it fails
and which takes a M_WAITOK/M_NOWAIT flag argument.

Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.

Problem uncovered by: tegge


135261 15-Sep-2004 phk

Remove now unused #include files.


135123 12-Sep-2004 alc

Use an atomic op to update the pte in pmap_protect(). This is to prevent
the loss of a page modified (PG_M) bit in a race between processors.

Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.

Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.

In collaboration with: tegge@

MT5 candidate
PR: 61852
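
The lost-PG_M race can be illustrated with a small model. This is a Python
sketch under stated assumptions, not the pmap code: the Pte class, its
cmpset() method, and the other_cpu callback are hypothetical stand-ins for a
hardware page table entry, an atomic compare-and-set, and a second processor
dirtying the page between the read and the write. The bit values are the x86
PG_RW and PG_M bits.

```python
PG_RW = 0x002  # x86 pte "writable" bit
PG_M = 0x040   # x86 pte "modified" (dirty) bit


class Pte:
    """Hypothetical stand-in for one page table entry."""

    def __init__(self, value):
        self.value = value

    def cmpset(self, old, new):
        """Model of an atomic compare-and-set: store new only if value == old."""
        if self.value != old:
            return False
        self.value = new
        return True


def one_shot_writer():
    """Return a callback that sets PG_M once, like another CPU writing the page."""
    fired = []

    def other_cpu(pte):
        if not fired:
            fired.append(True)
            pte.value |= PG_M

    return other_cpu


def protect_racy(pte, other_cpu):
    """Plain read-modify-write: a concurrent PG_M update is overwritten."""
    old = pte.value
    other_cpu(pte)                  # the other CPU runs between read and write
    pte.value = old & ~PG_RW        # blind store based on the stale value
    return bool(old & PG_M)         # dirty state the caller would record


def protect_atomic(pte, other_cpu):
    """Retry loop: the store succeeds only if the pte did not change."""
    while True:
        old = pte.value
        other_cpu(pte)
        if pte.cmpset(old, old & ~(PG_RW | PG_M)):
            return bool(old & PG_M)
```

In the racy variant the caller never learns that the page was modified; in
the retry variant the second pass observes PG_M before clearing it.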


135065 11-Sep-2004 scottl

Double the number of kernel page tables for amd64 and for i386/PAE. The old
value was only enough for 8GB of RAM, the new value can do 16GB. This still
isn't optimal since it doesn't scale. Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.

Reviewed by: peter


135048 10-Sep-2004 wpaul

Add device driver support for the VIA Networking Technologies
VT6122 gigabit ethernet chip and integrated 10/100/1000 copper PHY.
The vge driver has been added to GENERIC for i386, pc98 and amd64,
but not to sparc or ia64 since I don't have the ability to test
it there. The vge(4) driver supports VLANs, checksum offload and
jumbo frames.

Also added the lge(4) and nge(4) drivers to GENERIC for i386 and
pc98 since I was in the neighborhood. There's no reason to leave them
out anymore.


134960 08-Sep-2004 alc

Use atomic ops in pmap_clear_ptes() to prevent SMP races that could
result in the loss of an accessed or modified bit from the pte.

In collaboration with: tegge@

MT5 candidate


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritance that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)
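
The corrected rule can be written down in a few lines. This is a Python
sketch of the rule as stated in the log message, not the bus_dma code; the
function name inherit_boundary is an assumption, and 0 is taken to mean "no
boundary restriction".

```python
def inherit_boundary(parent, child):
    """Return the boundary a child tag should end up with.

    A value of 0 means "no restriction", so it must not win a min().
    """
    if parent == 0:
        return child
    if child == 0:
        return parent
    return min(parent, child)  # the stricter (smaller) boundary applies
```

Taking the maximum, as the old code did, would let a child with a 64K
boundary ignore a parent restricted to 4K.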


134917 07-Sep-2004 scottl

Switch the default scheduler to 4BSD to match what will go into RELENG_5 soon.
It can be switched back once 5.3 is tested and released. Also turn on
PREEMPTION as many of the stability problems with it have been fixed.

MT5: 3 days.


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies the scheduler that no more than N threads from that ksegrp
should be allowed to be scheduled concurrently. This is also
used to enforce 'fairness' at this time so that a ksegrp with
10000 threads cannot swamp the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and neither scheduler will allow more than that many
onto the system run queue at a time. Each scheduler should eventually develop
its own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
libkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134649 02-Sep-2004 scottl

Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.


134591 01-Sep-2004 julian

Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after: 5 days


134586 01-Sep-2004 julian

Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after: 2 days


134571 31-Aug-2004 julian

Remove an unneeded argument.
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that
curthread could be expensive to derive on some systems, so leave it as
an argument. Having both proc and thread as arguments just gives an
opportunity for them to get out of sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134553 30-Aug-2004 peter

Add the mp_watchdog hooks, although it locks up my SMP test box. It might
be useable to somebody.


134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


134416 28-Aug-2004 obrien

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


134406 27-Aug-2004 arved

Fix a comment, IA32 was renamed to COMPAT_IA32

Approved by: marcel


134398 27-Aug-2004 marcel

Move the kernel-specific logic to adjust frompc from MI to MD. For
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a function's implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be laid out contiguously. This cannot be achieved on ia64 and is
generally just bad programming.

The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This keeps the trap or interrupt handler from appearing to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.

This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...

Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386


134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


134383 27-Aug-2004 andre

Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.


134269 24-Aug-2004 jhb

Correct the arguments to kern_sigaltstack() as they were reversed.

PR: kern/68079
Submitted by: Georg-W. Koltermann gwk at rahn-koltermann dot de


134263 24-Aug-2004 njl

Catch up with i386 nexus.c rev 1.59: add bus_get_resource_list().


134233 24-Aug-2004 peter

It is now an error to call pmap_unuse_pt without the paddr of the pde
that contained the pte.


134232 24-Aug-2004 peter

Oops, I forgot to have the idle loop call mp_grab_cpu_hlt() on the amd64
SMP case.


134227 23-Aug-2004 peter

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
acquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to acquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


133907 16-Aug-2004 peter

Sync with i386 - Optimize intr_execute_handlers a bit etc.


133906 16-Aug-2004 peter

Sync with i386 - remove unused includes


133905 16-Aug-2004 peter

Sync with i386 - get the softc via the devclass rather than caching the dev


133904 16-Aug-2004 peter

Sync with i386 - add ADAPTIVE_GIANT, remove pcic


133903 16-Aug-2004 peter

Sync with i386 - add foot shooting protection for the DDB/KDB thing.


133902 16-Aug-2004 peter

Sync with i386 - set rbp reg to 0 for upcalls as a frame marker, not that
it is guaranteed to be used in userland though.


133901 16-Aug-2004 peter

Sync with i386 - trace syscall entry/exit times, and a cosmetic fix.


133899 16-Aug-2004 peter

Sync with i386 - fix bounds check in lapic_create()


133898 16-Aug-2004 peter

Sync with i386 - pass resource requests up to parent


133897 16-Aug-2004 peter

Sync with i386 - s/cpu_swtch/cpu_switch/


133896 16-Aug-2004 peter

Sync with i386 - don't count needed bounce pages if loading a buffer that
was created with bus_dmamem_alloc()


133894 16-Aug-2004 peter

Sync with i386 - cosmetic fixes


133893 16-Aug-2004 peter

Catch up with i386 - remove lots of no longer used symbolic constants


133892 16-Aug-2004 peter

Sync with i386


133854 16-Aug-2004 obrien

Complete 'IA32' -> 'COMPAT_IA32' change for the Linuxulator32.


133853 16-Aug-2004 tjr

Un-comment LINPROCFS.


133846 16-Aug-2004 obrien

I missed an 'IA32' in the documentation.


133844 16-Aug-2004 obrien

I'm not sure what tjr envisioned for turning on FreeBSD/i386 rt support,
but make it COMPAT_IA32 for now.
Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


133843 16-Aug-2004 obrien

Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


133820 16-Aug-2004 tjr

Regen.


133819 16-Aug-2004 tjr

Add preliminary support for running 32-bit Linux binaries on amd64, enabled
with the COMPAT_LINUX32 option. This is largely based on the i386 MD Linux
emulation bits, but also builds on the 32-bit FreeBSD and generic IA-32
binary emulation work.

Some of this is still a little rough around the edges, and will need to be
revisited before 32-bit and 64-bit Linux emulation support can coexist in
the same kernel.


133770 15-Aug-2004 rwatson

Preemptive anti-footshooting: cause a #error if MP_WATCHDOG is compiled
with SCHED_ULE.


133759 15-Aug-2004 rwatson

Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond. A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog. If the callout fails to reset the timer for ten
seconds, the watchdog will fire. The sysctl allows you to select which
CPU will run the watchdog.

A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog. On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.

This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.

On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.
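
The reset-or-fire logic described above can be sketched as a small model.
This is Python for illustration only, not the MP_WATCHDOG code: the Watchdog
class, its method names, and the use of explicit timestamps are assumptions.
Only the ten-second limit and the split between a callout that resets the
timer and an idle-thread check come from the log message.

```python
WATCHDOG_TIMEOUT = 10  # seconds, as stated in the commit message


class Watchdog:
    """Toy model: a callout resets the timer, the idle loop checks it."""

    def __init__(self, timeout=WATCHDOG_TIMEOUT):
        self.timeout = timeout
        self.last_reset = 0

    def reset(self, now):
        """Called by the periodic callout while the kernel is responsive."""
        self.last_reset = now

    def should_fire(self, now):
        """Polled from the watchdog CPU's idle thread."""
        return now - self.last_reset >= self.timeout
```

If something spins with a lock held so that the callout never runs, reset()
stops being called and should_fire() becomes true ten seconds later.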


133672 13-Aug-2004 ambrisko

Fix the memory scaling bug when basemem was converted to Kbytes from
bytes for AMD64. Otherwise the AP will be started at 640K which
won't work. Bug found on a Xeon 64bit system.


133529 11-Aug-2004 davidxu

Mark end of frames.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133431 10-Aug-2004 davidxu

As AMD64 architecture volume 1 chapter 3.1.2 says, the high 32 bits of %rflags
are reserved; they can be written with anything, but they always read
as zero. We should simulate this in set_regs(), as we are reading/writing
the real hardware %rflags register.
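
The effect can be shown with a one-line mask. This is a Python sketch, not
the set_regs() code; the name sanitize_rflags is an assumption, and the mask
simply encodes the "high 32 bits read as zero" rule quoted above.

```python
RFLAGS_LOW_MASK = 0xFFFFFFFF  # only the low 32 bits of %rflags are defined


def sanitize_rflags(value):
    """Drop the reserved high 32 bits, as the hardware would on a read."""
    return value & RFLAGS_LOW_MASK
```

A debugger that writes 0xdeadbeef00000246 would then read back 0x246, the
same value it would see from the real register.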


133413 09-Aug-2004 davidxu

In syscall, always make a copy of the parameters from the trapframe,
because some syscalls using set_mcontext can sneakily change
parameters, and later when those syscalls reference the parameters,
they will wrongly use register values in mcontext_t.

Approved by: peter


133292 08-Aug-2004 alc

With the advent of pmap locking it makes sense for pmap_copy() to be less
forgiving about inconsistencies in the source pmap. Also, remove a new-
line character terminating a nearby panic string.


133255 07-Aug-2004 scottl

Move the definition of M_MEMDESC to a non-optional file. This allows
kernel configurations without the 'mem' device to compile.


133250 07-Aug-2004 alc

Eliminate a variable that became unused in the i386 to amd64 conversion.


133195 06-Aug-2004 markm

MFi386: Fix mem device. Grrr.


133194 06-Aug-2004 markm

MFi386: sort out the mem device. Grrrr.


133129 04-Aug-2004 markm

Fix module builds for i386 and amd64.


133124 04-Aug-2004 alc

Post-locking clean up/simplification, particularly, the elimination of
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.

Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().


133087 03-Aug-2004 markm

Making a loadable null.ko for /dev/(null|zero) proved rather
unpopular, so remove this (mis)feature.

Encouragement provided by: jhb (and others)


133084 03-Aug-2004 mux

Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by: jhb
Reviewed by: jhb


133061 03-Aug-2004 dfr

Add style(9) foolishness.


133035 02-Aug-2004 markm

Diff reduction WRT i386 version.


133027 02-Aug-2004 dfr

Add definitions for TLS relocations.


132992 02-Aug-2004 obrien

Fix the build by providing 'PHYS_TO_DMAP' and 'M_MEMDESC'.


132972 01-Aug-2004 markm

Add the I/O device for those architectures that have it.


132961 01-Aug-2004 scottl

Turn off PREEMPTION by default while it gets debugged. It's been causing
4 weeks of problems including deadlocks and instant panics. Note that the
real bugs are likely in the scheduler.


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132924 31-Jul-2004 davidxu

Turn on PCB_FULLCTX for set_mcontext; functions like kse_switchin
need to fully restore asynchronous context which did not come
from fast syscall.


132919 31-Jul-2004 alc

Add pmap locking to pmap_object_init_pt().


132888 30-Jul-2004 ps

MFia64:
Fix -O builds with gcc 3.4 by defining ffs as __builtin_ffs instead
of creating an inline function that just calls __builtin_ffs.


132852 29-Jul-2004 alc

Advance the state of pmap locking on alpha, amd64, and i386.

- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)


132846 29-Jul-2004 kan

Use newly added __used attribute to keep static function symbol from
being eliminated.


132808 28-Jul-2004 phk

Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call. (Yet another example of mindless copy&paste).


132700 27-Jul-2004 rwatson

Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding: jhb, bmilekic


132556 22-Jul-2004 imp

Remove ahb, aha, ie, le and wl devices. They are all ISA/EISA only.
I went ahead and left in the ISA cards that also have pccard
attachments. There's no way that these devices could attach.

OK'd by: peter


132555 22-Jul-2004 imp

There is no pcic device on amd64. OLDCARD isn't supported, and
NEWCARD will call it something different. and there are no ISA add-in
devices.


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64
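
The ID lookup order described above can be sketched as follows. This is an
illustrative Python model, not the ddb code: the function name find_thread
and the two dictionaries standing in for the kernel's thread and process
tables are assumptions.

```python
def find_thread(ident, threads_by_tid, threads_by_pid):
    """Resolve an ID typed at the debugger prompt to a thread.

    threads_by_tid: dict mapping thread ID -> thread
    threads_by_pid: dict mapping process ID -> list of that process's threads
    """
    if ident in threads_by_tid:
        return threads_by_tid[ident]          # a thread ID wins
    proc_threads = threads_by_pid.get(ident)
    if proc_threads:
        return proc_threads[0]                # else: first thread of that process
    return None                               # neither a TID nor a PID
```

With this order, a PID copied from ps output yields a trace of the first
thread in that process whenever no thread happens to have the same ID.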


132427 20-Jul-2004 alc

Remove the allpmaps list. It's unused.

Reviewed by: peter@


132408 19-Jul-2004 jhb

As a temporary hack, turn off deferred preemptions that are the result of
a fast interrupt handler doing an swi_sched(). This fixed the lockups I
saw on my laptop when using xmms in KDE and on rwatson's MySQL benchmarks
on SMP. This will eventually be removed and/or modified when I figure out
what the root cause is and fix that.


132383 19-Jul-2004 das

Make FLT_ROUNDS correctly reflect the dynamic rounding mode.


132353 18-Jul-2004 scottl

Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to
NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64. It
shows measurable performance gains in many circumstances, and few negative
effects. It would be nice in the future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.


132345 18-Jul-2004 maxim

In -CURRENT pseudo devices are not statically assigned at compile time,
remove a stale comment.

PR: kern/62285


132269 16-Jul-2004 ps

Fix the build. pcm is no more.


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132141 14-Jul-2004 peter

Like on i386, eliminate pv_ptem (which was suggested by alc). This
reduces the size of the pv_entry structure a small but significant amount.

This is implemented a little differently because it isn't so cheap to get
the physical address of the page table page on amd64. Instead of it
being directly accessible from the top level page directory, it is now
two additional tree levels down. However, in almost all cases, we
recently had the physical address of the page table page a short while
before we needed it, but it slipped through our fingers. This patch
saves it for when we do need it. Also, for the one case where we do not
have the ptp paddr, we are always running in curproc context and so we
can do a vtopte-like trick. I've implemented vtopde() for this purpose.

There is still a CYA entry in pmap_unuse_pt() that needs to be removed. I
think it can be removed now but I forgot to test with it gone.


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(); alpha has had it for years. The function
will be used by ptrace to clear a thread's single step state.


132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


131992 11-Jul-2004 marcel

MFi386: rev 1.213 -- fix DELAY while the debugger is active.
This also fixes the (runtime) breakage introduced in the previous
commit that was the result of a botched merge. This hasn't even
been compile-tested...


131969 11-Jul-2004 marcel

Add options KDB and GDB. KDB takes on the function of what DDB used
to be. Both DDB and GDB specify which KDB backends to include.


131960 11-Jul-2004 marcel

Remove the now unused GDB stubs. See src/sys/gdb/* for the new KDB
backend.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131943 10-Jul-2004 marcel

MFi386: don't fake the time counter when the debugger is active.
This breaks the fundamental property of DELAY(). Instead, avoid
grabbing clock_lock when kdb_active is non-zero.


131942 10-Jul-2004 marcel

Remove obsolete prototype of kdb_trap().


131941 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging support conditional upon KDB instead of DDB.
o Remove implementation of Debugger().
o Don't make setjmp() and longjmp() conditional upon DDB.
o s/ddb_on_nmi/kdb_on_nmi/g
o Call kdb_reenter() when kdb_active is non-zero. Call kdb_trap()
otherwise.


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131903 10-Jul-2004 marcel

Introduce the KDB debugger frontend. The frontend provides a framework
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.
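
For readers unfamiliar with the run-length encoding mentioned above, the
following Python sketch shows the scheme as the GDB remote serial protocol
documents it: a character followed by '*' and a count character whose value
is the number of extra repeats plus 29. This is not the FreeBSD gdb backend
code; the function names, the limits chosen, and the assumption that the
payload never contains '*', '#', or '$' (for example, hex digits only) are
illustrative.

```python
MAX_EXTRA = 126 - 29      # count character must stay at or below ASCII 126
FORBIDDEN_EXTRA = (6, 7)  # 6+29 is '#', 7+29 is '$': reserved packet framing


def rle_encode(data):
    """Encode runs as <char>*<count>, where count = chr(extra_repeats + 29)."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == ch:
            run += 1
        i += run
        while run > 0:
            out.append(ch)               # one literal copy
            run -= 1
            extra = min(run, MAX_EXTRA)
            if extra in FORBIDDEN_EXTRA:
                extra = 5                # back off to a printable, legal count
            if extra >= 3:               # shorter runs do not pay for themselves
                out.append("*" + chr(extra + 29))
                run -= extra
    return "".join(out)


def rle_decode(text):
    """Inverse of rle_encode."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "*" and out and i + 1 < len(text):
            out.append(out[-1][-1] * (ord(text[i + 1]) - 29))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Register dumps tend to contain long runs of '0', which is where this kind
of encoding saves bytes on a slow serial line.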


131840 08-Jul-2004 brian

Change the following environment variables to kernel options:

bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone


131814 08-Jul-2004 brian

Change the following kernel options to environment variables:

BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.


131778 08-Jul-2004 peter

MFi386: various io apic cleanups


131777 08-Jul-2004 peter

MFi386: use rman access methods instead of groping around inside
struct resource


131776 08-Jul-2004 peter

MFi386: whitespace nit fix (spare blank line)


131775 08-Jul-2004 peter

MFi386: fix up CR0 settings


131774 08-Jul-2004 peter

MFi386: 1.57: transparently respect alignment/boundary tags


131744 07-Jul-2004 alc

Simplify the control flow in pmap_extract(), enabling the elimination of a
PMAP_UNLOCK() call.


131730 07-Jul-2004 alc

White space and style changes only.


131666 06-Jul-2004 alc

Style changes to pmap_extract().


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switched to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.
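
The decision described above can be sketched in C as follows. This is a
minimal illustration, not the committed code: the struct, the enum, and the
function names ending in _sketch are hypothetical, and critnest here counts
only the critical sections that forbid an immediate switch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's thread structure. */
struct thread_sketch {
	int  priority;     /* lower value means higher priority */
	int  critnest;     /* assumed: depth of sections that forbid switching */
	bool owe_preempt;  /* stands in for the TDF_OWEPREEMPT flag */
};

enum preempt_result {
	PREEMPT_NO,        /* caller adds the new thread to the run queue */
	PREEMPT_DEFERRED,  /* flag set; caller still adds it to the run queue */
	PREEMPT_NOW        /* caller would switch to the new thread directly */
};

/* Decide whether the running thread should be preempted by newtd. */
static enum preempt_result
maybe_preempt_sketch(struct thread_sketch *cur,
    const struct thread_sketch *newtd)
{
	if (newtd->priority >= cur->priority)
		return (PREEMPT_NO);
	if (cur->critnest > 0) {
		/* Remember the preemption and perform it later. */
		cur->owe_preempt = true;
		return (PREEMPT_DEFERRED);
	}
	return (PREEMPT_NOW);
}

/*
 * Leave one critical section. Returns true when the outermost section
 * was exited with a deferred preemption pending, which is the point
 * where the real code would call mi_switch().
 */
static bool
critical_exit_sketch(struct thread_sketch *cur)
{
	cur->critnest--;
	if (cur->critnest == 0 && cur->owe_preempt) {
		cur->owe_preempt = false;
		return (true);
	}
	return (false);
}
```

In this sketch the caller, playing the role of sched_add(), puts the thread
on a run queue for the first two results and switches for the third.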

Approved by: scottl (with his re@ hat)


131359 30-Jun-2004 imp

We need to make resources visible here as well.


131312 30-Jun-2004 njl

Add machdep quirks functions. On i386, this disables acpi on systems with
BIOS dates earlier than Jan 1, 1999. Add prototypes and quirks flags.


130983 23-Jun-2004 jhb

Fetch the actual acpi0 device_t and use device_is_attached() to see if
it's alive rather than trying to fetch its softc pointer via its devclass.

Glanced at by: imp, njl


130958 23-Jun-2004 alc

Implement the protection check required by the pmap_extract_and_hold()
specification. This enables the elimination of Giant from that function.


130814 20-Jun-2004 alc

- Simplify pmap_remove_pages(), eliminating unnecessary indirection.
- Simplify the locking of pmap_is_modified() by converting control flow to
data flow.


130765 20-Jun-2004 alc

Add pmap locking to pmap_is_prefaultable().


130764 20-Jun-2004 bde

Backed out previous commit. Blind substitution of dev_t by `struct cdev *'
was just wrong here because the dev_t's are user dev_t's.


130738 19-Jun-2004 alc

Remove unused pt_entry_ts. Remove an unneeded semicolon.


130731 19-Jun-2004 bde

Include <sys/_lock.h>'s prerequisite <sys/queue.h> before including the
former, not after.

Don't hide this bug by including <sys/queue.h> in <sys/_lock.h>.


130667 18-Jun-2004 peter

Try harder to give new processes a clean initial fpu state. fpu_cleanstate
wasn't actually clean, it was saving the xmm registers as left over by the
bios. fninit() doesn't clear those.

In fpudna(), instead of doing a fninit() and forgetting to load the initial
mxcsr, do a full fxrstor(&fpu_cleanstate). Otherwise we hand over whatever
random values are left in the xmm registers by the last user.

I'm not certain of whether this is excessive paranoia or not, but there was
an outright bug in neglecting to set the mxcsr value that caused awk to
SIGFPE in some case. Especially for Tim Robbins. :-)

i386 probably should do something about the mxcsr settings too.

Found by: tjr


130641 17-Jun-2004 njl

Revert last change. If acpi is loaded or compiled into the kernel, its
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.

Explained by: jhb


130626 17-Jun-2004 alc

Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object. Thus, PG_BUSY serves no purpose.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130577 16-Jun-2004 alc

Add some lock assertions. Lock a small part of pmap_enter().


130553 16-Jun-2004 alc

Correct an error in the implementation of pmap_is_prefaultable(). When I
introduced this function in revision 1.441, I inverted one of the
comparisons.


130539 15-Jun-2004 alc

Remove a stale comment.


130520 15-Jun-2004 alc

Add pmap locking to pmap_extract(), pmap_mincore(), and pmap_remove().


130510 15-Jun-2004 njl

We only need the devclass_find() result, not the softc.


130444 14-Jun-2004 alc

Introduce pmap locking to many of the pmap functions. There is more to
come later.


130441 13-Jun-2004 obrien

The majority of FreeBSD/amd64 machines are SMP, so use ADAPTIVE_MUTEXES
by default to improve performance.


130433 13-Jun-2004 alc

Prevent the loss of a PG_M bit through an SMP race in pmap_ts_referenced().


130427 13-Jun-2004 alc

Remove dead or unneeded code, e.g., spl calls.


130399 13-Jun-2004 alc

- Remove an unused declaration.
- Move a definition inside the scope of a #ifdef _KERNEL.


130386 12-Jun-2004 alc

In a multiprocessor, the PG_W bit in the pte must be changed atomically.
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.
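
A userland sketch of the idea, using C11 atomics rather than the kernel's
own atomic primitives. The bit values and the function names are
assumptions made for illustration. A plain read-modify-write of the PTE
could overwrite a PG_M bit that the hardware set between the read and the
write; an atomic OR or AND touches only the intended bit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit positions; treat the exact values as assumptions. */
#define SK_PG_M 0x040ULL	/* "modified" bit, set by the processor */
#define SK_PG_W 0x200ULL	/* software "wired" bit */

/* Atomically set the wired bit without disturbing any other bit. */
static void
pte_set_wired(_Atomic uint64_t *pte)
{
	atomic_fetch_or(pte, SK_PG_W);
}

/* Atomically clear the wired bit without disturbing any other bit. */
static void
pte_clear_wired(_Atomic uint64_t *pte)
{
	atomic_fetch_and(pte, ~SK_PG_W);
}
```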

Reviewed by: tegge@


130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


130322 10-Jun-2004 peter

Argh. Add the mini-stack-frame back in for mcount's benefit for syscall
stubs.


130321 10-Jun-2004 peter

Make profiling work for varargs functions.. %al is an additional argument
which indicates the number of xmm registers used in the varargs. This
stops the explosion that happened when profiling printf() etc.


130315 10-Jun-2004 peter

Insta-MFi386: ignore disabled cpu apic id's entirely


130313 10-Jun-2004 jhb

- Use the correct devclass name ("acpi" vs "ACPI") to detect if acpi0 is
present and thus that the PnPBIOS probe should be skipped instead of
having ACPI zero out the PnPBIOStable pointer.
- Make the PnPBIOStable pointer static to i386/i386/bios.c now that that is
the only place it is used.


130312 10-Jun-2004 jhb

Remove atdevbase and replace its remaining uses with direct references to
KERNBASE instead.


130229 08-Jun-2004 peter

In pmap_extract_and_hold(), there is no need to mask off PG_FRAME because
pmap_extract() already does it.
In pmap_enter(), opa has already been masked so don't do it again.
Wrap a long line (recent transgression).
Use trunc_page() in pmap_mapdev() instead of anding with PG_FRAME, since
that is what we really meant.

Submitted by: alc (first item)


130228 08-Jun-2004 peter

Fix my silly typo in asm statement in previous commit.


130227 08-Jun-2004 peter

Argh. Remove stray number that slipped into the previous commit.


130226 08-Jun-2004 peter

Reapply rev 1.151 after enable sse/fpuinit order fixed in mp_machdep.c

Obtained from: das


130225 08-Jun-2004 peter

Set up the fpu *after* enabling SSE mode on AP's

Submitted by: (argh, I can't find the email)


130224 08-Jun-2004 peter

Initial PG_NX support (no-execute page bit)
- export the rest of the cpu features (and amd's features).
- turn on EFER_NXE, depending on the NX amd feature bit
- reorg the identcpu stuff a bit in order to stop treating the
amd features as second class features (since it is now a primary feature
bit set) and make it easier to export.


130223 08-Jun-2004 peter

Mask pte's with PG_FRAME before passing it to PHYS_TO_VM_PAGE().. PG_NX
lives in the top 12 'available' bits. atop() in the PHYS_TO_VM_PAGE()
macro only masks off the lower bits (by accident) and the upper bits
in the 64 bit ptes turn into "interesting" index values.
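
The effect can be reproduced with plain integer arithmetic. The sketch
below assumes 4K pages, PG_NX in bit 63, and a frame mask covering bits 12
through 51; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12
#define SK_PG_NX      (1ULL << 63)		/* no-execute bit */
#define SK_PG_FRAME   0x000ffffffffff000ULL	/* physical frame bits */

/* Shifting alone drops the low flag bits but keeps the high ones. */
static uint64_t
page_index_unmasked(uint64_t pte)
{
	return (pte >> SK_PAGE_SHIFT);
}

/* Masking with the frame mask first yields the intended page index. */
static uint64_t
page_index_masked(uint64_t pte)
{
	return ((pte & SK_PG_FRAME) >> SK_PAGE_SHIFT);
}
```

For a PTE with frame 0x1234 and PG_NX set, the masked form gives 0x1234
while the unmasked form also carries bit 51, far outside any page array.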


130221 08-Jun-2004 peter

Use trunc_page(va) when we mean it rather than anding it with PG_FRAME
(which doesn't work all that well when there are bits at the top that are
masked by PG_FRAME)


130219 07-Jun-2004 peter

Fix a serious problem that manifested during swap, and a few other times.
pmap_remove() would be called with a huge range and we'd stride across
it in only 2MB chunks. This would manifest as massive cpu time and a
largely unresponsive system during hard swap. Instead, check the higher
page directories which means we can run pmap_remove() in just a few
hundred loop iterations instead of millions since we can process
address space in chunks of 512GB and 1GB as well as 2MB.
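
A rough count of the loop iterations, assuming the best case of a range in
which nothing is mapped, so each step can skip a whole aligned chunk. The
names below are hypothetical, and the 2^47-byte range stands in for a full
user address space.

```c
#include <stdint.h>

#define SK_NBPDR  (1ULL << 21)	/* 2MB: one page directory entry */
#define SK_NBPDP  (1ULL << 30)	/* 1GB: one PDP entry */
#define SK_NBPML4 (1ULL << 39)	/* 512GB: one PML4 entry */

/*
 * Number of step-aligned chunks overlapping [start, end). step must be
 * a power of two, and end + step must not overflow.
 */
static uint64_t
chunks_visited(uint64_t start, uint64_t end, uint64_t step)
{
	uint64_t first = start & ~(step - 1);
	uint64_t last = (end + step - 1) & ~(step - 1);

	return ((last - first) / step);
}
```

Under these assumptions, chunks_visited(0, 1ULL << 47, SK_NBPDR) is
67108864, while the same range with SK_NBPML4 takes 256 steps.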

Eternal thanks to: tmm


130218 07-Jun-2004 peter

Be a little more consistent in the naming of the PML4 defines.


130140 06-Jun-2004 das

Back out revision 1.150, since dwmalone reports that it causes a panic
upon startup on his machine.


130105 05-Jun-2004 das

Initialize the MXCSR to the appropriate default value at startup.

Tested on: tjr


130040 03-Jun-2004 phk

Add new bios_string() which will hunt for a string inside a given range
of the BIOS. This can be used for finding arbitrary magic in the BIOS
in order to recognize particular platforms.


130037 03-Jun-2004 peter

MFi386: add ixgp device


130035 03-Jun-2004 peter

MFi386: apic intpin programming updates etc.


130034 03-Jun-2004 peter

MFi386: remove debug printf


130033 03-Jun-2004 peter

Move module.h include to the same place as on i386 for diff reduction.


130032 03-Jun-2004 peter

MFi386: move cpu_nameclass struct next to its only consumer


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130025 03-Jun-2004 phk

Add missing <sys/module.h> instances which were shadowed by the nested
include in <sys/kernel.h>


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129989 02-Jun-2004 tjr

Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by: davidxu


129876 30-May-2004 phk

Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>


129860 30-May-2004 alc

MFi386 revision 1.6
Reenable ithread preemption for interrupts that occur while executing in
the kernel.


129825 29-May-2004 tjr

Implement __bb_init_func. This is a fairly straightforward conversion
of the i386 version.


129817 28-May-2004 alc

Remove a broken micro-optimization from pmap_enter(). The ill effect
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.

Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.

For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.

Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129744 26-May-2004 bde

Quick fix for overflow when tsc_freq >= 2^31. "int profrate" in struct
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC. It broke a year or two
ago. This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
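
The arithmetic behind the quick fix, as a small sketch. It assumes a
32-bit int; the function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a frequency in Hz can be stored in an int field. */
static bool
profrate_fits(uint64_t freq)
{
	return (freq <= (uint64_t)INT_MAX);
}

/* The quick fix: report half the TSC frequency. */
static uint64_t
profrate_halved(uint64_t tsc_freq)
{
	return (tsc_freq / 2);
}
```

A 2.4 GHz TSC does not fit as is, but its half, 1.2 GHz, does. Halving
stops helping once the frequency reaches 2^32 Hz, about 4.29 GHz.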


129656 24-May-2004 bde

Oops, ".align 4" for the data section in the previous commit should
have been ".p2align 4". This bug is cosmetic since the data section
happens to be empty.


129653 24-May-2004 bde

Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.115 of
amd64/exception.S. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .S files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.

This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.


129649 24-May-2004 bde

Don't repeat the definition of IDTVEC(). It is in asmacros.h.


129630 23-May-2004 bde

Added profiling support for Xint0x80_syscall.


129625 23-May-2004 bde

Adjusted for amd64 after repo-copy. The adjustments are routine, except:
- perfmon headers must be avoided until perfmon is supported.
- all call-used registers including return registers must be preserved
by .mcount(), etc., not quite as in profile.h. __cyg_profile_func_*()
don't require this, but they are (mis)implemented as aliases for
.mcount(), etc. so they preserve the registers.
- i386 ifdefs related to perfmon have not been adjusted yet.


129623 23-May-2004 bde

Restored FAKE_MCOUNT() and MEXITCOUNT invocations and adjusted them for
amd64 as necessary. This is routine, except:
- the FAKE_MCOUNT($bintr) in doreti was missing the '$'. This gave a
garbage address made up of padding bytes (with the nop byte 0x90 as
the MSB) instead of the intended address of bintr. This accidentally
worked on i386's because (0x90 << 24) is close enough to bintr, but
it doesn't work on amd64's because (0x90 << 56) is much further away
from bintr.
- the FAKE_MCOUNT($btrap) in calltrap was similarly broken. It hasn't
been needed since FreeBSD-1, so just delete it.


129618 23-May-2004 bde

Adjusted FAKE_MCOUNT()s for amd64. This is needed for both ordinary
and high resolution profiling of interrupt handlers. The adjustments
are routine once the magic stack offset 13*4 is decoded to be TF_RIP
(there were originally more types of stack frames so using TF_EIP for
one of them wouldn't have been much simpler).

Removed garbage comments attached to some of the FAKE_MCOUNT()s.


129612 23-May-2004 bde

Spell "retq" as "ret" in pagezero() like it is everywhere else, so
that the usual macro for "ret" hides the detail of calling .mexitcount
before returning.

Fixed missing call to .mexitcount in lgdt(). This was missing on
i386's, mainly because lgdt() uses lret[q] instead of ret. This is
very unimportant since lgdt() is not (normally?) called until after
profiling is initialized.


129551 21-May-2004 bde

MFi386 (1.103 and 1.104: fixed some problems in high resolution profiling
and improved some comments). Also, made the documented {f,s}uword()
functions the standard entry points and the undocumented {f,s}uword64()
functions alternative entry points, like {f,s}uword32() for i386's. The
bitrot in the comments was a little larger here -- there are new undocumented
32-bit sub-word functions, not just renaming of 16-bit functions from
documented ones to undocumented ones.


129499 20-May-2004 bde

MFi386 (1.37: GUPROF calibration macros; only routine adjustments needed).


129460 19-May-2004 peter

Like on i386, clear the last three entries in the pml4 page when doing a
pmap_release(), and put it on the free queue marked as already zeroed.


129446 19-May-2004 bde

Fixed the type of fptrdiff_t. It needs to be 64 bits in theory, and in
practice too since kernel addresses are almost 2^64 higher than most
user addresses.


129445 19-May-2004 bde

Fixed some style bugs (mainly misalignment of backslashes).


129444 19-May-2004 bde

Moved most of the "MI" definitions and declarations from <machine/profile.h>
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.


129412 19-May-2004 peter

Unbreak builds without DDB. Bad Bruce! No cookie! :-)


129408 18-May-2004 peter

The 'call mcount' hooks that gcc inserts when profiling are in a place that
cannot handle the scratch registers being trashed. So we have to preserve
them ourselves.


129393 18-May-2004 stefanf

<stdint.h> should define WINT_M{AX,IN} independent from whether WCHAR_MIN is
defined. Otherwise first including <wchar.h> and then <stdint.h> leads to no
WINT_M{AX,IN} at all.

PR: 64956
Approved by: das (mentor)


129373 18-May-2004 bde

Fixed DDB_NOKLDSYM on amd64's:

machdep.c:
Initialize the symbol table pointers, not quite like for other arches.

db_elf.c:
Don't claim to be an i486 in the fake ELF header.


129366 17-May-2004 peter

Turn on modules for amd64. Fear.


129361 17-May-2004 peter

Deal with REL records that have the addend embedded in variable sized targets
rather than the RELA table. I don't know if binutils will ever generate
REL records, but just in case.....


129309 16-May-2004 peter

Checkpoint some of what I was starting to tinker with for having some
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/wrmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.

However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.


129305 16-May-2004 peter

Kill the LAZYPMAP ifdefs. While they worked, they didn't do anything
to help the AMD cpus (which have a hardware tlb flush filter). I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either. Oh well, store it in the Attic.


129293 16-May-2004 peter

Converge some more with i386.


129288 16-May-2004 peter

MFi386: add rue and twa


129287 16-May-2004 peter

MFi386: avoid partial register references, for what it's worth.


129286 16-May-2004 peter

For consistency with i386, have pmap_kenter_temporary() take a vm_paddr_t
argument. It is actually the same type on amd64 (vm_paddr_t = vm_offset_t)
but this reduces the i386<->amd64 diffs a little.


129284 16-May-2004 peter

MFi386: numerous interrupt and acpi updates


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


128990 06-May-2004 njl

Make unnecessary globals static and remove unused includes.

Pointed out by: cscout


128979 05-May-2004 njl

Add an MI implementation of the ACPI global lock routines and retire the
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.

Reviewed by: marcel, bde, jhb


128928 04-May-2004 jhb

Add a simple mini-driver for the ELCR register. Originally, the ELCR
register controlled the trigger mode and polarity of EISA interrupts.
However, it appears that most (all?) PCI systems use the ELCR to manage
the trigger mode and polarity of ISA interrupts as well since ISA IRQs used
to route PCI interrupts need to be level triggered with active low
polarity. We check to see if the ELCR exists by sanity checking the value
we get back ensuring that IRQS 0 (8254), 1 (atkbd), 2 (the link from the
slave PIC), and 8 (RTC) are all clear indicating edge trigger and active
high polarity.

This mini-driver will be used by the atpic driver to manage the trigger and
polarity of ISA IRQs. Also, the mptable parsing code will use this mini
driver rather than examining the ELCR directly.
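
The sanity check described above can be sketched as a bit test. This
assumes the two 8-bit ELCR halves have already been read and combined into
one 16-bit value with bit N describing IRQ N; reading the I/O ports is
omitted, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * IRQs 0 (8254), 1 (atkbd), 2 (slave PIC link) and 8 (RTC) must be
 * edge triggered, so their bits must be clear in a plausible ELCR.
 */
static bool
elcr_looks_valid(uint16_t elcr)
{
	const uint16_t must_be_edge =
	    (1u << 0) | (1u << 1) | (1u << 2) | (1u << 8);

	return ((elcr & must_be_edge) == 0);
}

/* A set bit means level triggered, active low; clear means edge. */
static bool
elcr_is_level(uint16_t elcr, unsigned int irq)
{
	return (((elcr >> irq) & 1u) != 0);
}
```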


128845 02-May-2004 marcel

Add option GEOM_GPT. This brings the ability to have a large number of
partitions on a single disk.


128838 02-May-2004 obrien

Spell Ethernet correctly.


128629 25-Apr-2004 das

Hide FLT_EVAL_METHOD and DECIMAL_DIG in pre-C99 compilation
environments.

PR: 63935
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>


128508 21-Apr-2004 njl

Don't check for NULL, device_get_softc() always succeeds.


128384 18-Apr-2004 alc

Simplify the sf_buf implementation. In short, make it a trivial veneer
over the direct virtual-to-physical mapping.


128297 16-Apr-2004 alc

Set the "global" attribute on the page table entries for the kernel and
direct mappings. This shaves a few seconds off of my buildworld times.

Discussed with: peter@


128100 11-Apr-2004 alc

- is_physical_memory()'s parameter, which is a physical address, should be
a vm_paddr_t not a vm_offset_t.


128097 10-Apr-2004 alc

- pmap_kenter_temporary() is unused by machine-independent code. Therefore,
move its declaration to the machine-dependent header file on those
machines that use it. In principle, only i386 should have it.
Alpha and AMD64 should use their direct virtual-to-physical mapping.
- Remove pmap_kenter_temporary() from ia64. It is unused. Approved
by: marcel@


127974 07-Apr-2004 peter

Update to include both the L1 and L2 TLB stats, as well as the separate
2M/4M page TLB vs 4K page TLB stats. This also applies to the i386
platform, as does the cpu features fixes.


127973 07-Apr-2004 peter

MFi386: move rss() from db_interface.c to cpufunc.h


127920 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999 and email from Peter Wemm.

Approved by: core, peter


127914 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


127913 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999 and permission from Alan Cox.

Approved by: core, alc@


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127803 03-Apr-2004 alc

Remove ptmmap and ptvmmap. They are unused on amd64.


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127786 03-Apr-2004 alc

Microoptimize pagezero() based upon something that I learned writing the
optimized pagecopy(). This also has the virtue of making these two
functions more similar in style.


127653 31-Mar-2004 alc

- Add an optimized page copy function for use by pmap_copy_page(). It is
roughly four times faster than bcopy() for uncached pages.
- Sort the function prototypes in md_var.h.


127582 29-Mar-2004 peter

Finish tidying up a couple of leftovers from the KSTACK_PAGES stuff. Some
files still #included the opt_ file. powerpc hadn't been updated yet.


127392 25-Mar-2004 peter

MFi386: correctly calculate the top-of-stack when a kthread is created
with a larger kernel stack.


127391 25-Mar-2004 peter

Run print_AMD_features() for both AuthenticAMD and GenuineIntel cpus.
Report the %ecx bits in cpuid function 1. This is a hack.
When reporting AMD Features, only mask off the common bits. Otherwise
the SEP bit masks off SYSCALL etc in the report.


127390 25-Mar-2004 obrien

Add NTFS since many may want to dual-boot MS-Win64 w/FreeBSD.


127241 20-Mar-2004 alc

- Add uiomove_fromphys() implementations to alpha and ia64. These only
differ trivially from amd64.
- Correct a spelling error in a comment.


127239 20-Mar-2004 marcel

Introduce the cpumask_t type. The purpose of the type is to create a
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.

With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leisure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
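
A small sketch of how code might use such a type once it is integral and
unsigned. The typedef name and the helper functions are hypothetical; only
the starting width of u_int comes from the text above.

```c
#include <limits.h>
#include <stdbool.h>

/* Starts as an unsigned int; a platform could later widen it. */
typedef unsigned int cpumask_sketch_t;

/* Mask with only the bit for the given CPU set. */
static cpumask_sketch_t
cpumask_of(unsigned int cpu)
{
	return ((cpumask_sketch_t)1 << cpu);
}

/* Test whether a CPU is present in the mask. */
static bool
cpumask_isset(cpumask_sketch_t mask, unsigned int cpu)
{
	return ((mask & cpumask_of(cpu)) != 0);
}

/* Largest number of CPUs the type can describe. */
static unsigned int
cpumask_width(void)
{
	return ((unsigned int)(sizeof(cpumask_sketch_t) * CHAR_BIT));
}
```

Because callers go through the typedef, changing its underlying type
changes the CPU limit without touching them.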

Compile-tested on: i386


127236 20-Mar-2004 alc

Introduce uiomove_fromphys(). This is a variant of uiomove() that takes
a collection of physical pages as the source. On amd64 it is implemented
using the direct virtual-to-physical map.


127191 19-Mar-2004 obrien

'vi' got away from me in rev. 1.13.


127158 18-Mar-2004 obrien

Document machdep.hlt_cpus.

Submitted by: Craig Rodrigues <rodrigc@crodrigues.org>


127151 18-Mar-2004 obrien

Cleanup hints, given that no hammer machines have (nor ever will have)
ISA slots.

Submitted by: Peter


127146 17-Mar-2004 jmg

sync comment with i386's isa.c.. This removes a comment that is YEARS
old...


127135 17-Mar-2004 njl

Convert callers to the new bus_alloc_resource_any(9) API.

Submitted by: Mark Santcroos <marks@ripe.net>
Reviewed by: imp, dfr, bde


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose ephemeral map cache.


127002 15-Mar-2004 obrien

Shorten a long comment.


126931 13-Mar-2004 peter

Re-kill ispcvt on amd64 - rc.d/syscons was fixed ages ago.


126930 13-Mar-2004 peter

MFp4: comment out options that don't exist so that they cannot be
accidentally added to config files and be silently accepted.
Comment out one bogo-option that crept into NOTES.


126929 13-Mar-2004 peter

Diff reduction with current. Correct comment about ed etc.


126928 13-Mar-2004 peter

Move the non-MD machine/dvcfg.h and machine/physio_proc.h to a common
MI area before they proliferate more.


126927 13-Mar-2004 peter

Drastically clean up the legacy host-pci bridge table. We don't need
all the ancient Intel/VIA/SIS/etc chipsets on amd64 systems. Even the
newer intel stuff won't need this since we use acpi by default and we
don't have all their magic programming information. Just use a generic
"Host to PCI bridge" name if we ever hit this code.


126926 13-Mar-2004 peter

MFi386: nuke pci_cfgintr


126925 13-Mar-2004 peter

Reduce the scope of the Giant lock being held for non-mpsafe syscalls.
There was way too much code being covered.


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126891 12-Mar-2004 trhodes

These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation of gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c

1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c

1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c

and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by: -arch
Submitted by: netchild


126846 11-Mar-2004 bde

Don't implement anything in the ffs family in <machine/cpufunc.h>
in the non-_KERNEL case. This "fixes" applications that include
this "kernel-only" header and also include <strings.h> (or get
<strings.h> via the default _BSD_VISIBLE pollution in <string.h>).
In C++ there was a fatal error: the declaration specifies C linkage
but the implementation gives C++ linkage. In C there was only a
static/extern mismatch if the headers were included in a certain order,
and a partially redundant declaration for all include orders;
gcc emits incomplete or wrong diagnostics for these, but only for
compiling with -Wsystem-headers and certain other warning options, so
the problem was usually not seen for C.

Ports breakage reported by: kris


126828 11-Mar-2004 marcel

Remove a stale or broken call to kdb_trap() that was protected by the non-
option KDB. Besides being wrong, it also interferes with ongoing
work.


126735 08-Mar-2004 peter

Stop depending on #include pollution from cpufunc.h


126734 08-Mar-2004 peter

MFi386: re-sort non-gcc function prototypes, trim includes


126733 08-Mar-2004 peter

MFi386: curpcb is no longer null anymore, so do not test for it.


126732 08-Mar-2004 peter

MFi386: set initial curpcb pcpu variable at startup time rather than
waiting for a context switch


126731 08-Mar-2004 peter

MFi386: wait for local apic to become free before using it


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126715 07-Mar-2004 alc

Remove unused declarations. (Some time ago, these variables became fields
of vm/vm.h's struct kva_md_info.)


126677 06-Mar-2004 peter

When faced with a "GenuineIntel", we know what they call it now. Replace
snide comment with a different one.


126654 05-Mar-2004 bde

MFi386: (all: keep a comment in sync with code, and don't depend on
namespace pollution).


126649 05-Mar-2004 le

Fix syntax errors and wrong function prototypes in several MD header
files when using non-GNUC compilers.

PR: kern/58515
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
Approved by: grog (mentor), obrien


126642 05-Mar-2004 obrien

Document that ENABLE_ALART controls the alarm on Intel intpm driver.

Submitted by: peter


126638 05-Mar-2004 obrien

Sync with i386/NOTES.


126637 05-Mar-2004 obrien

Add comment for 'mptable'.

Submitted by: peter


126635 05-Mar-2004 obrien

Note that imp is working on un-shimming this driver, afterwards it should
work on AMD64.


126633 05-Mar-2004 obrien

The PECOFF support is 32-bit only.

Reviewed by: peter


126541 03-Mar-2004 obrien

Sync with i386/NOTES rev. 1.1131.


126528 03-Mar-2004 obrien

AMD64 versions.


126246 25-Feb-2004 peter

Since we don't use PG_NX yet, don't turn on EFER_NXE quite yet. This needs
to be done based on the cpuid bits. AMD says that we should test the cpuid
features bits for certain things, such as this.


126089 21-Feb-2004 peter

Catch up with some proc/procsig locking improvements that were made to the
i386 version and were not merged over.


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


125984 19-Feb-2004 obrien

Checkpoint the NOTES I was working on.


125584 08-Feb-2004 peter

I forgot to add the NO_MODULES override for NOTES


125530 06-Feb-2004 peter

Remove the badsw* INVARIANTS checks. The events that this attempts
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.

Prompted by: bde (the first two items anyway)


125512 06-Feb-2004 peter

Turn off ath since it causes a link failure without the hal till sam's
set up with a cross compiler and has the time to port the hal.


125487 05-Feb-2004 kan

Rename cn_unavailable to cnunavailable for little more consistency.
Garbage collect unused cndebug() function.

Suggested by: bde


125467 05-Feb-2004 kan

Eliminate global cons_unavailable flag and replace it by the status
bit maintained on a per-device basis. Single variable is inadequate
on machines running with multiple consoles enabled.


125464 05-Feb-2004 peter

Don't cast a pointer to an int that isn't big enough.


125463 05-Feb-2004 peter

Fix long/int printf format problems exposed by PMAP_DIAGNOSTIC


125461 04-Feb-2004 peter

Checkpoint a NOTES file I had as of Nov 23rd. It doesn't quite compile
due to triggering some printf breakage in some DIAGNOSTIC printfs.


125312 02-Feb-2004 obrien

Remove a device that compiles fine but isn't 64-bit clean.


125222 30-Jan-2004 peter

GRR. MFi386: white space spam


125221 30-Jan-2004 peter

Merge some more changes from i386.


125183 29-Jan-2004 peter

Re-add debug register support.
Some other minor tweaks snuck in here, including supporting more
discontiguous memory segments and some cosmetic tweaks.


125182 29-Jan-2004 peter

Re-add user_dbreg_trap() for debug register support


125181 29-Jan-2004 peter

Take another shot at the invariants calls to __panic. They hadn't been
updated for the regparm ABI on amd64.
Context switch debug regs.
Update for fpu simplification
Don't needlessly reload %cr3, in case the cpu has the tlb flush filter
turned off. Re-add LAZY_SWITCH stubs.


125180 28-Jan-2004 peter

deal with dbregs for fork etc
update for fpu.c simplification
Merge #include sort from i386


125179 28-Jan-2004 peter

Un-stub the hardware debug register stuff.


125178 28-Jan-2004 peter

Export PCB_DR* symbols


125177 28-Jan-2004 peter

We can simplify a lot of things now that we don't have to worry about
hardware bugs on external 386 cpus and now that we can depend on SSE.


125176 28-Jan-2004 peter

Add dbreg struct definitions for /proc/*/dbregs and a place to store the
registers in the pcb


125175 28-Jan-2004 peter

Re-add debug register functions


125174 28-Jan-2004 peter

MFi386: mp_topology().


125173 28-Jan-2004 peter

MFi386: add THERMTRIP msr values


125172 28-Jan-2004 peter

Diff reduction with i386


125163 28-Jan-2004 peter

MFi386: change an outb to a DELAY()


124950 25-Jan-2004 alc

MFi386 revision 1.230
- Move smp_topology to subr_smp.c so that it is defined on all architectures.


124935 24-Jan-2004 jeff

- Recruit some new ULE users by making it the default scheduler in GENERIC.
ULE will be in a probationary period to determine whether it will be left
as the default in 5.3 which would likely mean the rest of the 5.x series.


124919 24-Jan-2004 nectar

Add PFIL_HOOKS to the GENERIC kernel configuration, primarily so
that one can load the IPFilter module (which requires PFIL_HOOKS).

Requested by: Many, for over a year


124850 23-Jan-2004 peter

Unbreak amd64: Rename calls from panic to __panic


124631 17-Jan-2004 phk

remove elan_mmcr, I'm not sure I understand what it did here in the
first place.


124296 09-Jan-2004 nectar

Provide sysarch(2) prototypes in the MD sysarch.h headers. While I'm
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.

Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.

Reviewed by: bde


124194 06-Jan-2004 nectar

Remove `static' prototype from header file.


124187 06-Jan-2004 jhb

Use i8259A register defines from shared header sys/dev/ic/i8259.h instead
of from the amd64-specific icu.h.


124092 03-Jan-2004 davidxu

Make sigaltstack per-thread, because per-process sigaltstack state
is useless for threaded programs: multiple threads cannot share the
same stack.
The alternate signal stack is private to each thread, so no lock is
needed. The original P_ALTSTACK is now moved into td_pflags and renamed
to TDP_ALTSTACK.
For single-threaded or Linux clone()-based threaded programs, there is
no semantic change, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


124038 01-Jan-2004 alc

- Use pagezero() instead of bzero() in pmap_pinit(). (pagezero() is much
faster.)
MFi386:
- Don't bother clearing PG_ZERO on the page table page in
_pmap_allocpte(); it serves no purpose.
- Don't bother clearing and setting PG_BUSY on page table directory pages.


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123791 24-Dec-2003 peter

GC the unused <machine/kse.h> file.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123710 22-Dec-2003 alc

- Significantly reduce the number of preallocated pv entries in
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)


123692 20-Dec-2003 alc

Since we have additional kernel virtual address space, allow the buffer
cache to grow to 400M bytes.


123429 11-Dec-2003 peter

MFi386: remove APIC_IRQ* defines that are no longer used.


123428 11-Dec-2003 peter

MFi386: (jhb): Deal with MAXCPU etc correctly


123368 10-Dec-2003 obrien

Add just enough of i386/include/pcvt_ioctl.h to amd64/include/pcvt_ioctl.h
such that 'ispcvt' can build. Unfortunately 'ispcvt' is needed in order for
/etc/rc.d/syscons to run. This fixes the bug where I could not get my
keymap effective at boot.


123326 09-Dec-2003 njl

Use the ACPI-CA definitions for the various APIC tables instead of our
own.


123214 07-Dec-2003 alc

Increase VM_KMEM_SIZE_MAX from 200MB to 400MB.

Discussed with: peter


123182 06-Dec-2003 peter

Reconfigure the runq macros to use the 64 bit ffs/bsf routines instead
of doing a loop and taking two 32 bit passes at the runqueue bits. All
the 64 bit platforms should probably do this since there are 64 run queues.

Approved by: re (scottl)


123181 06-Dec-2003 peter

Add 64 bit bsf*/ffs* routines. Have the ffs() inline use gcc's builtin
because it uses the better cmove instructions to avoid branches.
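
The difference between the two approaches can be sketched in user-space C. This is only an illustration, not the kernel's runq macros: the function names are hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides `__builtin_ffs` and `__builtin_ffsll`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One pass over a 64-bit status word: returns the index (0-63) of the
 * lowest set bit, or -1 if no run queue is non-empty.
 */
static int
runq_first_64(uint64_t status)
{
	if (status == 0)
		return (-1);
	return (__builtin_ffsll((long long)status) - 1);
}

/*
 * The older style: two 32-bit passes, low word first, then high word.
 */
static int
runq_first_2x32(uint32_t lo, uint32_t hi)
{
	if (lo != 0)
		return (__builtin_ffs((int)lo) - 1);
	if (hi != 0)
		return (__builtin_ffs((int)hi) - 1 + 32);
	return (-1);
}
```

Both functions return the same index for the same bits; the 64-bit version simply avoids the second test and the offset arithmetic.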


123180 06-Dec-2003 peter

Various whitespace and cosmetic sync-up's with i386.

Approved by: re (scottl)


123179 06-Dec-2003 peter

amd64_protection_init and the protection_codes[] array were overkill.
Inline it instead.

Approved by: re (scottl)


123178 06-Dec-2003 peter

Kill the ASM versions of the mtx_lock_spin and friends. They were never
used on amd64, and were actually totally broken. They had the wrong
calling conventions. I believe the i386 versions are going away too.

Approved by: re (scottl)


123177 06-Dec-2003 peter

MFi386: put the apic disable hook in a better place.

Approved by: re (scottl)


123175 06-Dec-2003 peter

Revert some amd64 changes that cached curthread and converge back to the
i386 version. The curthread special case in pcpu.h solves my complaint
about the verbose macro expansion in this case. Note that the i386
version still has some OBE comments, I didn't re-add them back again.

Approved by: re (scottl)


123126 03-Dec-2003 jhb

Fix all users of mp_maxid to use the same semantics, namely:

1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by: re (scottl)
Tested on: i386, amd64, alpha


123119 03-Dec-2003 peter

Catch up with the procsig locking changes elsewhere. We were doing
things like copyin/out with psp->ps_mtx held. That was not good.

Approved by: re (scottl)


123118 03-Dec-2003 peter

Add an additional knob to just disable the apic code without also having
to resort to disabling acpi as well. I'll document this in the release
notes for amd64.

Approved by: re (scottl)


123074 30-Nov-2003 jeff

- Make mp_maxid reflect the same meaning as it does on other architectures.
It is one past the last valid cpuid. This relied on a different bug in
UMA to work properly.

Reported/Tested by: phk
Approved by: rwatson


123010 27-Nov-2003 peter

Fix i386 apic support merge botch. sizeof(long) is 8, not 4. This fixes
the annoying 'sysctl: hw.intrcnt: out of memory' error message in systat.

Approved by: re (rwatson)


122950 22-Nov-2003 peter

Argh! The Athlon64 and Opteron only implement 40 bits of address space in
the MTRR Base/Mask registers. If you use the documented algorithm in the
systems programming guide, you'll get a GPF. The only thing that has
prevented this so far is that the bios pre-sets some MTRR entries which
we mis-interpreted sufficiently to fool the memcontrol interface into
thinking all the address space was taken and therefore rejected XFree86's
requests. However, not all bioses do this. You get an insta-panic in
that case. Grrr. A better fix (dynamic mask) will happen by 5.3/5-stable
so that we automatically adapt to more than 40 physical bits.

Approved by: re (scottl)
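
As a rough illustration of the width problem, the sketch below builds a variable-range MTRR mask limited to a given number of physical address bits. It assumes the fault comes from setting mask bits above the implemented width; the function name and the power-of-two size restriction are choices made for this example, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the address portion of an MTRR PhysMask value for a
 * power-of-two sized range, keeping only bits 12..(phys_bits - 1).
 * Bits at or above phys_bits are left clear.
 */
static uint64_t
sk_mtrr_physmask(uint64_t size, unsigned phys_bits)
{
	uint64_t addr_mask;

	assert(size >= 4096 && (size & (size - 1)) == 0);
	assert(phys_bits > 12 && phys_bits < 64);

	addr_mask = ((1ULL << phys_bits) - 1) & ~0xfffULL;
	return (~(size - 1) & addr_mask);
}
```

For a 64 MB range this gives 0xFFFC000000 with 40 address bits and 0xFFC000000 with 36, which is why a hard-coded width only works on CPUs that implement exactly that many bits.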


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since sysinit's
in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122941 21-Nov-2003 peter

Turn on NO_MIXED_MODE for amd64 generic. It turns out that all the
known samples of broken chipsets that needed mixed mode in the first place
are so broken (ie: locks up) that we can't use IO APIC mode at all and it
needs to be turned off in the bios. So, the MIXED_MODE penalty on the
good chipsets gained nothing.

Approved by: re (scottl)


122940 21-Nov-2003 peter

Cosmetic and/or trivial sync up with i386.

Approved by: re (rwatson)


122939 21-Nov-2003 peter

MFi386 rev 1.54 (jhb): Add interrupts that are actually available to the
resource manager, rather than adding everything.

Approved by: re (scottl)


122938 21-Nov-2003 peter

MFi386: pre-register idt slots for atpic so we catch any strays without
blowing up.

Approved by: re (scottl)


122937 21-Nov-2003 peter

MFi386 rev 1.207 (phk): Don't mistakenly disable the TSC when using
statclock_disable.

Approved by: re (scottl)


122932 20-Nov-2003 peter

Argh! Followup to previous commit. I checked in the patch with an
unintended local change. Change Xurthread back to curthread.


122930 20-Nov-2003 peter

Provide a streamlined '#define curthread __curthread()' for amd64 to avoid
the compiler having to parse and optimize the PCPU_GET(curthread) so often.
__curthread() is an inline optimized version of PCPU_GET(curthread) that
knows that pc_curthread is at offset zero in the pcpu struct. Add a
CTASSERT() to catch any possible changes to this. This accounts for
just over a 1% wall clock speedup for total kernel compile/link time,
and 20% compile time speedup on some specific files depending on which
compile options are used.

Approved by: re (jhb)
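
The offset-zero trick and its compile-time guard can be sketched in portable C. The struct and function names here are hypothetical stand-ins; the real kernel reads the per-CPU area through a segment register rather than a pointer argument, and uses CTASSERT() where this sketch uses C11 `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>

struct sk_thread {
	int td_tid;
};

struct sk_pcpu {
	struct sk_thread *pc_curthread;	/* must remain the first field */
	int pc_cpuid;
};

/* Fails the build if someone moves pc_curthread away from offset 0. */
_Static_assert(offsetof(struct sk_pcpu, pc_curthread) == 0,
    "pc_curthread must be at offset zero");

/*
 * Because the field is at offset zero, the base address of the per-CPU
 * structure is also the address of the curthread pointer, so no offset
 * calculation is needed.
 */
static struct sk_thread *
sk_curthread(struct sk_pcpu *pc)
{
	return (*(struct sk_thread **)(void *)pc);
}
```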


122901 19-Nov-2003 peter

Sync with i386.
- turn on SMP in generic
- add 'device atpic' - this is unconditional on i386, but certain nvidia
based systems need to disable acpi because the reference bios seems to be
hosed. If acpi is disabled, we won't find the apic. amd64 has the
mptable code in a separate compile option as well.
- turn sym back on, it doesn't fail to compile anymore.

Approved by: re


122851 17-Nov-2003 peter

Add SMP changes as should have been committed as rev 1.28


122850 17-Nov-2003 peter

Restore file accidentally killed in the crossfire from the smp commit.


122849 17-Nov-2003 peter

Initial landing of SMP support for FreeBSD/amd64.

- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though it's in the same silicon! ARGH!
This directly violates the ACPI spec.


122845 17-Nov-2003 peter

Oh, how embarrassing. I broke my own platform. :-)


122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


122833 17-Nov-2003 bde

Fixed pedantic warnings for statement-expressions using __extension__
and by not using a statement-expression for the non-expression
__PCPU_SET().


122829 17-Nov-2003 bde

Fixed a pedantic syntax error (a stray semicolon at the end of
PCPU_MD_FIELDS).


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architectures
except as an opaque pointer that references a vm page.


122771 16-Nov-2003 bde

Localized the cy driver's locking.


122763 15-Nov-2003 njl

Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to
dereference the softc.


122714 14-Nov-2003 peter

Preemptively burn a bridge. The isa timer code is likely to be
replaced by the HPET timer at some point, so don't even make a release
with the acquire/release_timer0 functions.


122713 14-Nov-2003 peter

Minor source sync with amd64. Use int as the type for the width
field of %.*s rather than size_t.


122712 14-Nov-2003 peter

Minor source sync with amd64. For %.*s printf formats, pass in an
int rather than a size_t. cast the ioapicaddress variable via
uintptr_t before going to void *.
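
Both portability points can be shown in a few lines of standard C. The helper names are hypothetical and the snippet is a user-space sketch, not the driver code this commit touched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * The precision argument consumed by "%.*s" must be an int. Passing a
 * size_t happens to work where both are 32 bits wide but is wrong on
 * amd64, where size_t is 64 bits, so cast explicitly.
 */
static int
sk_format_prefix(char *out, size_t outlen, const char *s, size_t n)
{
	return (snprintf(out, outlen, "%.*s", (int)n, s));
}

/*
 * Convert a 32-bit address value to a pointer by going through
 * uintptr_t, which avoids an integer-to-pointer size mismatch.
 */
static void *
sk_addr_to_ptr(uint32_t ioapicaddress)
{
	return ((void *)(uintptr_t)ioapicaddress);
}
```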


122711 14-Nov-2003 peter

Convert a couple of pointers to integers for source compatibility with
amd64.


122710 14-Nov-2003 peter

Whitespace nit (sorry, couldn't help it)


122703 14-Nov-2003 jhb

Always install IDT entries for ATPIC interrupt sources. The APIC no
longer uses these interrupt vectors for its ISA interrupt pins, so these
entries will not be overwritten. If we get a spurious interrupt from the
ATPIC when using the APIC, it will be treated as a stray interrupt instead
of causing a panic.


122700 14-Nov-2003 jhb

If an interrupt source doesn't have an ithread, treat it as a stray
interrupt. This can only happen if an unregistered interrupt source
triggers an interrupt.


122697 14-Nov-2003 peter

basemem is in K, not bytes. I think I tricked jhb into making the same
mistake I did and then committing it to cvs.


122693 14-Nov-2003 peter

"opt_auto_eoi.h" is not used here anymore. See atpic.c.


122692 14-Nov-2003 jhb

Replace magic numbers with macros for i8259A register constants. Still
need the ICW4 bits for PC98 though.


122690 14-Nov-2003 jhb

Shuffle the APIC interrupt vectors around a bit:
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.

Requested by: peter (2)
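
The figure of 191 can be reproduced with a little arithmetic, under assumptions that the message itself does not state: that external interrupt vectors start at 0x20, that the ATPIC occupies the first 16 of them, and that vector 0x80 is set aside for the system call gate. The macro names below are hypothetical.

```c
#include <assert.h>

#define SK_IDT_IO_INTS		0x20			/* assumed first external vector */
#define SK_APIC_IO_INTS		(SK_IDT_IO_INTS + 16)	/* skip the ATPIC's 16 vectors */
#define SK_APIC_LAST_IO_INT	0xef			/* 0xf0 - 0xff hold IPIs and local APIC vectors */
#define SK_IDT_SYSCALL		0x80			/* assumed reserved for system calls */

/* Count the vectors left over for APIC device interrupts. */
static int
sk_apic_device_vectors(void)
{
	int count = 0;

	for (int v = SK_APIC_IO_INTS; v <= SK_APIC_LAST_IO_INT; v++)
		if (v != SK_IDT_SYSCALL)
			count++;
	return (count);
}
```

With those assumptions the range 0x30 through 0xef holds 192 vectors, and removing the one reserved vector leaves 191.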


122684 14-Nov-2003 peter

Fix up the control word 3 bits. jhb discovered how much I screwed this
up. :-]


122620 13-Nov-2003 jhb

Whitespace.


122617 13-Nov-2003 jhb

Fix a typo.


122595 13-Nov-2003 peter

Stop pretending to support kernel profiling. The FAKE_MCOUNT() etc
calls are just gradually getting more and more stale. At this point it
would be better to start from scratch once prof_machdep.c is adapted.


122572 12-Nov-2003 jhb

- Move manipulation of td_intr_nesting_level out of assembly interrupt
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.

Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)


122520 12-Nov-2003 peter

Cosmetic sync with i386


122511 11-Nov-2003 jhb

Don't probe busses in the MP Table for the MP Table PCI bridge drivers
if the bus number doesn't correspond to a PCI bus in the MP Table.

Reported by: jhay


122502 11-Nov-2003 jhb

Some motherboards like to remap the SCI (normally IRQ 9) up to a PCI
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.

Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.

Tested by: sos


122491 11-Nov-2003 jhb

Enable HTT CPUs by default instead of halting them by default. Users
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.


122490 11-Nov-2003 jhb

Disable probing of HTT CPUs by default for the MP Table case. HTT CPUs
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like we have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.


122438 10-Nov-2003 jhb

MFamd64 (via P4, not in CVS yet):
- Use the static boot_address variable directly rather than passing it
around to several functions.
- Clean up a couple of magic numbers.


122434 10-Nov-2003 jhb

Bump APIC ID limits up to 32 since a machine with 16 CPUs will have APIC
IDs for the I/O APICs that are greater than 16.

Reported by: John Cagle <john.cagle@hp.com>


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().
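
A minimal sketch of the flag layout described above, using hypothetical names and a simplified stand-in for the return-value handling:

```c
#include <assert.h>

#define SK_GET_MC_CLEAR_RET	0x0001	/* bit 0: the old clear_ret argument */
#define SK_GET_MC_MI_RESERVED	0x000e	/* bits 1-3: reserved for MI code */
#define SK_GET_MC_MD_MASK	(~0x000f)	/* remaining bits: for MD code */

/*
 * Return the value that would be stored in the saved context's
 * return-value register: zero if the caller asked for it to be
 * cleared, the live value otherwise.
 */
static long
sk_saved_retval(long live_retval, int flags)
{
	return ((flags & SK_GET_MC_CLEAR_RET) ? 0 : live_retval);
}
```

Callers that used to pass 1 or 0 for clear_ret keep the same behaviour, since bit 0 has the same meaning as the old argument.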


122303 08-Nov-2003 peter

Move a MD 32 bit binary support routine into the MD areas. exec_setregs
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.


122296 08-Nov-2003 peter

Update the graffiti.


122295 08-Nov-2003 peter

Switch from having a fpu "device" to something that is more like the
integrated part of the cpu core that it is.


122292 08-Nov-2003 peter

The great s/npx/fpu/gi


122288 08-Nov-2003 peter

Converge with i386/GENERIC


122278 08-Nov-2003 peter

Rename npx* to fpu*. I haven't done the flags/function names yet.


122269 08-Nov-2003 peter

There isn't much point printing 'npx0: INT 16 interface' because that is
the only way it works here.


122268 07-Nov-2003 jhb

Dump the trigger and polarity of each intpin's default setting in the
bootverbose output.


122266 07-Nov-2003 scottl

Document the lockfunc and lockfuncarg arguments to bus_dma_tag_create() in
the busdma headers.


122172 06-Nov-2003 jhb

Only disable the old pin when doing a remap if its current vector is still
the old vector.

Reported by: sam


122156 06-Nov-2003 peter

OK, this might be a bit silly, but add another popcnt() candidate.


122149 05-Nov-2003 jhb

When remapping an ISA interrupt from one intpin to another, disable the
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.

Reported by: kan


122148 05-Nov-2003 jhb

Two style nits.


122124 05-Nov-2003 jhb

- Adjust some of the bitfields in the ioapic_intsrc struct to be unsigned
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.

Reported by: Michal Mertl <mime@traveller.cz>


122123 05-Nov-2003 jhb

Add a workaround for MP Tables that list the same PCI IRQ twice with
the same APIC / pin destination in both cases.

Reported by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


122064 04-Nov-2003 jhb

Tweak the version string output for ioapic devices.


122051 04-Nov-2003 nyan

Fix to support pc98.


122048 04-Nov-2003 nyan

Split pc98 support into pc98/pc98/nmi.c.


122016 04-Nov-2003 peter

Make this compile with PAE.


121996 03-Nov-2003 jhb

New i386 SMP code:

- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and multiply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.


121995 03-Nov-2003 jhb

Don't probe PnP BIOS devices for PICs for now to avoid problems with those
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.


121992 03-Nov-2003 jhb

Add the ACPI MADT table APIC enumerator. This code uses the ACPI Multiple
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.


121991 03-Nov-2003 jhb

Add the MP Table APIC enumerator. This code uses the BIOS MP Table to
enumerate I/O APICs as well as local APICs. It also provides Host-PCI
and PCI-PCI bridge drivers to use the MP Table to route PCI interrupts.


121986 03-Nov-2003 jhb

New APIC support code:

- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to pseudo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
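
The shared-entry-point idea from the first item can be sketched as follows: each entry point is tied to one 32-bit in-service register (ISR), and the set bit in that register identifies the vector within the group. This is a user-space illustration with a hypothetical function name; it assumes a GCC- or Clang-compatible compiler for `__builtin_clz`, and that the highest set bit is the one being serviced.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the index of a 32-bit ISR register (group 0 covers vectors
 * 0-31, group 1 covers 32-63, and so on) and its current value, return
 * the vector being serviced, or -1 if no bit in the group is set.
 */
static int
sk_isr_to_vector(unsigned group, uint32_t isr)
{
	int bit;

	if (isr == 0)
		return (-1);
	bit = 31 - __builtin_clz(isr);	/* index of the highest set bit */
	return ((int)(group * 32) + bit);
}
```

One entry point per group is enough because the register value, not the entry address, supplies the low five bits of the vector number.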


121985 03-Nov-2003 jhb

Add the new atpic(4) driver for the 8259A master and slave PICs. By
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.

Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().


121982 03-Nov-2003 jhb

New device interrupt code. This defines an interrupt source abstraction
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector to a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.

By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
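
A minimal user-space sketch of the abstraction described above, assuming a
flat table indexed by IRQ and a per-PIC method table. Every name here
(pic_sketch, intsrc_sketch, register_source, dispatch_irq) is hypothetical
and is not the kernel's API. Masking the source around the handler is an
illustrative choice only.

```c
#include <stddef.h>

#define NUM_IRQS 256	/* hypothetical size of the shared IRQ space */

struct intsrc_sketch;

/* Methods a PIC driver supplies for the sources it owns. */
struct pic_sketch {
	void (*mask)(struct intsrc_sketch *);
	void (*unmask)(struct intsrc_sketch *);
};

/* One interrupt source: a pin on some PIC plus its handler. */
struct intsrc_sketch {
	struct pic_sketch *pic;
	void (*handler)(void *);
	void *arg;
	int masked;
};

static struct intsrc_sketch *sources[NUM_IRQS];

/* Register a source; fail if the IRQ already has a provider. */
static int
register_source(int irq, struct intsrc_sketch *src)
{
	if (irq < 0 || irq >= NUM_IRQS || sources[irq] != NULL)
		return (-1);
	sources[irq] = src;
	return (0);
}

/* Look up the source for an IRQ and run its handler while masked. */
static int
dispatch_irq(int irq)
{
	struct intsrc_sketch *src;

	if (irq < 0 || irq >= NUM_IRQS || (src = sources[irq]) == NULL)
		return (-1);
	src->pic->mask(src);
	if (src->handler != NULL)
		src->handler(src->arg);
	src->pic->unmask(src);
	return (0);
}

/* Trivial PIC methods and a counting handler for demonstration. */
static void sketch_mask(struct intsrc_sketch *s) { s->masked = 1; }
static void sketch_unmask(struct intsrc_sketch *s) { s->masked = 0; }
static void count_handler(void *arg) { ++*(int *)arg; }
```

Because lookup goes through the table only, the dispatcher does not care
whether a given IRQ is backed by an 8259A pin, an I/O APIC pin, or something
else. A second registration for an occupied IRQ is refused.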


121980 03-Nov-2003 jhb

Move the NMI handling code out to its own file.


121755 30-Oct-2003 jhb

Include "opt_pmap.h" so that the DISABLE_P* options are honored.


121754 30-Oct-2003 jhb

Always export r_gdt and r_idt and give them extern declarations in
machine/segments.h.


121751 30-Oct-2003 peter

MFi386: thread specific fpu state optimizations


121723 30-Oct-2003 peter

MFi386: rev 1.451 (jhb): call pmap_kremove() rather than duplicate it


121722 30-Oct-2003 peter

MFi386: trap.c rev 1.259: fetch thread mailbox address in page fault trap


121623 28-Oct-2003 peter

Oops. Remove some rather noisy debug printfs that slipped in there
somehow.


121481 24-Oct-2003 jhb

A few whitespace and comment tweaks.


121450 24-Oct-2003 peter

Add __va_copy and make it always visible, in spite of the __ISO_C_VISIBLE
setting. Make va_copy be an alias if __ISO_C_VISIBLE >= 1999.

Why? more than a few ports have an autoconf that looks for __va_copy
because it is available on glibc. It is critical that we use it if
at all possible on amd64. It generally isn't a problem for i386 and its
ilk because autoconf driven code tends to fall back to an assignment.
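
A short sketch of why the copy matters, assuming a C99 compiler. Under the
amd64 ABI va_list is an array type, so the plain assignment that works on
i386 does not compile. The function name sum_twice is hypothetical.

```c
#include <stdarg.h>

/* Sum 'count' int arguments twice by walking the list two times. */
static int
sum_twice(int count, ...)
{
	va_list ap, ap2;
	int i, total = 0;

	va_start(ap, count);
	va_copy(ap2, ap);	/* "ap2 = ap" is not valid when va_list is an array */
	for (i = 0; i < count; i++)
		total += va_arg(ap, int);
	for (i = 0; i < count; i++)
		total += va_arg(ap2, int);
	va_end(ap2);
	va_end(ap);
	return (total);
}
```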


121405 23-Oct-2003 peter

Use a more robust API altogether for the amd64_get_fsbase() etc functions.


121398 23-Oct-2003 peter

Renumber the sysarch vectors for amd64 specific syscalls so that I can
implement i386 compat numbers where it makes sense. This would save a
syscall translation layer. Yes, this breaks the abi slightly again, but
fortunately it's just a recompile rather than tweaking the source. I will
be fixing the libc stubs while I'm here.


121307 21-Oct-2003 silby

Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.


121228 18-Oct-2003 njl

Add the cpu_idle_hook() function pointer so that other idlers can be
hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This
prepares the way for further ACPI sleep states.
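
The hook pattern can be sketched in user space as below. The _sk names are
hypothetical stand-ins; the default routine only counts calls where the
kernel would execute HLT.

```c
#include <stddef.h>

static int idle_calls_default;
static int idle_calls_custom;

/* Stand-in for the default C1 idler (HLT in the kernel). */
static void
cpu_idle_default_sk(void)
{
	idle_calls_default++;
}

/* Stand-in for a replacement idler installed at runtime. */
static void
cpu_idle_custom_sk(void)
{
	idle_calls_custom++;
}

/* The hook starts out pointing at the default idler. */
static void (*cpu_idle_hook_sk)(void) = cpu_idle_default_sk;

/* The idle loop calls through the hook, whatever it currently is. */
static void
cpu_idle_sk(void)
{
	if (cpu_idle_hook_sk != NULL)
		cpu_idle_hook_sk();
}
```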


121133 16-Oct-2003 bde

Don't forget to load %es with the kernel data segment selector in
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.

This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.


121103 15-Oct-2003 peter

Pull the tier-2 card one last time and break the get/setcontext and
sigreturn() ABI and the signal context on the stack.

Make the trapframe (and its shadows in the ucontext and sigframe etc)
8 bytes larger in order to preserve 16 byte stack alignment for the
following C code calls. I could have done some padding after the
trapframe was saved, but some of the C code still expects an argument of
'struct trapframe'. Anyway, this gives me a spare field that can be used
to store things like 'partial trapframe' status or something else in
the future.

The runtime impact is fairly small, *except* for threaded apps and things
that decode contexts and the signal stack (eg: cvsup binary). Signal
delivery isn't too badly affected because the kernel generates the
sigframe that sigreturn uses after the handler has been called.

The size of mcontext_t and struct sigframe hasn't changed. Only
the last few fields (sc_eip etc) got moved a little and I eliminated
a spare field. mc_len/sc_len did change location though so the
sanity checks there will still trap it.


121081 14-Oct-2003 alc

MFia64
Move uma_small_alloc() and uma_small_free() to uma_machdep.c.


120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


120831 06-Oct-2003 bms

Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by: truckman
Discussed with: peter


120772 05-Oct-2003 alc

Don't bother setting a page table page's valid field. It is unused and
not setting it is consistent with other uses of VM_ALLOC_NOOBJ pages.


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120661 02-Oct-2003 alc

Reimplement pagezero() using "movnti".


120654 01-Oct-2003 peter

Commit Bosko's patch to clean up the PSE/PG_G initialization and to
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.

For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.

A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.

Obtained from: bmilekic


120617 01-Oct-2003 peter

Use __register_t instead of register_t, otherwise <sys/types.h> is a
prerequisite for <ucontext.h> on amd64. Oops.


120597 30-Sep-2003 peter

MFi386: Do not depend on LEAPYEAR() macro boolean values being 0 or 1.
MFi386: Add quality field for timer0


120596 30-Sep-2003 peter

MFi386: BURN_BRIDGES around timer0 functions


120595 30-Sep-2003 jeff

- Remove the definition for TD_SWITCHIN as it is not used.

Approved by: peter


120525 27-Sep-2003 alc

Eliminate the pte object.


120449 26-Sep-2003 alc

MFi386
Allocate the page table directory page as "no object" pages.


120427 25-Sep-2003 alc

MFi386
Reimplement pmap_release() such that it uses the page table rather than
the pte object to locate the page table directory pages. (Temporarily,
retain an assertion on the emptiness of the pte object.)


120423 25-Sep-2003 peter

Re-raise the default datasize and stacksize now that the 32 bit exec
support can clip it to sensible values.


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 hierarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
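
A rough sketch of the clipping idea behind a fixlimits hook, not the
committed code. The structure and function names are hypothetical, and
treating a clip value of zero as "no clipping" is an assumption made for
the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a resource limit pair. */
struct rlimit_sk {
	uint64_t cur;	/* soft limit */
	uint64_t max;	/* hard limit */
};

/* Clip both limits to 'clip' bytes; a clip of 0 leaves them alone. */
static void
fixlimit_sk(struct rlimit_sk *rl, uint64_t clip)
{
	if (clip == 0)
		return;
	if (rl->cur > clip)
		rl->cur = clip;
	if (rl->max > clip)
		rl->max = clip;
}
```

Applied to the data and stack limits at exec time, this lets a 64 bit
default such as 8GB coexist with a 32 bit process that cannot address it.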


120375 23-Sep-2003 nyan

Implement the bus_space_map() function to allocate resources and initialize
a bus_handle. Currently it only initializes a bus_handle.


120369 23-Sep-2003 peter

Oops. back out last commit. The data and stack limits are used by the
32 bit binary stuff. 32 bit binaries do not like it much when the kernel
tries hard to put things above the 8GB mark.

I have a work-in-progress to fix this properly, but I didn't want to burn
anybody with this yet.


120367 23-Sep-2003 peter

Fix patch transcription typo. s/IDT_BPT/IDT_BP/


120365 23-Sep-2003 peter

Sync with i386 version. The quality initialization was missing and some
other junk.


120363 23-Sep-2003 peter

GC unused child variable


120362 23-Sep-2003 peter

MFi386 pci_bus.c 1.102 legacyvar.h 1.4: rename nexus_pcib to legacy_pcib

However, leave legacy_pcib_route_interrupt() since there is no pcibios to
call.


120360 22-Sep-2003 peter

Move basemem variable into global scope so that the MP startup code can
refer to it for looking for tables.


120358 22-Sep-2003 peter

Increase the default data size limit from 512MB to 8GB. Increase default
stack limit from 64MB to 512MB.


120357 22-Sep-2003 peter

MFi386 rev 1.51 by scottl: make dflt_lock() always panic.


120356 22-Sep-2003 peter

MFi386 rev 1.53 by scottl: Allocate the S/G list in the tag, not on
the stack. This means that s/g lists can be arbitrarily long.


120355 22-Sep-2003 peter

MFi386 machdep.c rev 1.201, clock.c 1.201, clock.h 1.45 by phk: Don't
initialize a TSC timecounter until we know if it is broken or not.

XXX I think there is a bug in the i386 code here. init_TSC_tc() comes
after:
if (statclock_disable)
return;

ie: if you turn off the statclock interrupt, you don't get the TSC either.


120354 22-Sep-2003 peter

MFi386 rev 1.105 by jhb: fix comment typo


120353 22-Sep-2003 peter

MFi386 rev 1.256 by jhb: remove redundant #include <sys/sysctl.h>


120352 22-Sep-2003 peter

MFi386 rev 1.25 by jhb: add new MSR's and some missing older ones and
APICBASE MSR constants.


120351 22-Sep-2003 peter

MFi386 rev 1.55 by sam: remove unused #define BUS_DMAMAP_NSEGS


120350 22-Sep-2003 peter

MFi386 rev 1.37: constant-friendly bswap macros
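
One common way to make a byte-swap macro constant-friendly is sketched
below, assuming a GCC-compatible compiler that provides
__builtin_constant_p. The macro and function names are hypothetical and
the merged code may differ in detail.

```c
#include <stdint.h>

/* Pure-C swap that the compiler can fold when x is a constant. */
#define BSWAP32_CONST_SK(x)					\
	((uint32_t)((((uint32_t)(x) & 0x000000ffU) << 24) |	\
	    (((uint32_t)(x) & 0x0000ff00U) << 8) |		\
	    (((uint32_t)(x) & 0x00ff0000U) >> 8) |		\
	    (((uint32_t)(x) & 0xff000000U) >> 24)))

/* Stand-in for the run-time path (a bswap instruction in the kernel). */
static inline uint32_t
bswap32_var_sk(uint32_t x)
{
	return (BSWAP32_CONST_SK(x));
}

/* Pick the foldable form for constants, the function otherwise. */
#define BSWAP32_SK(x)						\
	(__builtin_constant_p(x) ? BSWAP32_CONST_SK(x) : bswap32_var_sk(x))
```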


120349 22-Sep-2003 peter

MFi386: pci_cfgreg.h rev 1.10 by jhb/des/njl. Fix CONF1_ENABLE_MSK.


120348 22-Sep-2003 peter

MFi386: trap.c rev 1.257 by bde. Don't forget to reenable interrupts
for breakpoint and trace traps from usermode. Although all the setidt
entries are interrupt gates on amd64, all but the trace and bpt trap
entry handlers reenable interrupts after the swapgs instruction in order
to simulate the trap/interrupt gate distinction. In other words, the
amd64 code behaves the same way that i386 does here.


120347 22-Sep-2003 peter

MFi386 by jhb: add acpi_SetDefaultIntrModel();


120346 22-Sep-2003 peter

MFi386 by jhb: use symbolic constants for the IDT entries.


120345 22-Sep-2003 peter

MFi386: machdep.c:1.570 clock.c:1.204 by bde: Quick fix for calling DELAY
for ddb input in some atkbd-based console drivers. ddb must not use any
normal locks but DELAY() normally calls getit() which needs clock_lock.
This also removes the need for recursion on clock_lock.


120243 19-Sep-2003 joerg

Mention the puc(4) glue driver in a commented-out example so users
of "dumb" PCI-based serial/parallel boards get a hint on how to enable
them.

I wasn't sure about the ia64, pc98, powerpc, and sparc64 archs whether
they'd support puc(4) or not.


120106 15-Sep-2003 obrien

Statically compile in sound as we don't have modules yet.


120040 13-Sep-2003 alc

Simplify (and micro-optimize) pmap_unuse_pt(): Only one caller,
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().


119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


119960 10-Sep-2003 obrien

Sort 'bge' correctly.


119941 10-Sep-2003 jhb

Remove an XXX comment by using the per CPU mask added after this comment
was added.


119938 10-Sep-2003 jhb

Fix a typo.


119924 09-Sep-2003 peter

Clean up get/set_mcontext() and get/set_fpcontext(). These are operated
on data structures on the kernel stack which are guaranteed to be 16 byte
aligned by gcc, the amd64 ABI and __aligned(16).

Ensure the tss_rsp0 initial stack pointer is 16 byte aligned in case
sizeof(pcb) becomes odd at some point. This is convenient for the
interrupt handler case because the ring crossing pushes cause the
required odd alignment before the call to the C code.

Have fast_syscall add an additional 8 bytes to ensure that the trapframe
has the correct odd alignment for the call to C code. Note that there are
no checks to make sure that the trapframe size is appropriate for this.

This makes get/setfpcontext work properly (finally). You get a GPF in
kernel mode if any of this is botched without the alignment fixup code
that is apparently needed on i386.


119894 08-Sep-2003 peter

Turn aac back on now that it's been cleaned up for 64 bit compilation


119889 08-Sep-2003 peter

Argh. This file was completely out of sync with mcontext/trapframe.


119888 08-Sep-2003 peter

Hmm. Two copies of the mcontext...


119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


119868 08-Sep-2003 wpaul

Take the support for the 8139C+/8169/8169S/8110S chips out of the
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)

rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:

- Added support for jumbo frames. Currently, there seems to be
a limit of approximately 6200 bytes for jumbo frames on transmit.
(This was determined via experimentation.) The 8169S/8110S chips
apparently are limited to 7.5K frames on transmit. This may require
some more work, though the framework to handle jumbo frames on RX
is in place: the re_rxeof() routine will gather up frames that span
multiple 2K clusters into a single mbuf list.

- Fixed bug in re_txeof(): if we reap some of the TX buffers,
but there are still some pending, re-arm the timer before exiting
re_txeof() so that another timeout interrupt will be generated, just
in case re_start() doesn't do it for us.

- Handle the 'link state changed' interrupt

- Fix a detach bug. If re(4) is loaded as a module, and you do
tcpdump -i re0, then you do 'kldunload if_re,' the system will
panic after a few seconds. This happens because ether_ifdetach()
ends up calling the BPF detach code, which notices the interface
is in promiscuous mode and tries to switch promisc mode off while
detaching the BPF listener. This ultimately results in a call
to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
here turns the chip back on and restarts the 1-second timeout loop
that drives re_tick(). By the time the timeout fires, if_re.ko
has been unloaded, which results in a call to invalid code and
blows up the system.

To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
which stops the ioctl routine from trying to reset the chip.

- Modified comments in re_rxeof() relating to the difference in
RX descriptor status bit layout between the 8139C+ and the gigE
chips. The layout is different because the frame length field
was expanded from 12 bits to 13, and they got rid of one of the
status bits to make room.

- Add diagnostic code (re_diag()) to test for the case where a user
has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
NICs have the REQ64# and ACK64# lines connected even though the
board is 32-bit only (in this case, they should be pulled high).
This fools the chip into doing 64-bit DMA transfers even though
there is no 64-bit data path. To detect this, re_diag() puts the
chip into digital loopback mode and sets the receiver to promiscuous
mode, then initiates a single 64-byte packet transmission. The
frame is echoed back to the host, and if the frame contents are
intact, we know DMA is working correctly, otherwise we complain
loudly on the console and abort the device attach. (At the moment,
I don't know of any way to work around the problem other than
physically modifying the board, so until/unless I can think of a
software workaround, this will have to do.)

- Created re(4) man page

- Modified rlphy.c to allow re(4) to attach as well as rl(4).

Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).


119779 05-Sep-2003 peter

Oops. sizeof(long) = 8, not 4. Get the fxsave buffer inside mcontext
the right size. I'm planning on *possibly* stealing the two 'spare'
variables on either side for botched alignment correction.


119703 03-Sep-2003 obrien

MFi386: add device ataraid, this is now separate and not pulled in by atadisk.


119628 01-Sep-2003 kan

Standardize idempotency ifdefs. Consistently use _MACHINE_VARARGS_H_
symbol.


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid ephemeral mappings in the kernel's
address space. On an SMP, the ephemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119539 28-Aug-2003 jhb

- Rename PCIx_HEADERTYPE* to PCIx_HDRTYPE* so the constants aren't so long.
- Add a new PCIM_HDRTYPE constant for the field in PCIR_HDRTYPE that holds
the header type.
- Replace several magic numbers with appropriate constants for the header
type register and a couple of PCI_FUNCMAX.
- Merge to amd64 the fix to the i386 bridge code to skip devices with
unknown header types.

Requested by: imp (1, 2)


119531 28-Aug-2003 njl

Minor style cleanups.


119452 25-Aug-2003 obrien

Fix copyright comment & FBSDID style nits.

Requested by: bde


119399 24-Aug-2003 alc

Eliminate the last (direct) uses of vm_page_lookup() on the pte object.


119340 23-Aug-2003 peter

AMD64 mtrr driver.


119336 23-Aug-2003 peter

Switch to using the emulator in the common compat area.
Still work-in-progress.


119334 22-Aug-2003 peter

Initial sweep at dividing up the generic 32bit-on-64bit kernel support
from the ia32 specific stuff. Some of this still needs to move to the MI
freebsd32 area, and some needs to move to the MD area. This is still
work-in-progress.


119291 22-Aug-2003 imp

Prefer new location of pci include files (which have only been in the
tree for two or more years now), except in a few places where there's
code to be compatible with older versions of FreeBSD.


119194 21-Aug-2003 peter

Regen


119193 21-Aug-2003 peter

This is too funny for words. Swap syscalls 416 and 417 around. It works
better that way when sigaction() and sigreturn() do the right thing.


119158 20-Aug-2003 alc

- Lock the pte object when performing vm_page_grab().
- Ensure that the page table page is zero filled before adding it
to the page table.


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119006 17-Aug-2003 alc

In pmap_copy(), since we have the page table page's physical address
in hand, use PHYS_TO_VM_PAGE() rather than vm_page_lookup().


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a separate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


118983 16-Aug-2003 alc

Eliminate pmap_page_lookup() and its uses. Instead, use PHYS_TO_VM_PAGE()
to convert the pte's physical address into a vm page.

Reviewed by: peter


118953 15-Aug-2003 jhb

- Fix a duplicated typo.
- Add a macro for the logical shift needed to extract an APIC ID from
either the local APIC ICR Hi register or the APIC ID registers of
the local and IO APICs.
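
A small sketch of what such a shift macro does, assuming the xAPIC layout
in which the ID occupies the top byte of the 32-bit register (so the shift
is 24). The _SK names are hypothetical and are not the header's actual
identifiers.

```c
#include <stdint.h>

/* Assumed bit position of the APIC ID within the 32-bit register. */
#define APIC_ID_SHIFT_SK	24

/* Extract the APIC ID from an ID register value. */
static inline uint32_t
apic_id_from_reg_sk(uint32_t reg)
{
	return (reg >> APIC_ID_SHIFT_SK);
}

/* Build the ICR Hi value that targets a given APIC ID. */
static inline uint32_t
icr_hi_for_dest_sk(uint32_t apic_id)
{
	return (apic_id << APIC_ID_SHIFT_SK);
}
```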


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118832 12-Aug-2003 ps

Halted CPU's should not accumulate time.

Reviewed by: jhb


118741 10-Aug-2003 alc

Rename pmap_changebit() to pmap_clear_ptes() and remove the last
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.

Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.


118641 08-Aug-2003 alc

MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().


118451 04-Aug-2003 scottl

In _bus_dmamap_load_buffer(), only count the number of bounce pages needed if
they haven't been counted before. This test was omitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.

Thanks to Paul Saab for beating his head against this for a few hours =-)
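
The guard can be pictured with the simplified sketch below. The structure
and function are hypothetical stand-ins, reduced to the one field that
matters here.

```c
/* Hypothetical stand-in for the per-map bounce page bookkeeping. */
struct dmamap_sketch {
	int pagesneeded;
};

/*
 * Count bounce pages for a buffer.  A deferred load re-enters this
 * path, so only count when nothing has been counted yet.
 */
static void
count_bounce_pages(struct dmamap_sketch *map, int pages_in_buffer)
{
	if (map->pagesneeded == 0)
		map->pagesneeded += pages_in_buffer;
}
```

Without the test, each deferral would add the same pages again, which is
the unbounded growth described above.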


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118365 02-Aug-2003 alc

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in pmap_mapdev().
See revision 1.140 of kern/sys_pipe.c for a detailed rationale.

Submitted by: tegge


118331 02-Aug-2003 peter

Fix a dumbass mistake. I had the 'set' and 'get' reversed in the
fpsetround/fpgetround macro pairs.


118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessary. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


118236 31-Jul-2003 peter

KSTACK_PAGES is a global option.


118235 31-Jul-2003 peter

Cosmetic: fix disorder of opt_kstack_pages.h include.


118156 29-Jul-2003 davidxu

Use PSL_KERNEL as upcall thread's initial rflags, don't use
scratch user rflags.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request zeroed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


118031 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at Ottawa Linux Symposium


118030 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at OLS


118024 25-Jul-2003 alc

MFi386 revision 1.416
Add vm object locking to pmap_prefault().

Note: powerpc and sparc64 do not implement this function.


117985 25-Jul-2003 davidxu

Align the upcall stack top to an odd multiple of 8. GCC accounts for the
return address in the callee function when computing stack alignment.
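
The arithmetic can be sketched as follows. Under the amd64 ABI the stack is
16 byte aligned before a call, and the call pushes an 8 byte return
address, so a function expects the stack pointer to be 8 modulo 16 on
entry. An upcall enters the function without a call, so the initial stack
pointer has to be adjusted by hand. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return an initial stack pointer that is an odd multiple of 8. */
static uintptr_t
upcall_stack_top_sk(uintptr_t base, size_t size)
{
	uintptr_t sp;

	sp = (base + size) & ~(uintptr_t)0xf;	/* round down to 16 bytes */
	return (sp - 8);			/* mimic the pushed return address */
}
```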


117962 24-Jul-2003 davidxu

Implement cpu_set_upcall and cpu_set_upcall_kse.

Reviewed by: peter


117961 24-Jul-2003 davidxu

Set fault address to si_addr.

Reviewed by: peter


117943 23-Jul-2003 peter

Make the breakpoint instruction trap gate available to users.
ptrace() needs this.

Submitted by: Mark Kettenis <kettenis@chello.nl>


117942 23-Jul-2003 peter

Set the %gs base to pcb_gsbase, not pcb_fsbase. Oops.

Discovered by: davidxu


117929 23-Jul-2003 alc

Annotate pmap_changebit() as __always_inline. This function was
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.
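
The template idea can be sketched as below, assuming a GCC-compatible
compiler for the always_inline attribute. The names and the accessed-bit
value 0x20 are assumptions for illustration; the real function walks a
page's mappings rather than a single integer.

```c
/* Hypothetical accessed bit, matching the x86 PTE layout (bit 5). */
#define PG_A_SK	0x20u

/*
 * Generic helper: 'setem' selects set or clear.  When inlined with a
 * constant 'setem', the compiler can drop the untaken branch.
 */
static inline __attribute__((always_inline)) unsigned
change_bit_sk(unsigned pte, unsigned bit, int setem)
{
	if (setem)
		return (pte | bit);
	return (pte & ~bit);
}

/* Specialized caller: after inlining only the clear path remains. */
static unsigned
clear_reference_sk(unsigned pte)
{
	return (change_bit_sk(pte, PG_A_SK, 0));
}
```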


117928 23-Jul-2003 jhb

Use macros from apic.h when writing to the ICR to send IPIs to startup
APs rather than magic numbers.

Tested by: scottl


117927 23-Jul-2003 jhb

Add a new macro APIC_ICRLO_RESV_MASK that contains all of the reserved
fields in the low 32 bits of the local APIC ICR register. Use this macro
in place of APIC_RESV2_MASK when masking off existing bits from the ICR
when writing to it to send an IPI.

Tested by: scottl


117865 22-Jul-2003 peter

Go back to 64 bit precision for fadd/fsub/fsqrt etc. This is because on
AMD64, gcc (and the ABI) expects the x87 unit to be running in 80/64
mode (not 64/53) so that it can use it for 'long double' operations. It
takes the expected precision differences into account when generating
code.


117863 22-Jul-2003 peter

Extend the machine/ieeefp.h that was inherited from i386 to support
the SSE mxcsr register as well. Since gcc will intermix SSE2 and x87
FP code, the fpsetround() etc mode had better be the same.

There are hooks to enable these inlines to be instantiated inside libc
for non-gcc or C++ callers. (g++ doesn't like the inlines that tried
to extract an integer and convert it to an enum).


117600 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo.

Suggested by: jhb


117385 10-Jul-2003 markm

Protect lint(1) from a #error.


117372 10-Jul-2003 peter

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


117370 09-Jul-2003 peter

Fix the VADDR() macros to use either KVADDR() or UVADDR(), depending
on the implied sign extension. The single unified VADDR() macro was
not able to avoid sign extending the VM_MAXUSER_ADDRESS/USRSTACK values.
Be explicit about UVADDR() (positive address space) and KVADDR()
(kernel negative address space) to make mistakes show up more
spectacularly.

Increase user VM space from 1/2TB (512GB) to 128TB.
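
A sketch of the two macro flavors, assuming 4-level paging with 9 index
bits per level and 48-bit canonical addresses. The _SK names are
hypothetical; the index arguments are the PML4, PDP, PD and PT slots.

```c
#include <stdint.h>

#define PAGE_SHIFT_SK	12
#define PDR_SHIFT_SK	21
#define PDP_SHIFT_SK	30
#define PML4_SHIFT_SK	39

/* User (positive) address: no sign extension of bit 47. */
#define UVADDR_SK(l4, l3, l2, l1) (				\
	((uint64_t)(l4) << PML4_SHIFT_SK) |			\
	((uint64_t)(l3) << PDP_SHIFT_SK) |			\
	((uint64_t)(l2) << PDR_SHIFT_SK) |			\
	((uint64_t)(l1) << PAGE_SHIFT_SK))

/* Kernel (negative) address: force bits 47..63 to ones. */
#define KVADDR_SK(l4, l3, l2, l1) (				\
	(~(uint64_t)0 << 47) | UVADDR_SK(l4, l3, l2, l1))
```

With these definitions a single top-level slot spans 512GB, 256 slots give
the 128TB user range, and KVADDR_SK(511, 510, 0, 0) evaluates to
0xffffffff80000000, which is 2GB below the top of the address space.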


117369 09-Jul-2003 peter

Fix up bogus index/offset/mask calculations in the allocpte and the
corresponding release code. This was preventing the use of more than
1/2TB of user VM. I also spent a week staring at this code only to
eventually find that I'd mistakenly typed a P as an R.


117368 09-Jul-2003 peter

Turn the 2MB page mappings that cover the kernel text+data+bss area back
on now that pmap_pte() can handle it. I never actually ran into anything
that broke that I know of, but this was turned off as a precaution.


117367 09-Jul-2003 peter

Have pmap_pte() on a 2MB mapped address return the 2MB pde itself
rather than a non-existing pte. There is code elsewhere in i386/amd64
pmap that neglects to handle the large page cases because it knows that
it will see PG_PS in the returned "pte".


117340 08-Jul-2003 alc

In pmap_object_init_pt(), the pmap_invalidate_all() should be performed on
the caller-provided pmap, not the kernel_pmap. Using the kernel_pmap
results in an unnecessary IPI for TLB shootdown on SMPs.

Reviewed by: jake, peter


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of an object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117136 01-Jul-2003 mux

Sync more things with other backends.


117129 01-Jul-2003 mux

Honor the boundary of the busdma tag when allocating bounce pages.
This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and
was never fixed in other busdma backends using bounce pages.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dflt_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on ia64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs
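
The lockfunc/lockfuncarg contract can be mocked in user space as below.
This is a sketch of the calling pattern only: every _sk name is
hypothetical, the "mutex" is a flag, and abort() stands in for a kernel
panic.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { LOCK_OP_SK, UNLOCK_OP_SK } lock_op_sk_t;
typedef void lockfunc_sk_t(void *arg, lock_op_sk_t op);

struct mutex_sk { int held; };

/* Analogue of busdma_lock_mutex(): arg names the mutex to operate on. */
static void
lock_mutex_sk(void *arg, lock_op_sk_t op)
{
	struct mutex_sk *m = arg;

	m->held = (op == LOCK_OP_SK);
}

/* Analogue of dflt_lock(): reaching it means a load was deferred. */
static void
dflt_lock_sk(void *arg, lock_op_sk_t op)
{
	(void)arg;
	(void)op;
	abort();
}

struct tag_sk {
	lockfunc_sk_t *lockfunc;
	void *lockfuncarg;
};

/* Passing NULL, NULL selects the panicking default. */
static void
tag_create_sk(struct tag_sk *t, lockfunc_sk_t *f, void *arg)
{
	t->lockfunc = (f != NULL) ? f : dflt_lock_sk;
	t->lockfuncarg = (f != NULL) ? arg : NULL;
}

/* Deferred path: take the driver's lock around the callback. */
static void
run_deferred_sk(struct tag_sk *t, void (*cb)(void *), void *cbarg)
{
	t->lockfunc(t->lockfuncarg, LOCK_OP_SK);
	cb(cbarg);
	t->lockfunc(t->lockfuncarg, UNLOCK_OP_SK);
}

/* Demo callback: record whether the lock was held when it ran. */
static int held_during_cb;

static void
record_cb(void *arg)
{
	held_during_cb = ((struct mutex_sk *)arg)->held;
}
```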


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


117006 28-Jun-2003 jeff

- Construct a cpu topology map for Hyper Threading systems so that ULE may
take advantage of them.


116958 28-Jun-2003 davidxu

Add a machine-dependent function, thread_siginfo(). The SA signal code
will use it to construct a siginfo structure and export
the result to userland.

Reviewed by: julian


116947 28-Jun-2003 scottl

Catch amd64 up with the pending busdma async callback locking. Though this
mechanism might change in the near future, it's best to keep everything in
sync right now.

Reminded by: peter


116932 27-Jun-2003 peter

Turn ips back on.


116868 26-Jun-2003 peter

Oops, I only added a comment about why ips doesn't compile. Actually
comment it out for real.


116863 26-Jun-2003 peter

Sync with i386 - add everything that compiles. There are a few drivers
that are trivially easy to fix (eg: ips) that I've not committed fixes for.


116856 26-Jun-2003 peter

Add back in the ability for pmap_mapdev() to use KVM if the region
being requested is outside of the range of the direct map region. eg:
for pci windows. While here, increase the minimum size of the direct
map region to be 4GB instead of 1GB.


116709 23-Jun-2003 alc

MFi386
Add vm object locking to pmap_object_init_pt().


116685 22-Jun-2003 simokawa

Move KERNBASE to -2GB.
Currently, we cannot increase KVA more than 2GB.


116684 22-Jun-2003 simokawa

- Allow access to direct mapped region via /dev/kmem. This makes
'netstat -r' work.
- Use direct map for /dev/mem.


116683 22-Jun-2003 simokawa

- Allocate a new PD Table if kernel grows beyond 1GB boundary.
Reviewed by: peter

- Use direct map in pmap_mapdev().


116619 20-Jun-2003 simokawa

Use direct map in pmap_map().

This saves much KVA for vm_pages and you don't need to increase NKPT
for large physical memory anymore.

Suggested by: dfr


116579 19-Jun-2003 simokawa

Fix direct map page table for 2GB+ physical memory.

You may still need to increase NKPT for larger memory.
I have successfully booted 8GB system with NKPT=256.


116510 18-Jun-2003 alc

Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.
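
A minimal userland sketch of the fixed behaviour, assuming a hypothetical
TOY_M_ZERO flag and malloc() in place of the real page allocator: the page is
zeroed only when the caller asks for it.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_M_ZERO    0x0100   /* hypothetical "please zero" flag */

/* Counts how many pages were actually zeroed, for illustration. */
unsigned long toy_zero_count;

/* Allocate one page; zero it only if the caller requested that. */
void *toy_small_alloc(int flags)
{
    void *page = malloc(TOY_PAGE_SIZE);

    if (page != NULL && (flags & TOY_M_ZERO) != 0) {
        memset(page, 0, TOY_PAGE_SIZE);
        toy_zero_count++;
    }
    return page;
}
```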


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115907 06-Jun-2003 jhb

- Use IDTVEC() to declare IPI handlers since they are also IDT vectors.
- Make handlers for IPI's used by SMP kernels #ifdef SMP.


115905 06-Jun-2003 jhb

- Document the thermal and performance counter LVT entries in the local
APIC.
- Add a lvt_thermal member to the LAPIC struct.
- Add constants for the SMI and INIT LVT delivery modes.


115861 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simple bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115795 04-Jun-2003 peter

Fix ALIGNED_POINTER(). sizeof((u_int32_t)) is not legal C.
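
For illustration, one way such a macro can be written is shown below. The
TOY_ prefix marks it as a sketch, not the committed definition. The point is
that sizeof takes either an expression or a single parenthesized type name,
so the type argument must not be wrapped in a second pair of parentheses.

```c
#include <stdint.h>

/* sizeof(t) is fine; sizeof((t)) with t a type name does not parse. */
#define TOY_ALIGNED_POINTER(p, t) \
    ((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)
```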


115736 02-Jun-2003 peter

Fix restarted syscalls. When we rewind %rip, we also need to restore
all the argument registers etc. since we have almost certainly trashed
them by now. Take particular care of %r10 since it held the original value
of %rcx (which we saved in tf_rcx on entry and doreti doesn't know this).


115734 02-Jun-2003 peter

Make this more compatible with libc_r. Make the internal types for storing
registers an array of longs rather than int.


115703 02-Jun-2003 obrien

Use __FBSDID().


115683 02-Jun-2003 obrien

Use __FBSDID().


115678 02-Jun-2003 peter

MFi386: i386/include/asm.h rev 1.11: Do not abuse ##.


115659 02-Jun-2003 obrien

Use C99 compatible asm statements.


115636 01-Jun-2003 obrien

Sync with i386/GENERIC ordering.


115577 31-May-2003 peter

MFi386: rev 1.56: remove break after return


115576 31-May-2003 peter

MFi386: rev 1.23: use gdb_strlen()/gdb_strcpy() directly.


115575 31-May-2003 peter

MFi386: rev 1.50: remove unused variable


115546 31-May-2003 phk

Avoid unbalancing the { } count in the source file with #ifdef by
putting the opening { after the #ifdef ... #endif sequence.

Found by: FlexeLint


115432 31-May-2003 peter

Add acpi to the build. Remove the hack from machdep.c that lies to the
loader to shut it up.


115431 31-May-2003 peter

Have hammer_time() return the proc0 stack location, and have locore
switch to it before calling mi_startup(). The bootstack is WAY too small
for running acpica during probe/attach. While here, pass modulep/physfree
to the startup routine, rather than writing to the global variables in
locore.S.

Approved by: re (amd64/*)


115430 31-May-2003 peter

Regenerate.


115429 31-May-2003 peter

Make this compile with WITNESS enabled. It wants the syscall names.


115428 31-May-2003 peter

Port acpica to amd64.

Approved by: re (amd64/* blanket)


115426 31-May-2003 peter

With the help of jhb, fix the ACPI_ACQUIRE_GLOBAL_LOCK() macros and
port to amd64 after repocopy.

Approved by: re (amd64/*)


115416 30-May-2003 hmp

Rename BUS_DMAMEM_NOSYNC to BUS_DMA_COHERENT.

The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.

The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does support coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.

This flag is the same as the one in NetBSD's Bus DMA.

Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)


115404 30-May-2003 peter

Nasty 'make it compile' port to amd64. Note that it needs some other
wire protocol for the extra registers. I should probably just remove it
from here for now since it's quite useless.

Approved by: re (amd64/* blanket)


115403 30-May-2003 peter

Initial port to amd64 after repocopy from i386. Note that the
disassembler has not been updated yet, and will do some very strange
things. It does tracebacks (without function arguments due to regparm
calling conventions) if -fno-omit-frame-pointer is used (to come later).
This achieves basic functionality.

Approved by: re (amd64/* blanket)


115402 30-May-2003 peter

Add setjmp/longjmp for ddb


115358 27-May-2003 peter

Update AMD Features vector to include NX (page table entry no-execute bit)
and LM (long mode) etc.


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. This does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115316 26-May-2003 scottl

De-orbit bus_dmamem_alloc_size(). It's a hack and was never used anyways.
No need for it to pollute the 5.x API any further.

Approved by: re (bmah)


115283 24-May-2003 peter

Stop profiled libc from exploding, matching gcc's generated code.

Approved by: re (amd64/* blanket)


115257 23-May-2003 peter

Typo fix. oops.

Submitted by: jmallett
Approved by: re (blanket amd64/*)


115256 23-May-2003 peter

Update comments. Note that the kernel is at -1GB, not -2GB as erroneously
implied by the previous commit. KVM is still only 1GB until
pmap_growkernel() learns about the extra page table level.

Approved by: re (blanket)


115255 23-May-2003 peter

As suggested by the gdb folks, pad the 'struct fpreg' to a full 512 bytes
to match the native fxsave/fxrstor object size since that's apparently what
the Linux/NetBSD folks do.
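
A sketch of what padding to the 512-byte fxsave/fxrstor size can look like.
The struct name, field names, and field split here are assumptions made for
illustration; only the 512-byte total comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical layout: 32 + 128 + 256 bytes of state, then padding. */
struct toy_fpreg {
    uint64_t      fpr_env[4];        /* 32 bytes of control/status state */
    unsigned char fpr_acc[8][16];    /* 128 bytes: eight x87 registers   */
    unsigned char fpr_xacc[16][16];  /* 256 bytes: sixteen XMM registers */
    uint64_t      fpr_spare[12];     /* 96 bytes of padding, up to 512   */
};

/* Compile-time check: the array size is -1 (an error) if the size drifts. */
typedef char toy_fpreg_is_512_bytes[(sizeof(struct toy_fpreg) == 512) ? 1 : -1];
```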


115252 23-May-2003 peter

Deal with the user VM space expanding. 32 bit applications do not like
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).

Approved by: re (blanket)


115251 23-May-2003 peter

Major pmap rework to take advantage of the larger address space on amd64
systems. Of note:
- Implement a direct mapped region using 2MB pages. This eliminates the
need for temporary mappings when getting ptes. This supports up to
512GB of physical memory for now. This should be enough for a while.
- Implement a 4-tier page table system. Most of the infrastructure is
there for 128TB of userland virtual address space, but only 512GB is
presently enabled due to a mystery bug somewhere. The design of this
was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
to fit virtual addresses in an 'int'.
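
To make the 4-tier layout concrete, here is a small sketch that splits a
64-bit virtual address into its four table indexes. The shift values are the
usual amd64 ones (9 index bits per level above a 12-bit page offset); the
TOY_ names and helper functions are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12          /* 4KB pages                      */
#define TOY_PDRSHIFT   21          /* 2MB per page directory entry   */
#define TOY_PDPSHIFT   30          /* 1GB per PDP entry              */
#define TOY_PML4SHIFT  39          /* 512GB per PML4 entry           */
#define TOY_INDEX_MASK 0x1ffULL    /* 9 bits: 512 entries per table  */

unsigned toy_pml4_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PML4SHIFT) & TOY_INDEX_MASK);
}

unsigned toy_pdp_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PDPSHIFT) & TOY_INDEX_MASK);
}

unsigned toy_pd_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PDRSHIFT) & TOY_INDEX_MASK);
}

unsigned toy_pt_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PAGE_SHIFT) & TOY_INDEX_MASK);
}
```

With these shifts, one PML4 entry spans 512GB, which matches the 512GB
figures quoted above, and an address in the top 2GB of the address space
lands in the last PML4 slot.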

Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
downwards below kernbase. The kernel must be at the top 2GB of the
negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.

Approved by: re (blanket)


115237 22-May-2003 peter

Merge from i386/trap.c rev 1.252. Use td_critnest instead of the
spinlocks count for explicitly enabling interrupts.

Approved by: re (blanket)


115164 19-May-2003 kan

sys/sys/limits.h:

- Fix visibility test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is always wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
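
The pitfall can be reproduced with a hypothetical TOY_FOO_VISIBLE macro that
is defined to 0, as the visibility macros are when a name should be hidden:

```c
/* Hypothetical visibility macro: defined, but to 0, meaning "hidden". */
#define TOY_FOO_VISIBLE 0

/* Wrong test: a macro defined to 0 is still defined. */
#if defined(TOY_FOO_VISIBLE)
#define TOY_SEEN_BY_DEFINED_TEST 1
#else
#define TOY_SEEN_BY_DEFINED_TEST 0
#endif

/* Right test: look at the value, not at whether the macro exists. */
#if TOY_FOO_VISIBLE
#define TOY_SEEN_BY_VALUE_TEST 1
#else
#define TOY_SEEN_BY_VALUE_TEST 0
#endif
```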

sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:

- Style fixes.

Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)


115093 17-May-2003 peter

Actually get all the bits for sd_hibase.. it was 16 bits short. oops.

Approved by: re (amd64/* blanket)


115016 15-May-2003 alc

Initialize logical_cpus_mask when the logical CPUs are enumerated in
the mptable. (Previously, logical_cpus_mask was only initialized if
the hyperthreading fixup was executed.)

Approved by: re (jhb)
Reviewed by: ps


115006 15-May-2003 peter

Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.

Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.

Approved by: re (amd64/* blanket)


115005 15-May-2003 peter

Use compile time constants for things like PTmap[] etc because they're
about to move outside of the +/- 2GB range

Suggested by: jake
Approved by: re (amd64/* blanket)


114988 14-May-2003 peter

Regen

Approved by: re (amd64 blanket)


114987 14-May-2003 peter

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


114986 14-May-2003 peter

Fix some misunderstandings about 64 bit extension.
Fix fuword/suword - they're supposed to be 'long' - ie: point them
at fuword64/suword64 instead of the incorrect 32 bit versions.


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114953 12-May-2003 peter

Really stop the loader from trying to load the acpi module by lying and
pretending that it is already here.

Approved by: re (amd64/* stuff)


114952 12-May-2003 peter

For the page fault handler, save %cr2 in the outer trap handler so that
we do not have to run so long with interrupts disabled. This involved
creating tf_addr in the trapframe. Reorganize the trap stubs so that
they consistently reserve the stack space and initialize any missing
bits.

Approved by: re (amd64 stuff)


114951 12-May-2003 peter

Sync ucontext with reality. The struct trapframe changes need to be
reflected here.

Approved by: re (blanket amd64/*)


114930 12-May-2003 peter

AMD64 physical space is much larger than i386, de-i386 the bus_space and
bus_dma MD code for AMD64. (And a trivial ifdef update in dev/kbd because
of this). More updates are needed here to take advantage of the 64 bit
instructions.

Approved by: re (blanket amd64/*)


114928 12-May-2003 peter

Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base
value on entry and exit. This isn't as easy as it sounds because when
we recursively trap or interrupt, we have to avoid duplicating the
swapgs instruction or we end up back with the userland %gs. I implemented
this by testing TF_CS to see if we're coming from supervisor mode
already, and check for returning to supervisor. To avoid a race with
interrupts in the brief period after beginning executing the handler and
before the swapgs, convert all trap gates to interrupt gates, and reenable
interrupts immediately after the swapgs. I am not happy with this.
There are other possible ways to do this that should be investigated.
(eg: storing the GS.base MSR value in the trapframe)

Add some sysarch functions to let the userland code get to this.

Approved by: re (blanket amd64/*)


114923 11-May-2003 peter

Call it an AMD64 Processor, not a Hammer. Also, it seems that the cpuid
model numbers are wider than I first thought.

Approved by: re (blanket amd64/*)


114922 11-May-2003 peter

I missed another printf format error while extracting the patch.

Approved by: re (blanket amd64/*)


114921 11-May-2003 peter

Make atdevbase long for the KERNBASE > 4GB case

Approved by: re (amd64/* blanket)


114919 11-May-2003 peter

Fix printf format errors that were undetected due to using the standard
FSF compiler during early development.


114918 11-May-2003 peter

Export PML4SHIFT and PDPSHIFT

Approved by: re (blanket amd64/*)


114917 11-May-2003 peter

Since compiling natively, the compile environment has been less forgiving
about silly typos. Use the correct comment sequences.


114870 10-May-2003 peter

Provide a fake varargs implementation for lint's benefit. This way
it can see the intent of the va_* macros, even though it cannot work.

Approved by: re (blanket amd64/*)


114869 10-May-2003 peter

Remove _ARCH_INDIRECT ifdefs. They existed for lib/msun/* on i386, which
could use different versions of the math code depending on whether there
was real floating point hardware or math emulation. Since the fpu is
part of the core specification on amd64, there is no need for this here.

Approved by: re (blanket amd64/*)


114868 10-May-2003 peter

bcopyb() isn't used on amd64 kernel (it only exists for i386/pcvt)

Approved by: re (blanket amd64/*)


114867 10-May-2003 peter

Finish translating i386/support.s into amd64 asm - replace bcopy etc with
asm versions. This yields about a 5% kernel compile time speedup.


114859 09-May-2003 peter

Include the MXCSR initial values, based on the AMD docs. This file
should really be renamed to fpu.h and npx.c to fpu.c since it's part of
the core architecture on amd64 systems, not an isa 'numeric processor
extension'.


114858 09-May-2003 peter

Turn syscons on now that it works, so that anybody trying to run this
can see something. Probing for keyboard still works for auto serial
console mode.


114837 08-May-2003 peter

Oops. Turn T_PAGEFLT back into an interrupt gate. It is *critical*
that interrupts be disabled and remain disabled until %cr2 is read.
Otherwise we can preempt and another process can fault, and by the
time we read %cr2, we see a different process's fault address. This
Greatly Confuses vm_fault() (to say the least). The i386 port has
got this marked as a bug workaround for a Cyrix CPU, which is what
led me astray. It's actually necessary for preemption, regardless
of whether Cyrix cpus had a bug or not.


114821 08-May-2003 peter

Leave space for the 128 byte red-zone on the stack.


114820 08-May-2003 peter

#include <machine/metadata.h> was missing; add it


114819 08-May-2003 peter

Fix a preemption race. I was reenabling interrupts in the fast system
call handler before it was safe. It was possible to lose context
and for something else to clobber the PCPU scratch variable. This
moves the interrupt enable *way* too late, but it's better safe than
sorry for the moment.


114803 07-May-2003 jhb

Style nits.

Approved by: re (bmah)


114678 04-May-2003 kan

Style fixes.
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.


114560 03-May-2003 peter

Repocopy *.s to *.S


114381 01-May-2003 peter

I changed the numbering of the MODINFOMD_SMAP during the commit, so
recognize the old number for my development boxes so I can use old
loader/pxeboot for a while if I need to.


114373 01-May-2003 peter

Slight reorg and added AMD64 support. A couple of the MODINFOMD_* values
that were added to sparc64 and later powerpc, really should have been in
the MI area. But changing that now with insufficient preparation will
just cause too much pain.

Move MD_FETCH() to the MI sys/linker.h file to avoid another two copies
of it.


114349 01-May-2003 peter

Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from. There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
a variation of the PAE code in order to avoid having to worry about 4
levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
i386 loader. This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
not been translated yet, and some that I cheated and wrote dumb C
versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
passing, as is native on the amd64 ABI), and the 'syscall' instruction
for syscalls. int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
of places (eg: to find which register they use to replace the trashed
%rcx register in the syscall instruction). As a result, there is not a
lot of similarity. I did look at NetBSD a few times while debugging to
get some ideas about what I might have done wrong in my first attempt.


114346 30-Apr-2003 peter

Repocopy from x86_64/... to amd64/...
Rename visible x86_64 references to amd64.
Kill MID_MACHINE; it's a.out specific, and the only platform that supports it
is i386. All of the other platforms should remove it too.


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.
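
A minimal sketch of the range check, using a hypothetical five-entry table
and a placeholder string for out-of-range codes:

```c
#include <stddef.h>

/* Illustrative table only; the real syscallnames[] is generated. */
static const char *const toy_syscallnames[] = {
    "syscall", "exit", "fork", "read", "write"
};
#define TOY_NSYSCALLS \
    (sizeof(toy_syscallnames) / sizeof(toy_syscallnames[0]))

/* Return the name for a syscall number, or "?" if it is out of range. */
const char *toy_syscall_name(long code)
{
    if (code < 0 || (size_t)code >= TOY_NSYSCALLS)
        return "?";
    return toy_syscallnames[code];
}
```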

Submitted by: pho


114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


114291 30-Apr-2003 markm

Warns fixing. Protect against inappropriate linting, and mark
GCC-specific assembler code as such (in #ifdefs). Fix an easy
static variable warning while I'm here.


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


114177 28-Apr-2003 jake

Use inlines for loading and storing page table entries. Use cmpxchg8b for
the PAE case to ensure idempotent 64 bit loads and stores.

Sponsored by: DARPA, Network Associates Laboratories


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


114017 25-Apr-2003 jhb

Regen.


114016 25-Apr-2003 jhb

Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather
than STD.


114013 25-Apr-2003 jake

Remove harmless invalid cast.

Sponsored by: DARPA, Network Associates Laboratories


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specifies whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.
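
The class of bug described here, copying from the address of a pointer
member instead of from what it points to, can be shown with hypothetical
toy_* structures (memcpy() stands in for bcopy()):

```c
#include <string.h>

struct toy_frame  { long regs[4]; };
struct toy_thread { struct toy_frame *td_frame; long other[8]; };

/* Buggy: the source is the pointer member itself, so bytes of the
 * thread structure are copied rather than the frame. */
void toy_copy_buggy(struct toy_thread *td, struct toy_frame *dst)
{
    memcpy(dst, &td->td_frame, sizeof(*dst));
}

/* Fixed: follow the pointer and copy the frame it refers to. */
void toy_copy_fixed(struct toy_thread *td, struct toy_frame *dst)
{
    memcpy(dst, td->td_frame, sizeof(*dst));
}
```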

Glanced at by: jake
Reviewed by: bde (months ago)


113989 24-Apr-2003 jhb

Regen.


113987 24-Apr-2003 jhb

Fix the thr_create() entry by adding a trailing \. Also, sync up the
MP safe flag for thr_* with the main table.


113953 24-Apr-2003 davidxu

Don't print anything for a fault at cpu_switch_load_gs. Like the other
code that recovers from faults in doreti caused by invalid segment registers,
silently push the error to userland.


113941 23-Apr-2003 kan

Add a new sys/limits.h file which in turn depends on machine/_limits.h
to get actual constant values. This is in preparation for machine/limits.h
retirement.

Discussed on: standards@
Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*)
Modified by: kan


113859 22-Apr-2003 jhb

- Replace inline implementations of sigprocmask() with calls to
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.


113845 22-Apr-2003 davidxu

Move the interrupt-level test down a bit; a cpu_switch_load_gs fault can
occur while interrupts are nested.


113843 22-Apr-2003 davidxu

Fix some problems with cpu_switch_load_gs. When the fault address is at
cpu_switch_load_gs, the CPU is in a context switch, so don't enable interrupts.
Because it is in a context switch, sched_lock is expected to be held,
so don't PROC_LOCK(p) and psignal(); that would be a LOR. We could probably
set a P_XSIGBUS-like flag in p_sflags and TDF_ASTPENDING in
td_flags, then post a SIGBUS to the process in ast() if P_XSIGBUS was set.


113833 22-Apr-2003 davidxu

Remove the single-threading detection code. This code really should be
replaced by thread_user_enter(), but currently we don't want to enable
this in trap.


113803 21-Apr-2003 simokawa

Add FireWire drivers to GENERIC.


113796 21-Apr-2003 davidxu

Reset pcb_gs and %gs before possibly invalidating it.


113757 20-Apr-2003 wpaul

Add device driver support for the ASIX Electronics AX88172 USB 2.0
ethernet controller. The driver has been tested with the LinkSys
USB200M adapter. I know for a fact that there are other devices out
there with this chip but don't have all the USB vendor/device IDs.

Note: I'm not sure if this will force the driver to end up in the
install kernel image or not. Special magic needs to be done to exclude
it to keep the boot floppies from bloating again, someone please
advise.


113728 20-Apr-2003 davidxu

Backout my last commit.

Requested by: bde


113704 19-Apr-2003 davidxu

Don't return garbage in high 16 bits.


113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


113682 18-Apr-2003 jhb

Hold the proc lock for curproc around sigonstack().


113621 17-Apr-2003 jhb

Remove a couple of unused symbols.


113492 15-Apr-2003 mux

style(9)


113472 14-Apr-2003 simokawa

Restore delayed load support for the resource shortage case.
It was missed in the previous change.
Now, _bus_dmamap_load_buffer() accepts BUS_DMA_WAITOK/BUS_DMA_NOWAIT flags.

Original idea from: jake


113459 14-Apr-2003 simokawa

* Use _bus_dmamap_load_buffer() and respect maxsegsz in bus_dmamap_load().
Ignoring maxsegsz may lead to fatal data corruption for some devices.
ex. SBP-2/FireWire
We should apply this change to other platforms except for sparc64.

MFC after: 1 week


113364 11-Apr-2003 davidxu

Copy %gs from current CPU not from a stale PCB backup.


113363 11-Apr-2003 davidxu

set_user_ldt_rv() should check for the same proc, not the same thread.
This commit fixes a user LDT SMP rendezvous bug.


113348 10-Apr-2003 des

Convert the SMP_TSC kernel option into a loader tunable. Also enable
the TSC timecounter on single-CPU systems even when they are running
an SMP kernel.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.
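
A sketch of why flags help: with bit values, one call can request several
sync operations at once. The TOY_ constants, their values, and the counting
helper are illustrative assumptions, not the kernel definitions.

```c
/* Illustrative flag values: one bit per operation. */
#define TOY_DMASYNC_PREREAD   0x1
#define TOY_DMASYNC_POSTREAD  0x2
#define TOY_DMASYNC_PREWRITE  0x4
#define TOY_DMASYNC_POSTWRITE 0x8

typedef int toy_dmasync_op_t;

struct toy_sync_counts { int preread, postread, prewrite, postwrite; };

/* Handle every operation whose bit is set in op. */
void toy_dmamap_sync(struct toy_sync_counts *c, toy_dmasync_op_t op)
{
    if (op & TOY_DMASYNC_PREREAD)
        c->preread++;
    if (op & TOY_DMASYNC_POSTREAD)
        c->postread++;
    if (op & TOY_DMASYNC_PREWRITE)
        c->prewrite++;
    if (op & TOY_DMASYNC_POSTWRITE)
        c->postwrite++;
}
```

With an enum of mutually exclusive values, the combined request in the
first call of the usage below would have needed two separate calls.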


113339 10-Apr-2003 julian

Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but its purpose will change a bit
when we add the ability to lock a KSE to a cpu.


113321 10-Apr-2003 wes

Add a sysctl that records and reports the CPU clock rate calculated
at boot. Funny how often this trivial piece of information crops up
in embedded boxen.

Sponsored by: St. Bernard Software


113275 09-Apr-2003 mike

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


113266 08-Apr-2003 jake

Remove invalid cast to vm_offset_t to avoid truncating a physical address
when doing pmap_kextract on a 2MB page.
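
The truncation can be illustrated with hypothetical 32-bit virtual and
64-bit physical address types. The toy_* names and the simplified extract
helpers are assumptions, not the pmap code itself.

```c
#include <stdint.h>

typedef uint32_t toy_vm_offset_t;   /* 32-bit virtual address   */
typedef uint64_t toy_paddr_t;       /* 64-bit physical address  */

#define TOY_PDRMASK ((1u << 21) - 1)   /* offset within a 2MB page */

/* Buggy: the cast drops the physical address bits above 4GB. */
toy_paddr_t toy_extract_buggy(toy_paddr_t frame, toy_vm_offset_t va)
{
    return (toy_vm_offset_t)frame | (va & TOY_PDRMASK);
}

/* Fixed: keep the full-width physical frame address. */
toy_paddr_t toy_extract_fixed(toy_paddr_t frame, toy_vm_offset_t va)
{
    return frame | (va & TOY_PDRMASK);
}
```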

Spotted by: peter
Sponsored by: DARPA, Network Associates Laboratories


113228 07-Apr-2003 jake

Add support for bounce buffers to _bus_dmamap_load_buffer, which is the
backend for bus_dmamap_load_mbuf and bus_dmamap_load_uio.

- Increase MAX_BPAGES to 512. Less than this causes fxp to quickly run out
of bounce pages.
- Add an argument to reserve_bounce_pages indicating whether this operation
should fail or be queued for later processing if we run out of memory.
The EINPROGRESS return value is not handled properly by consumers of
bus_dmamap_load_mbuf.
- If bounce buffers are required allocate minimum 1 bounce page at map
creation time. If maxsize was small previously this could get truncated
to 0 and the drivers would quickly run out of bounce pages.
- Fix a bug handling the return value of alloc_bounce_pages at map creation
time. It returns the number of pages allocated, not 0 on success.
- Use bus_addr_t for physical addresses to avoid truncation.
- Assert that the map is non-null and not the no bounce map in
add_bounce_pages.

Sponsored by: DARPA, Network Associates Laboratories


113225 07-Apr-2003 jake

Better fix for previous previous which still allows the 4megs of kva at
the top of the address space to be reclaimed. The problem is that with
the APTD gone the mappable kernel address space runs right to the end of
the 32 bit address space. As a max this is 0x100000000, which can't be
represented in 32 bits, so we have to use ptd entry n-1 and pte offset
n-1, instead of ptd entry n and pte offset 0. There's still 1 page we
can't use, but we gain just under 4 megs of kva (8 megs with PAE).
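
The arithmetic behind this can be checked with a short sketch. It assumes
the non-PAE i386 layout (4KB pages, 1024 entries per table, 4MB per page
directory entry); the TOY_ names are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PDRSHIFT   22     /* 4MB per page directory entry */
#define TOY_NPDEPG     1024   /* page directory entries       */
#define TOY_NPTEPG     1024   /* page table entries           */

/* Build a 32-bit virtual address from directory and table indexes. */
uint32_t toy_va(uint32_t pdi, uint32_t pti)
{
    return (pdi << TOY_PDRSHIFT) | (pti << TOY_PAGE_SHIFT);
}
```

Entry n with offset 0 would be 0x100000000, which wraps to 0 in 32 bits,
while entry n-1 with offset n-1 gives the last representable page address.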

Sponsored by: DARPA, Network Associates Laboratories


113148 05-Apr-2003 peter

Unbreak the !LAZY_SWITCH case. I #ifdef'ed too much when I added
the ifdefs prior to commit and killed the same-address-space test.

Submitted by: bde


113100 04-Apr-2003 tegge

Add SMP_TSC option, which can be used on SMP systems where the TSCs
are synchronized to reduce context switch cost.


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.
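
The ovbcopy() macro described in this commit can be pictured roughly as
below. This is a userland sketch, not the kernel header: memmove() stands in
for the kernel's overlap-safe bcopy() (note the swapped argument order), and
the sketch_* names are hypothetical.

```c
#include <string.h>

/*
 * Stand-in for an overlap-safe bcopy(src, dst, len). memmove() takes the
 * destination first, so the arguments are swapped here.
 */
#define	sketch_bcopy(src, dst, len)	((void)memmove((dst), (src), (len)))

/*
 * ovbcopy() needs no implementation of its own: it simply expands to the
 * equivalent bcopy() call.
 */
#define	sketch_ovbcopy(src, dst, len)	sketch_bcopy((src), (dst), (len))
```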


113064 04-Apr-2003 jake

Bandaid fix for previous commit while I figure out why it broke. This
caused crashes early in boot on i386 UP machines.

Reported by: phk
Pointy hat to: jake


113040 03-Apr-2003 jake

- Removed APTD and associated macros, it is no longer used.

BANG BANG BANG etc.

Sponsored by: DARPA, Network Associates Laboratories


112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. It isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


112967 02-Apr-2003 jake

- Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by: jeff
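
The "return the old value" convention can be illustrated with C11 atomics.
This is a userland analogy only, not the kernel's casuptr (which operates on
user-space addresses and is written in assembly); sketch_casptr is a
hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-set that reports the value it observed. If the return value
 * equals 'old', the store of 'newval' happened; otherwise the caller learns
 * the current contents without a second read.
 */
static intptr_t
sketch_casptr(_Atomic intptr_t *p, intptr_t old, intptr_t newval)
{
	intptr_t expected = old;

	/* On failure, 'expected' is overwritten with the value found at *p. */
	atomic_compare_exchange_strong(p, &expected, newval);
	return (expected);
}
```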


112908 01-Apr-2003 jeff

- Add thr and umtx system calls.


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112897 01-Apr-2003 jeff

- In npxgetregs() save the fpu state from the td argument rather than from
curthread. Nothing currently depends on this behavior.
- Clean up an extra newline.

Obtained from: bde


112896 31-Mar-2003 jeff

- Add a placeholder for sigwait


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112886 31-Mar-2003 jeff

- Fix two calls to trapsignal() that were still passing in 'struct proc'.
These were missed in my last commit.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112858 31-Mar-2003 jeff

- In npxsetregs don't set the floating point if td == fpcurthread not if
curthread == fpcurthread. This is important when we're saving the fp
state for a thread other than curthread as in from set_mcontext.


112841 30-Mar-2003 jake

- Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel option 'options PAE'. This will only work with device drivers which
either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems


112837 30-Mar-2003 jake

- Remove invalid casts.

Sponsored by: DARPA, Network Associates Laboratories


112836 30-Mar-2003 jake

- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When
accessing an alternate address space this causes 1 page table page at
a time to be mapped in, rather than using the recursive mapping technique
to map in an entire alternate address space. The recursive mapping
technique changes large portions of the address space and requires global
tlb flushes, which seem to cause problems when PAE is enabled. This will
also allow IPIs to be avoided when mapping in new page table pages using
the same technique as is used for pmap_copy_page and pmap_zero_page.

Sponsored by: DARPA, Network Associates Laboratories


112790 29-Mar-2003 mdodd

- Move driver to newbus.
- Provide identify methods for EtherExpress and 3c507 cards; this
means these cards no longer need wired configs.
- Provide a detach method.


112688 26-Mar-2003 ps

Nuke HTT from here too.

Spotted by: jhb


112687 26-Mar-2003 ps

Nuke options HTT in favor of machdep.hlt_logical_cpus tunable/sysctl.
This keeps the logical cpu's halted in the idle loop. By default
the logical cpu's are halted at startup. It is also possible to
halt any cpu in the idle loop now using machdep.hlt_cpus.

Examples of how to use this:
machdep.hlt_cpus=1	halt cpu0
machdep.hlt_cpus=2	halt cpu1
machdep.hlt_cpus=4	halt cpu2
machdep.hlt_cpus=3	halt cpu0,cpu1

Reviewed by: jhb, peter
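
The examples suggest that machdep.hlt_cpus is a bitmask in which bit N
selects cpuN. Under that assumption, a value could be computed as in the
sketch below; hlt_cpus_mask is a hypothetical helper, not something the
commit adds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: build a machdep.hlt_cpus value from a list of CPU
 * numbers, assuming bit N of the mask corresponds to cpuN.
 */
static unsigned int
hlt_cpus_mask(const unsigned int *cpus, size_t ncpus)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		assert(cpus[i] < 32);	/* keep the shift well defined */
		mask |= 1u << cpus[i];
	}
	return (mask);
}
```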


112686 26-Mar-2003 peter

Halt the cpus in the idle loop for SMP as well for several reasons:
1) It's critical for HTT. There's less foot-shooting opportunity.
2) I've seen significant improvements in interactive response to commands
over ssh sessions. I assume this is less lock contention.
3) As incentive to finish the idle cpu IPI wakeup stuff.
4) The machine on my desk was blowing hot air in my general direction
because somebody forgot to turn the hlt on, and it saves 50 watts per
cpu..

The machdep.cpu_idle_hlt sysctl is still available, but now the default
is the same as on UP kernels.


112647 25-Mar-2003 jhb

Add an options entry for HTT in SMP and GENERIC similar to the SMP and
APIC_IO options.

Requested by: John Cagle <john.cagle@hp.com>


112569 25-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


112551 24-Mar-2003 mdodd

Use repo-copied files in sys/i386/bios.


112526 24-Mar-2003 bde

Disable interrupts while in kdb_trap() to handle cases where the caller
doesn't do it. This fixes all known causes of "Context switches not
allowed in the debugger" in mi_switch(). The main cause was trap_fatal()
calling kdb_trap() with interrupts enabled. Switching to ithreads for
interrupt handling then made fatal traps more fatal and harder to debug.
The problem was limited in -current because most interrupt handlers are
blocked by Giant, but it occurred almost deterministically for me because
my clock interrupt handlers are non-fast and not blocked by Giant.


112498 22-Mar-2003 ru

Remove bitrot associated with `maxusers'.

Submitted by: bde


112445 20-Mar-2003 dwmalone

Extend CPU_ATHLON_SSE_HACK to cover a few more revisions of Athlon CPUs.

Submitted by: Jon Kuster <kwsn@earthlink.net>
MFC after: 2 weeks


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared across
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.
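
A minimal userland sketch of an atomically maintained reference count,
using C11 atomics rather than the kernel's atomic(9) primitives. The
sketch_tag structure and its functions are hypothetical and only show the
pattern, not the busdma code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tag carrying only the field of interest. */
struct sketch_tag {
	atomic_uint	ref_count;
};

/* Take a reference; safe against concurrent hold/release calls. */
static void
sketch_tag_hold(struct sketch_tag *t)
{
	atomic_fetch_add(&t->ref_count, 1);
}

/*
 * Drop a reference. Returns true when the caller dropped the last one and
 * is therefore responsible for freeing the tag.
 */
static bool
sketch_tag_release(struct sketch_tag *t)
{
	/* atomic_fetch_sub returns the value held before the subtraction. */
	return (atomic_fetch_sub(&t->ref_count, 1) == 1);
}
```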


112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


112350 17-Mar-2003 jhb

Expand the APIC ID mask field of the ICR register to 8 bits instead of just
4 bits. This reportedly fixes booting on the SW7500CW2. Much thanks to
the submitter for tracking this down!

Submitted by: Brian Buchanan <brian@ncircle.com>
Reviewed by: peter
MFC after: 3 days
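
Why a 4 bit mask breaks on machines with larger APIC IDs can be seen in the
sketch below. It assumes the xAPIC layout in which the destination field
occupies bits 24-31 of the high ICR word; the SKETCH_* constants and the
function name are illustrative, not the identifiers used in the tree.

```c
#include <stdint.h>

#define	SKETCH_ID_SHIFT	24
#define	SKETCH_ID_MASK4	0x0f000000u	/* old: 4 bit destination field */
#define	SKETCH_ID_MASK8	0xff000000u	/* new: 8 bit destination field */

/*
 * Replace the destination field of the high ICR word with apic_id,
 * keeping only the bits permitted by 'mask'.
 */
static uint32_t
sketch_icr_set_dest(uint32_t icr_hi, uint32_t apic_id, uint32_t mask)
{
	return ((icr_hi & ~mask) | ((apic_id << SKETCH_ID_SHIFT) & mask));
}
```

With the 4 bit mask an APIC ID such as 0x12 is silently truncated to 0x2,
so the IPI would be addressed to the wrong CPU.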


112346 17-Mar-2003 mux

- Lock down the bounce pages structures. We use the same locking scheme
as with the alpha backend because both implementations of bounce pages
are identical.
- Remove useless splhigh()/splx() calls.


112312 16-Mar-2003 jake

Made the prototypes for pmap_kenter and pmap_kremove MD. These functions
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


112133 12-Mar-2003 jake

- Added support for multiple page directory pages to pmap_pinit and
pmap_release.
- Merged pmap_release and pmap_release_free_page. When pmap_release is
called only the page directory page(s) can be left in the pmap pte object,
since all page table pages will have been freed by pmap_remove_pages and
pmap_remove. In addition, there can only be one reference to the pmap and
the page directory is wired, so the page(s) can never be busy. So all there
is to do is clear the magic mappings from the page directory and free the
page(s).

Sponsored by: DARPA, Network Associates Laboratories


112104 11-Mar-2003 jake

Use bus_space_handle_t to represent host port and virtual addresses;
bus_addr_t may not be appropriate.

Sponsored by: DARPA, Network Associates Laboratories


111975 08-Mar-2003 davidxu

Initialize eflags in fake frame to default value rather than random one.
The random value sometimes causes the macro CLKF_USERMODE to return true
because the PSL_VM bit is set when it really shouldn't be. This causes
statclock() to take the wrong path, which in turn breaks the KSE code and
crashes the kernel when executing a threaded program.


111939 06-Mar-2003 rwatson

Instrument sysarch() MD privileged I/O access interfaces with a MAC
check, mac_check_sysarch_ioperm(), permitting MAC security policy
modules to control access to these interfaces. Currently, they
protect access to IOPL on i386, and setting HAE on Alpha.
Additional checks might be required on other platforms to prevent
bypass of kernel security protections by unauthorized processes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111878 04-Mar-2003 jhb

Wrap the hyperthreading support code with the HTT kernel option.
Hyperthreading support is now off unless the HTT option is added.

MFC-after: 3 days


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)
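
C99 sparse (designated) initialization looks roughly like the sketch below.
The structure is a made-up stand-in with a few cdevsw-like fields, not the
real struct cdevsw; the point is that members are named, so field order does
not matter and anything omitted defaults to zero or NULL.

```c
#include <stddef.h>

/* Hypothetical switch structure; field names are for illustration only. */
struct sketch_devsw {
	int		(*d_open)(int unit);
	int		(*d_close)(int unit);
	const char	*d_name;
	int		d_maj;
};

static int
sketch_open(int unit)
{
	return (unit >= 0 ? 0 : -1);
}

/*
 * Only the non-default members are listed. d_close and d_maj are left out
 * and are therefore initialized to NULL and 0 respectively.
 */
static struct sketch_devsw sketch_cdevsw = {
	.d_open =	sketch_open,
	.d_name =	"sketch",
};
```

Because each member is named, reordering the fields of the structure (as
the LINT test described above did) does not change the result.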


111638 27-Feb-2003 jhb

Expand some #ifdef's to fix I386_CPU compile.

Reported by: Andy Farkas <andyf@speednet.com.au>


111636 27-Feb-2003 alc

Remove some long unused declarations. (For example, the PV flags have not
been used since revision 1.8, roughly nine years ago.)


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111582 26-Feb-2003 ru

Implemented "nooption" and "nomakeoption" config(8) tokens.
Fixed memory leak in the "nodevice" option implementation.

Use these instead of sed(1) in MD NOTES.

Use a single makefile (sys/conf/makeLINT.mk) to generate
LINT for all architectures. (Previous versions missed
the LINT dependency on Makefile, and i386 version also
missed the dependency on ${NOTES}.)

Fixed bugs in the previous NOTES conversion using the
"nodevice" token and sed(1):

- i386 LINT lost "device pst".

- pc98 LINT lost SC_*, MAXCONS and KBD_DISABLE_KEYMAP_LOAD
options, and got needless DPT_* options.

- Added nooptions PPC_DEBUG, PPC_PROBE_CHIPSET, KBD_INSTALL_CDEV
to sparc64 LINT so that it has a chance to config(8).

This basically returns us to where we were before.


111535 26-Feb-2003 davidxu

Better to not know anything about KSE.


111524 26-Feb-2003 mux

Correctly set BUS_SPACE_MAXSIZE in all the busdma backends.
It was bogusly set to 64 * 1024 or 128 * 1024 because it was
bogusly reused in the BUS_DMAMAP_NSEGS definition.


111500 25-Feb-2003 obrien

Move most everything back to a MI NOTES, and use "nodevice" in MD NOTES
where needed. Use 'sed' for now in place of "nooptions". Add a sparc64
MD NOTES.

Reviewed by: arch@


111493 25-Feb-2003 jake

- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate
for testing and setting the current and alternate address spaces.
- Changed PTDpde and APTDpde to arrays to support multiple page directory
pages.

Sponsored by: DARPA, Network Associates Laboratories


111477 25-Feb-2003 davidxu

Remove an unsafe KASSERT.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
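
The new calling convention (return 0 on success and fill in the physical
address through a pointer, return an errno value on failure) might look like
the sketch below. The types, constants and function are hypothetical and
heavily simplified; the real d_mmap_t prototype takes different arguments.

```c
#include <errno.h>
#include <stdint.h>

typedef uintptr_t sketch_paddr_t;	/* stand-in for the address type */

/* Made-up device window, for illustration only. */
#define	SKETCH_BASE	0x000a0000u
#define	SKETCH_SIZE	0x00020000u

/*
 * New-style handler: the physical address goes out through *paddr and the
 * return value carries only success (0) or an errno constant.
 */
static int
sketch_mmap(uintptr_t offset, sketch_paddr_t *paddr)
{
	if (offset >= SKETCH_SIZE)
		return (EINVAL);	/* no longer mistaken for an address */
	*paddr = SKETCH_BASE + offset;
	return (0);
}
```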


111440 24-Feb-2003 jake

- Removed UMAXPTDI and UMAXPTEOFF.
- Changed VM_MAXUSER_ADDRESS to be defined in terms of PTDPTDI. In order for
assumptions about the recursive page table map to work it must be the base
of the recursive map. Any pte offset that's not NPTEPG will break these
assumptions.

Sponsored by: DARPA, Network Associates Laboratories


111428 24-Feb-2003 nyan

The mpbiosreason variable is not used on pc98.


111385 24-Feb-2003 jake

Use the direct mapping of IdlePTD setup in locore for proc0's page directory,
instead of allocating another page of kva and mapping it in again. This was
likely an oversight in revision 1.174 (cut and paste from pmap_pinit).

Discussed with: peter, tegge
Sponsored by: DARPA, Network Associates Laboratories


111382 23-Feb-2003 tegge

Allow machines with one CPU and a valid mp table to boot an SMP kernel.


111372 23-Feb-2003 jake

Previous commit missed a 1 that should be NPGPTD, and an NPDEPG that should
be NPDEPTD. Grumble.

Sponsored by: DARPA, Network Associates Laboratories


111363 23-Feb-2003 jake

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories


111299 23-Feb-2003 jake

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


111272 22-Feb-2003 alc

The root of the splay tree maintained within the pm_pteobj always refers
to the last accessed pte page. Thus, the pm_ptphint is redundant and can
be removed.


111271 22-Feb-2003 jake

unsigned -> pt_entry_t.

Sponsored by: DARPA, Network Associates Laboratories


111167 20-Feb-2003 peter

Fix fumble in rev 1.525. pmap_kenter()'s second argument is a physical
address, not a page index.

Laughed at by: jake


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111068 18-Feb-2003 peter

Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been
#if'ed out for a while. Complete the deed and tidy up some other bits.

We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space. Making
the bios code mpsafe was just too hairy. We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers. We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]

Briefly glanced at by: imp


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listened to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much of the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


111017 16-Feb-2003 phk

Change "dev_t gdbdev" to "void *gdb_arg", some possible paths for GDB
will not have a dev_t.


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110955 15-Feb-2003 alc

Assert that the kernel map's system mutex is held in pmap_growkernel().


110845 14-Feb-2003 alc

- Add a mutex for synchronizing the use of CMAP/CADDR 1 and 2.
- Eliminate small style differences between pmap_zero_page(),
pmap_copy_page(), etc.


110831 13-Feb-2003 obrien

Fix the style of the SCHED_4BSD commit.


110781 13-Feb-2003 peter

Oops. I mis-remembered about the P4 problems. It was 5.0-DP2 that
was shipped with DISABLE_PG_G and DISABLE_PSE, not 5.0-REL. *blush*
Disable the code - but still leave it there in case it's still lurking.


110780 13-Feb-2003 peter

Turn off PG_PS and PG_G for Pentium-4 cpus at boot time. This is so
that we can stop turning off PG_G and PG_PS globally for releases.


110747 12-Feb-2003 alc

Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


110687 11-Feb-2003 phk

Switch to using the TSC code in i386/i386/tsc.c.


110566 08-Feb-2003 mike

Implement fpclassify():
o Add a MD header private to libc called _fpmath.h; this header
contains bitfield layouts of MD floating-point types.
o Add a MI header private to libc called fpmath.h; this header
contains bitfield layouts of MI floating-point types.
o Add private libc variables to lib/libc/$arch/gen/infinity.c for
storing NaN values.
o Add __double_t and __float_t to <machine/_types.h>, and provide
double_t and float_t typedefs in <math.h>.
o Add some C99 manifest constants (FP_ILOGB0, FP_ILOGBNAN, HUGE_VALF,
HUGE_VALL, INFINITY, NAN, and return values for fpclassify()) to
<math.h> and others (FLT_EVAL_METHOD, DECIMAL_DIG) to <float.h> via
<machine/float.h>.
o Add C99 macro fpclassify() which calls __fpclassify{d,f,l}() based
on the size of its argument. __fpclassifyl() is never called on
alpha because (sizeof(long double) == sizeof(double)), which is good
since __fpclassifyl() can't deal with such a small `long double'.

This was developed by David Schultz and myself with input from bde and
fenner.

PR: 23103
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
(significant portions)
Reviewed by: bde, fenner (earlier versions)


110532 08-Feb-2003 alc

MF alpha
- Synchronize access to the allpmaps list with a mutex.


110480 07-Feb-2003 peter

Commit some cosmetic changes I had laying around and almost included
with another commit. Unwrap a line. Unexpand a pmap_kenter().


110379 05-Feb-2003 phk

This file has no longer any content from the original Berkeley file so
replace the UCB copyright with a FreeBSD 2 clause thing.

Remove some no longer relevant comments.


110370 05-Feb-2003 phk

i386/i386/tsc.c was repo-copied from i386/isa/clock.c.

Remove all the stuff that does not relate to the TSC.

Change the calibration to use DELAY(1000000) rather than trying to check
it against the CMOS RTC, this drastically increases precision:

Using 25 samples on an Athlon 700MHz UP machine I find:

	stddev		min		max		average
CMOS	22200 Hz	-74980 Hz	34301 Hz	704928721 Hz
DELAY	1805 Hz		-1984 Hz	2678 Hz		704937583 Hz

(The difference between the two averages is not statistically significant.)

expressed in PPM of the frequency:
	stddev		min		max
CMOS	31.49 PPM	-106.37 PPM	48.66 PPM
DELAY	2.56 PPM	-2.81 PPM	3.80 PPM
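
The PPM figures follow from the Hz figures: each deviation is divided by
the average frequency and scaled by one million. A small sketch of that
conversion (sketch_hz_to_ppm is a hypothetical helper, not part of tsc.c):

```c
/*
 * Express a frequency deviation in parts per million of the nominal
 * frequency. Both arguments are in Hz.
 */
static double
sketch_hz_to_ppm(double delta_hz, double freq_hz)
{
	return (delta_hz / freq_hz * 1e6);
}
```

For instance, the CMOS standard deviation is sketch_hz_to_ppm(22200.0,
704928721.0), and a negative Hz deviation yields a negative PPM value.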

This code will not be used until a followup commit to sys/isa/clock.c
and sys/pc98/pc98/clock.c which will only happen after some field testing.


110368 05-Feb-2003 phk

Make get_cyclecount() use binuptime() when no tsc is available: it is cheaper.


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is created by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110299 03-Feb-2003 phk

Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by: tjr


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110254 03-Feb-2003 alc

- Make allpmaps static.
- Use atomic subtract to update the global wired pages count. (See
also vm/vm_page.c revision 1.233.)
- Assert that the page queue lock is held in pmap_remove_entry().


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


110202 01-Feb-2003 joe

Replace spaces with tabs in keeping with the rest of the file.


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110039 29-Jan-2003 phk

Make tsc_freq a 64bit quantity.

Inspired by: http://www.theinquirer.net/?article=7481


110030 29-Jan-2003 scottl

Implement bus_dmamem_alloc_size() and bus_dmamem_free_size() as
counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.

This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.

Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.

Reviewed by: jake (sparc64)
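
The cost of the rounding mentioned above can be estimated with the usual
round-up-to-a-power-of-two idiom. This sketch assumes 4096 byte pages and
uses hypothetical names; it does not call contigmalloc() or any busdma
function.

```c
#include <stddef.h>

#define	SKETCH_PAGE_SIZE	((size_t)4096)	/* assumed page size */

/* Round size up to the next multiple of the (power of two) page size. */
static size_t
sketch_round_page(size_t size)
{
	return ((size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1));
}
```

Under this assumption a 6000 byte request occupies 8192 bytes, which is why
many small static allocations are better carved out of one larger region.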


109994 28-Jan-2003 jake

Remove BDE_DEBUGGER.

Discussed with: bde


109964 28-Jan-2003 alc

Merge pmap_testbit() and pmap_is_modified(). The latter is the only caller
of the former.


109898 26-Jan-2003 julian

Fix KSE related patch.
Make it compile for the SMP case..
statclock_process() has changed prototypes.


109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread that owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and take those contexts back
to userland. Any thread without an upcall structure has to export its
context and exit at the user boundary.

Any thread running in user mode owns an upcall structure. When it enters
the kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread blocks in the kernel, a new UPCALL thread is created and
the upcall structure is transferred to the new UPCALL thread. If the kse
mailbox's current thread pointer is NULL, then when a thread blocks
in the kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit; when all upcalls in a ksegrp are removed, the group is
automatically shut down. An upcall owner thread also exits when the process
is exiting. When an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. It represents a virtual cpu. When a thread
is running, it always has a KSE associated with it. The scheduler is free to
assign a KSE to a thread according to thread priority; if the thread priority
changes, the KSE can be moved from one thread to another.

When a ksegrp is created, N KSEs are always created in the group, where
N is the number of physical cpus in the current system. This makes it
possible for threads in the kernel to execute on different cpus in parallel
even if the userland UTS is only single-CPU safe. Userland calls kse_create
to add more upcall structures to the ksegrp to increase concurrency in
userland itself; the kernel is not restricted by the number of upcalls
userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


109869 26-Jan-2003 jeff

- Remove a redundant scheduler option.

Pointy hat to: jeff
Spotted by: dillon


109865 26-Jan-2003 jeff

- Introduce the SCHED_ULE and SCHED_4BSD options for compile time selection
of the scheduler.
- Add SCHED_4BSD as the scheduler for all kernel config files in cvs.


109717 23-Jan-2003 peter

Nuke CHEAP_TPR stuff, including LOPRIO_LEVEL (bogus) and ALLHWI_LEVEL
(which we never used). There is no need to tweak the TPR anymore and
only causes problems.


109715 23-Jan-2003 peter

Now that TPR isn't bogusly raised at boot, there is no need to clear
it at context switch.


109700 22-Jan-2003 jhb

- Move enable_sse()'s prototype to machine/md_var.h.
- Sort definition of cpu_* variables appropriately.
- Move cpu_fxsr out of the magic non-BSS set of variables and stick it in
the BSS along with hw_instruction_sse (make the latter static as well).

Submitted by: bde (partially)


109696 22-Jan-2003 jhb

Rename cpuid_cpuinfo to cpu_procinfo. bde requested that I rename this
variable to something in the cpu_* namespace since that's what all the
other cpuid variables were named and cpu_procinfo is what I came up with.

Requested by: bde


109691 22-Jan-2003 jhb

Bah, add in a missing space char I noticed when MFC'ing this.


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109520 19-Jan-2003 marcel

o Move the contents of <machine/floatingpoint.h> over to
<machine/ieeefp.h> where it belongs.
o Remove the i386 specific inclusion of <machine/floatingpoint.h>
from <ieeefp.h>, now that including <machine/ieeefp.h> is enough
for all architectures.
o Allow <machine/ieeefp.h> to inline the functions exposed by the
headers by checking for _IEEEFP_INLINED_ in the MI header. When
defined, prototypes are not given and it is assumed that the MD
headers, when inlining only a subset of the functions provide
prototypes for the functions not being inlined.

Based on patch from: Terry Lambert <tlambert2@mindspring.com>
Tested with: make release.


109344 16-Jan-2003 sam

wi now needs wlan

Reviewed by: imp


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


109027 09-Jan-2003 jhb

Remove earlysetcpuclass() as it has been OBE.

Suggested by: bde


109026 09-Jan-2003 jhb

Rework part of the previous processor name changes so that we read
cpu_exthigh and cpu_brand in printcpuinfo() instead of in identify_cpu().
We also only do it for known-good values of cpu_vendor which is a bit more
conservative.

Reviewed by: bde (mostly)


108961 08-Jan-2003 jhb

Consistently use spaces in between arguments to strcmp(). Whitespace
only.


108948 08-Jan-2003 jhb

- Use cpu_exthigh instead of executing cpuid again to retrieve it for the
print_AMD_foo() functions.
- Add a brand name table for the brand index provided on Intel CPU's in
%ebx after cpuid 1.
- For Intel CPUs, if we don't get a processor name from the extended cpuid
then use the brand index in cpuid_cpuinfo to pick a name from the brand
table and copy that name into cpu_brand.
- Replace the duplicated code to use the extended cpuid to replace
cpu_model with the processor name in the AMD and Transmeta sections of
printcpuinfo() with generic code that replaces cpu_model with
cpu_brand if cpu_brand is not an empty string. We also trim leading
spaces from cpu_brand prior to doing this since at least some processor
names (notably those of Intel CPUs) have leading spaces in the name.
- Give print_AMD_features() its own private regs[] array since
printcpuinfo() doesn't use the one it has anymore.
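
The brand-string handling described above can be sketched roughly as follows. This is an illustration only, not the committed code: the function name and its parameters are hypothetical, and only the behaviour (trim leading spaces, replace the model string when the brand string is non-empty) comes from the log message.

```c
#include <string.h>

/*
 * Hypothetical helper: skip leading spaces in a brand string and, if
 * anything is left, copy it over the model string.
 * Returns 1 if the model string was replaced, 0 otherwise.
 */
int
use_brand_as_model(char *model, size_t model_len, const char *brand)
{
	/* Some processor names arrive padded with leading spaces. */
	while (*brand == ' ')
		brand++;
	if (*brand == '\0' || model_len == 0)
		return (0);
	strncpy(model, brand, model_len - 1);
	model[model_len - 1] = '\0';
	return (1);
}
```

An all-blank brand string leaves the model untouched, which matches the "if cpu_brand is not an empty string" condition in the message.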


108947 08-Jan-2003 jhb

- Add a cpu_exthigh variable to hold the highest extended cpuid value
returned from cpuid 0x80000000.
- Add a cpu_brand char array to hold the processor name returned by
cpuid 0x80000002-0x80000004 on AMD, Intel, Transmeta, and possibly
other CPUs.
- Use cpuid to set cpu_exthigh and read the processor name if it is present
in identify_cpu().


108946 08-Jan-2003 jhb

Bah, get the test for more than one logical CPU right so we don't bogusly
claim a CPU has HT support when it lists 0 or 1 logical CPU's per physical
processor.
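
A minimal sketch of such a test, assuming the usual cpuid 1 layout (HTT flag in bit 28 of %edx, logical CPU count in bits 23:16 of %ebx). The macro and function names are illustrative and are not taken from the commit.

```c
#include <stdint.h>

#define SKETCH_CPUID_HTT	0x10000000u	/* bit 28 of %edx, cpuid 1 */
#define SKETCH_HTT_COUNT_MASK	0x00ff0000u	/* bits 23:16 of %ebx, cpuid 1 */

/*
 * Return the number of logical CPUs to report, or 0 if HT support
 * should not be claimed (flag clear, or a count of 0 or 1).
 */
unsigned int
ht_logical_cpus(uint32_t features_edx, uint32_t cpuinfo_ebx)
{
	unsigned int n;

	if ((features_edx & SKETCH_CPUID_HTT) == 0)
		return (0);
	n = (cpuinfo_ebx & SKETCH_HTT_COUNT_MASK) >> 16;
	return (n > 1 ? n : 0);
}
```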


108914 08-Jan-2003 jhb

Enumerate logical hyperthread CPUs manually if they aren't already listed
in the mptable. The way this works is that we determine if the system
has hyperthreading and how many logical CPU's should be in each physical
CPU by using the information returned by cpuid. During the first pass of
the mptable, we build a bitmask of the APIC IDs of the CPUs listed in the
mptable. We then scan that bitmask to see if the CPUs are already listed
by the mptable, or if there are any APIC IDs already in use that would
conflict with the APIC IDs of the logical CPUs. If that test succeeds,
then we fixup the count of application processors. Later on during the
second pass of the mptable we create fake processor entries for logical
CPUs and add them to the system.

We only need this type of fixup hack when using the mptable to enumerate
CPUs. The ACPI MADT table properly enumerates all logical CPUs.
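
The bitmask scan can be pictured with the sketch below. It assumes, for illustration only, that the logical CPUs of a physical CPU with APIC ID n would occupy IDs n+1 through n+logical-1 and that IDs fit in a 32-bit mask; the function name is hypothetical and the real mptable code may differ in detail.

```c
#include <stdint.h>

/*
 * apic_mask has bit N set if APIC ID N is listed in the table.
 * Returns the number of extra (logical) CPUs that could be added,
 * or 0 if any would-be logical ID is already in use or out of range.
 */
unsigned int
ht_extra_cpus(uint32_t apic_mask, unsigned int logical)
{
	unsigned int id, i, extra = 0;

	if (logical < 2)
		return (0);
	for (id = 0; id < 32; id++) {
		if ((apic_mask & (UINT32_C(1) << id)) == 0)
			continue;
		/* Every ID a logical sibling would take must be free. */
		for (i = id + 1; i < id + logical; i++) {
			if (i >= 32 || (apic_mask & (UINT32_C(1) << i)) != 0)
				return (0);
		}
		extra += logical - 1;
	}
	return (extra);
}
```

With IDs 0 and 2 listed and two logical CPUs per package, two extra CPUs are reported; with IDs 0 and 1 both listed, the table already enumerates the siblings (or conflicts) and nothing is added.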


108913 08-Jan-2003 jhb

If the boot processor supports hyperthreading and contains more than one
logical CPU, display the number of logical CPUs per physical processor
underneath the list of CPU features.


108911 08-Jan-2003 jhb

Add a cpuid_cpuinfo variable to hold the results of %ebx from cpuid with
%eax of 1 and set it in identify_cpu().


108909 08-Jan-2003 jhb

- Fix the name of the hyperthreading cpuid feature flag to be HTT instead
of HHT.
- Document fields returned in %ebx by a cpuid with %eax of 1.


108608 03-Jan-2003 jhb

Document bit 31 of the cpuid features word as PBE (Pending Break Enable).


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108517 31-Dec-2002 njl

Return an error when r/w is requested on an unsupported device instead of
looping.

Submitted by: Sean Kelly <smkelly@zombie.org>
Pointed out by: bde


108409 29-Dec-2002 rwatson

Synchronize to kern/syscalls.master:1.139.

Obtained from: TrustedBSD Project


108342 28-Dec-2002 scottl

Add the if_bge driver. I can't find any reason why it's not here, and it's
pretty common on Dell servers and other high-end boxes.


108338 28-Dec-2002 julian

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads share KSEs. Every KSE now has a thread, which is
considered its "owner"; however, a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
In this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creation of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into the KSE's owner, and sets it up
for doing an upcall. It is just inhibited from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no longer needed.
"Idle" KSEs are now on the loanable queue.


108337 28-Dec-2002 alc

Assert that the page queues lock is held in pmap_testbit().


108253 24-Dec-2002 alc

- Hold the page queues lock around calls to vm_page_wakeup() and
vm_page_flag_clear().


108239 23-Dec-2002 phk

Outdent the string rather than use concatenation.


108175 22-Dec-2002 tjr

MB_LEN_MAX is not MD, move it to the MI limits.h.


108026 18-Dec-2002 marcel

Export the physical address of the RSDP to userland by means
of the `machdep.acpi_root' sysctl. This is required on ia64
because the root pointer hardly ever, if at all, lives in the
first MB of memory and also because scanning the first MB of
memory can cause machine checks.
This provides a safe and reliable way for ACPI tools to work
with the tables if ACPI support is present in the kernel. On
ia64 ACPI is non-optional.


107965 17-Dec-2002 njl

Back out 1.19 to rethink approach

Requested by: julian@


107962 17-Dec-2002 njl

Automatically issue a "continue" along with the "detach" command. This
fixes the problem of cleanly restarting a target after entering gdb mode.

Reviewed by: archie@


107957 16-Dec-2002 julian

Reformat last change
Requested by: nate@


107955 16-Dec-2002 julian

Don't dump core into a partition that is too small for it.
If we do, we usually write backwards into the preceding partition,
which is usually the root partition.
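
A sketch of the kind of check involved, under the assumption (not stated in the log) that the dump is placed at the end of the partition, so an oversized dump would yield a start offset before the partition begins. All names here are hypothetical.

```c
#include <stdint.h>

/*
 * Compute the first block of a dump that ends at the end of the
 * partition. Returns 0 and sets *start_blk on success, or -1 if the
 * dump does not fit (in which case nothing should be written).
 */
int
dump_start_block(uint64_t part_blocks, uint64_t dump_blocks,
    uint64_t *start_blk)
{
	if (dump_blocks > part_blocks)
		return (-1);
	*start_blk = part_blocks - dump_blocks;
	return (0);
}
```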


107946 16-Dec-2002 cognet

Add the trm(4) driver.

MFC after: 1 day


107923 16-Dec-2002 marcel

Regen: swapoff


107922 16-Dec-2002 marcel

Change swapoff from MNOPROTO to UNIMPL. The former doesn't work.


107913 15-Dec-2002 dillon

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


107867 14-Dec-2002 phk

Only dump the BIOS geometry table from bootinfo on PC98, we don't use
the contents on i386 anymore.


107853 14-Dec-2002 alc

Add page locking to pmap_mincore().

Submitted (in part) by: tjr@


107849 14-Dec-2002 alfred

SCARGS removal take II.


107839 13-Dec-2002 alfred

Backout removal SCARGS, the code freeze is only "selectively" over.


107838 13-Dec-2002 alfred

Remove SCARGS.

Reviewed by: md5


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zombie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107650 05-Dec-2002 jhb

Add "disabled" hints to all of the uncommon ISA devices that are in
GENERIC. Each device can be re-enabled at startup time by unsetting the
disabled hint in the loader.

Requested by: mdodd
Approved by: re
Prodded by: rwatson


107618 04-Dec-2002 alc

Hold the page queues lock around calls to pmap_remove().

Approved by: re


107576 04-Dec-2002 phk

Use the correct value when writing the Day Of Week byte in the CMOS.
The correct range is [1...7] with Sunday=1, but we have been writing
[0...6] with Sunday=0.

The Soekris computers flagged the zero, zapped the date, so if you
rebooted your soekris on a sunday, it would come up with a wrong
date.

Bruce has a more extensive rework of this code, but we will stick with
the minimalist fix for now.

Spotted by: Soren Kristensen <soren@soekris.com>
Thanks to: Michael Sierchio <kudzu@tenebras.com>.
Confirmed by: bde
Approved by: re
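
The off-by-one is easy to see in a small sketch. The function name is hypothetical; the ranges ([0...6] with Sunday=0 in, [1...7] with Sunday=1 out) are the ones given in the message.

```c
/*
 * Convert a weekday counted from 0 (Sunday = 0 ... Saturday = 6)
 * to the CMOS Day Of Week range 1..7 with Sunday = 1.
 */
unsigned int
rtc_day_of_week(unsigned int wday)
{
	return ((wday % 7) + 1);
}
```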


107538 03-Dec-2002 alc

Avoid recursive acquisition of the page queues lock in pmap_unuse_pt().

Approved by: re


107521 02-Dec-2002 deischen

Align the FPU state in the ucontext and sigcontext to 16 bytes
to accommodate the new SSE/XMM floating point save/restore
instructions.

This commit is mostly from bde and includes some style nits.

Approved by: re (jhb)
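
As a rough illustration of the alignment requirement, the sketch below forces a 512-byte save area (the size used by fxsave/fxrstor) onto a 16-byte boundary inside a structure. The structure and field names are made up for the example and are not the real ucontext or sigcontext layout; the GCC/Clang attribute syntax is assumed.

```c
#include <stddef.h>

/* Hypothetical context structure with a 16-byte aligned FPU save area. */
struct sketch_mcontext {
	int		mc_len;		/* some leading, unaligned field */
	unsigned char	mc_fpstate[512] __attribute__((aligned(16)));
};
```

Without the attribute, mc_fpstate would start at offset 4 here, which is not a valid operand address for the SSE save/restore instructions.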


107490 02-Dec-2002 alc

Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.

Approved by: re (blanket)


107434 01-Dec-2002 alc

Assert that the page queues lock is held in pmap_changebit()
and pmap_ts_referenced().

Approved by: re (blanket)


107410 30-Nov-2002 alc

Assert that the page queues lock is held in pmap_page_exists_quick().

Approved by: re (blanket)


107217 25-Nov-2002 alc

Assert that the page queues lock is held in pmap_remove_pages().

Approved by: re (blanket)


107212 24-Nov-2002 alc

Add page queues locking to vunmapbuf(); reduce differences with respect
to the sparc64 implementation. (Note: With modest effort on the alpha and
ia64 this function could migrate to the MI part of the kernel.)

Approved by: re (blanket)


107199 24-Nov-2002 iwasaki

Add `if (!cold)' checks for functions which are called via SYSINIT.
Loading acpi.ko with kldload is disallowed; however, some
functions were executed unexpectedly.

Approved by: re


107184 23-Nov-2002 alc

- Assert that the page queues lock is held in pmap_remove_all().
- Fix a diagnostic message and comment in pmap_remove_all().
- Eliminate excessive white space from pmap_remove_all().

Approved by: re


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
holding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


107144 21-Nov-2002 jhb

*sigh*. It seems that in the ACPICA code, Intel defines its own APIC_IO
macro for use when parsing MADT tables, thus we always tried to set the
interrupt model to APIC. This proved to be harmful on UP machines with
IO APIC's (or for UP kernels on SMP machines) since the wrong interrupt
routing information would be returned.

Pointy hat to: jhb
Approved by: re (rwatson)


106993 16-Nov-2002 deischen

Regenerate after adding syscalls.


106989 16-Nov-2002 deischen

Add *context() syscalls to ia64 32-bit compatibility table as requested
in kern/syscalls.master.


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106901 14-Nov-2002 imp

MFp4:
o Fix small style nit. This was supposed to be part of the last batch of
style fixes, but somehow didn't get merged.


106878 13-Nov-2002 peter

Recognize the Serverworks CIOB30 host to pci bridge.


106842 13-Nov-2002 mdodd

Loader tunable 'machdep.disable_mtrrs'.
Sysctl of same name to reflect status.

Submitted by: jhb
Approved by: re (murray)
MFC after: 1 day


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


106707 09-Nov-2002 iwasaki

Add a new loader tunable, hw.hasbrokenint12, to indicate that BIOS
has broken int 12H.
If hw.hasbrokenint12="1" is set in the loader environment, the kernel never
uses the BIOS INT 12 call to determine base memory size.
Otherwise, the kernel uses INT 12 as before.
This should fix the kernel panic caused by the 1.544 changes.

MFC after: 1 day


106697 09-Nov-2002 des

Print real / avail memory in megabytes rather than kilobytes.


106605 07-Nov-2002 tmm

Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics of hw.availpages on i386 and pc98 are slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.


106598 07-Nov-2002 alfred

Properly parenthesize the DBREG_DRX macro's variables to allow for
DBREG_DRX(&dbregs, n) usage.


106567 07-Nov-2002 alc

Simplify and optimize pmap_object_init_pt(). More specifically,
take advantage of the fact that the vm object's list of pages is
now ordered to reduce the overhead of finding the desired set of
pages to be mapped. (See revision 1.215 of vm/vm_page.c.)


106542 07-Nov-2002 davidxu

1. Fix an SMP race between kernel vm86 BIOS calls and userland vm86 mode code:
remove the global variable in_vm86call and set a vm86 calling flag in the PCB flags.

2. Fix preemption of vm86 BIOS calls by changing the vm86_lock mutex type
from MTX_DEF to MTX_SPIN. vm86pcb is not remembered in the thread struct, so
when the thread calling the vm86 BIOS is preempted by an interrupt thread,
switching back to it later would load an incorrect context into the
CPU registers, leading to a kernel crash.


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


106443 05-Nov-2002 davidxu

Fix typo. ioport_rid should be irq_rid.


106364 02-Nov-2002 rwatson

Sync to src/sys/kern/syscalls.master


106358 02-Nov-2002 imp

MFp4:
o It turns out that we always need to try to route the interrupts for
the case where the $PIR tells us there can be only one. Some machines
require this, while others fail when we try to do this (bogusly, imho).
Since we have no a priori way of knowing which is which, we always try to
do the routing and hope for the best if things fail.
o Add some additional comments that state the obvious, but amplify it in
non-obvious ways (judging from the questions I've gotten).

This should un-break older laptops that still have to use PCIBIOS to route
interrupts.

Tested by: sam


106357 02-Nov-2002 imp

Use 0xffffffff instead of -1 for id to compare against.
Use exact width types, since this is a MD file and won't be used elsewhere.
Fix a couple of resulting printf breakages

Bug found by: phk using Flexlint
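
A small sketch of the comparison style this describes, assuming `id` is a 32-bit PCI ID word that reads as all ones when no device responds. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * With an exact-width unsigned type, compare against the all-ones
 * pattern explicitly rather than against -1.
 */
int
pci_id_is_absent(uint32_t id)
{
	return (id == UINT32_C(0xffffffff));
}
```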


105955 25-Oct-2002 jhb

Note that the sched_lock protects md_ldt of struct mdproc.


105952 25-Oct-2002 peter

Finish fixing the 5.x FPU code for dealing with signal handlers.

Obtained from: bde


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105949 25-Oct-2002 iwasaki

Change method to determine base memory size.
Try INT 15H/E820H first, then fall back to the old compatibility
method (INT 12H).
This is a workaround for newer machines which have broken INT 12H BIOS
service implementation.

Reviewed by: -current ML
MFC after: 3 days


105910 25-Oct-2002 imp

Use the correct values for LDBL_*. Libc doesn't completely support
long doubles at the moment (printf truncates them to doubles).
However, long doubles do appear to work to the ranges listed in this
commit on both -stable (4.5) and -current. There may be some slight
rounding issues with long doubles, but that's an orthogonal issue to
these constants.

I've had this in my local tree for 3 months, and in my company's local
tree for 15 months with no ill effects.

Obtained from: NetBSD
Not likely to like it: bde


105900 24-Oct-2002 julian

Extract out KSE specific code from machine specific code
so that there is only one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitimately happen).

Submitted by: (parts) davidxu


105731 22-Oct-2002 jhb

No need for pmtimer hint anymore.


105554 20-Oct-2002 phk

Change the definition of the debugging registers to be an array, so
that we can index into it, rather than do pointer gymnastics on a
structure containing 8 elements.

Verified by: MD5 hash on the produced .o files.


105536 20-Oct-2002 phk

Remove a boatload of '&' which are surplus to the requirements.

Validated by: md5 hash is unchanged.


105535 20-Oct-2002 phk

Revert last commit, there actually was a -1 waaaaay down in pcireg_cfgread().


105534 20-Oct-2002 phk

Hide inline assembly if lint is defined.


105533 20-Oct-2002 phk

"id" is never going to be -1 when it is unsigned.

Spotted by: FlexeLint


105490 19-Oct-2002 peter

Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)


105486 19-Oct-2002 peter

Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.


105476 19-Oct-2002 rwatson

Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


105463 19-Oct-2002 rwatson

Permits UFS ACLs to be used with the GENERIC kernel. Due to recent
ACL configuration changes, this shouldn't result in different code paths
for file systems not explicitly configured for ACLs by the system
administrator. For UFS1, administrators must still recompile their
kernel to add support for extended attributes; for UFS2, it's sufficient
to enable ACLs using tunefs or at mount-time (tunefs preferred for
reliability reasons). UFS2, for a variety of reasons, including
performance and reliability, is the preferred file system for use with
ACLs.

Approved by: re


105347 17-Oct-2002 pirzyk

Add the !define(COMPILING_LINT)

pass the pointy hat...

Requested by: Juli Mallett <jmallett@FreeBSD.org>


105328 17-Oct-2002 iwasaki

1. Fix a comment. Locking _is_ needed (but not done).
2. Update a comment. We now restore much more than RTC updates and
interrupts.
3. Order change. Stop interrupts by writing to RTC_STATUSB,
restore rate bits for the interrupts by writing to RTC_STATUSA,
then enable interrupts again.
This seems to be done perfectly backwards in startrtclock().
Otherwise, the idea for this change was obtained from
startrtclock().
4. Don't stop the clock (RTCB_HALT). We only program some control bits
and don't want to stop the clock.
5. (Not really related.) Add caveats to the comment about timer_restore().
The update is non-atomic since locking is not done.

On locking:
6. rtcin() and writertc() are locked() adequately by splhigh() in RELENG_4,
but this locking is null in -current.
7. Doing things in the correct order in (3) combined with (6) is probably
enough locking for rtcrestore() in RELENG_4. In -current, the
writertc()'s race with rtcintr() unless the BIOS disables RTC interrupts.

Submitted by: bde (including commit message)
MFC after: 1 week


105311 17-Oct-2002 pirzyk

put an #error directive when SMP and CPU_DISABLE_CMPXCHG are set
together.

Requested by: Lars Eggart <larse@isi.edu>
Enlighted how to do it by: John Baldwin <jhb@freebsd.org>


105287 16-Oct-2002 jhb

Use the global pcib devclass instead of our own static copy.


105277 16-Oct-2002 jhb

- curproc may be NULL in 4-stable. In that case use the vmspace from
proc0.
- Remove unused include.

Sponsored by: The Weather Channel


105276 16-Oct-2002 jhb

Include <sys/select.h> on -stable instead of <sys/selinfo.h> to get the
definition of struct selinfo.

Sponsored by: The Weather Channel


105216 16-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint.


105138 15-Oct-2002 peter

The a.out md_coredump stuff isn't referenced anywhere anymore, and
hasn't been filled in for ages.. Nuked.


105117 14-Oct-2002 pirzyk

Add a knob to turn on and off the CMPXCHG instruction on > i386 IA32 systems.
This is most beneficial for vmware client os installs.

Reviewed by: jmallet, iedowse, tlambert2@mindspring.com
MFC After: never, -STABLE does not currently use this instruction


105054 13-Oct-2002 mike

Remove the P1003_1B kernel option; it is no longer used.


105014 13-Oct-2002 mike

Add standards visibility conditionals. Change any uses of sigset_t to
struct __sigset to avoid depending on objects from <sys/signal.h>.


104974 12-Oct-2002 phk

Remove NO_GEOM option. No outstanding show-stoppers.

Sponsored by: DARPA & NAI Labs.


104964 12-Oct-2002 jeff

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


104741 09-Oct-2002 peter

re-regen. Sigh.


104740 09-Oct-2002 peter

Sigh. Fix fat-fingering of diff. I knew this was going to happen.


104739 09-Oct-2002 peter

regenerate. sendfile stuff and other recently picked up stubs.


104738 09-Oct-2002 peter

Try and deal with the #ifdef COMPAT_FREEBSD4 sendfile stuff. This would
have been a lot easier if do_sendfile() was usable externally.


104736 09-Oct-2002 peter

Try and patch up some tab-to-space spammage.


104735 09-Oct-2002 peter

Add placeholder stubs for nsendfile, mac_syscall, ksem_close, ksem_post,
ksem_wait, ksem_trywait, ksem_init, ksem_open, ksem_unlink, ksem_getvalue,
ksem_destroy, __mac_get_pid, __mac_get_link, __mac_set_link,
extattr_set_link, extattr_get_link, extattr_delete_link.


104727 09-Oct-2002 jhb

Use d_thread_t for cdevsw functions instead of struct thread * so that it
is easier to share this code with 4-stable.


104718 09-Oct-2002 jhb

Remove 'at' hints for npx and apm as both drivers have identify routines
that add an instance of themselves. The npx(4) driver doesn't even check
the npx 'port' hint but hardcodes IO_NPX instead. The npx(4) driver also
will use isa IRQ 13 (on x86, 8 on pc98) by default if no 'irq' hint is
specified, so we don't need that hint either.


104695 09-Oct-2002 julian

Round out the facility for a 'bound' thread to loan out its KSE
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
the owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is conceivable that the
borrower should inherit the priority of the owner too;
that's another discussion and would be simple to do.

Also, as part of this, the "preallocated spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".

DDB now shows a lot more info for threaded processes though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)

Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.


104598 07-Oct-2002 imp

o go ahead and route the interrupt, even if it is supposedly unique.
there are some strange machines that seem to need this.
o delete bogus comment.
o don't use the bios for read/writing config space. They interact badly
with SMP and being called from ISR. This brings -current in line with
-stable.

# make the latter #ifdef on USE_PCI_BIOS_FOR_READ_WRITE in case we
# need to go back in a hurry.


104584 06-Oct-2002 mike

Add conditionals to allow va_list to be defined in other headers.


104583 06-Oct-2002 mike

o Add conditionals to allow va_list to be defined in other headers.
o Standardize on _MACHINE_STDARG_H_ to allow multiple header includes.
o Restrict the definition of va_copy() to C99 environments.


104519 05-Oct-2002 phk

NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.

SCSI floppy drives:
Apparently the scsi-da driver returns "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.

PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)

These issues are all being worked.

Sponsored by: DARPA & NAI Labs.


104513 05-Oct-2002 deischen

Fix building of minimal kernels without npx by rearranging ifdefs.
Also fix some style bugs in surrounding code, and add a comment
about FP state restoral that seems questionable.

Submitted by: bde


104505 05-Oct-2002 mike

Fix namespace issues by using visibility conditionals from
<sys/cdefs.h>.


104493 04-Oct-2002 mike

style(9) <machine/setjmp.h> headers so they look mostly the same.


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Tested on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104474 04-Oct-2002 jhb

Fix a bogon in previous commit. bcopy() from the malloc'd memory that we
already copied into, rather than doing the bcopy() from the userland
pointer. "Oops."


104460 04-Oct-2002 deischen

Add another temporary hack to allow running older i386 binaries.
This will be removed when new versions of syscalls sigreturn()
and sigaction() are added (mini is working on this but is in
the middle of a move).

This should fix the problem of cvsupd dying.


104381 02-Oct-2002 iwasaki

Add 2 Ids for new ServerWorks host to PCI bridge chipset.
The names of these chipsets are still unknown, but they work as well
as the other ServerWorks chipsets.
Description strings should be corrected when the chipsets
are known.

MFC after: 1 week


104379 02-Oct-2002 archie

Let kse_wakeup() take a KSE mailbox pointer argument.

Reviewed by: julian


104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
whose size can be specified when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


104319 01-Oct-2002 phk

The pmap_prefault_pageorder[] array was initialized with wrong values
due to a missing comma.

I have no idea what trouble, if any, this may have caused.

Pointed out by: FlexeLint
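
How a missing comma silently changes an initializer can be shown with made-up values. The arrays below are illustrative only (the log does not list the real contents, and SKETCH_PAGE_SIZE is an assumed constant): without the comma, two adjacent entries fuse into one subtraction, so the array is one element short and holds a different value at that position.

```c
#define SKETCH_PAGE_SIZE	4096

/* Comma missing after the sixth entry: "3 * P - 4 * P" becomes one value. */
int order_bad[] = {
	-SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE,
	-2 * SKETCH_PAGE_SIZE, 2 * SKETCH_PAGE_SIZE,
	-3 * SKETCH_PAGE_SIZE, 3 * SKETCH_PAGE_SIZE
	-4 * SKETCH_PAGE_SIZE, 4 * SKETCH_PAGE_SIZE
};

/* Intended form with all eight entries. */
int order_good[] = {
	-SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE,
	-2 * SKETCH_PAGE_SIZE, 2 * SKETCH_PAGE_SIZE,
	-3 * SKETCH_PAGE_SIZE, 3 * SKETCH_PAGE_SIZE,
	-4 * SKETCH_PAGE_SIZE, 4 * SKETCH_PAGE_SIZE
};
```

The compiler accepts both forms, which is why a lint tool rather than a build failure pointed this out.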


104294 01-Oct-2002 phk

It is too much work convincing lint why we would want empty structures,
so make the non-empty #ifdef lint.


104291 01-Oct-2002 phk

A more lint friendly #ifdef lint section.


104224 30-Sep-2002 jhb

- Give legacy an identify routine that always adds 'legacy0' at an order
of 1 so that it is not probed until after acpi0 is probed and attached.
- In legacy_probe(), return ENXIO if acpi0 is around and alive.
- nexus_attach() is now much simpler and just lets its child drivers do
all the work.


104223 30-Sep-2002 jhb

Trash the PnPBIOStable pointer later on when we know that the acpi probe
and attach routines have succeeded so that if they fail we can still use
the PnP BIOS to find ISA on-board devices. The fact that we do this here
is gross but fixing it properly involves a lot more work.


104215 30-Sep-2002 obrien

Turn back on the "SMP: AP CPU #N Launched!" message on normal boots.
Peter's rev 1.189 should fix the lost console on SCSI-based systems due
to this message.


104175 30-Sep-2002 obrien

Only print out the "SMP: AP CPU #N Launched!" message on verbose boots.
The kernel printf() isn't race-free


104174 30-Sep-2002 obrien

Save the FP state in the PCB as that is compatible with releng4 binaries.

This is a band-aid until the KSE pthread committers get back on the ground
and have their machines setup.

Submitted by: eischen


104118 28-Sep-2002 peter

Deal with some SMP races by doing the entire copyin at once rather
than doing the checks piecemeal and then doing a second copyin later.

PR: 38021
Submitted by: davidx (I've tweaked the patch a bit)


104110 28-Sep-2002 peter

There is no need for start/num to be signed in i386_ldt_args.


104106 28-Sep-2002 peter

Repair range checking for reading the ldt list.

PR: 38016
Submitted by: davidx


104097 28-Sep-2002 phk

Don't call function in return() for a void function.


104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


104045 27-Sep-2002 sos

Add the pst (Promise SX6000) driver to GENERIC.


103987 26-Sep-2002 peter

ISMEMSDP(), IS286GDP(), IS386GDP(), ISGDP(), ISSDP() and ISSYSSDP() are
not used anywhere anymore.


103972 25-Sep-2002 archie

Make the following name changes to KSE related functions, etc., to better
represent their purpose and minimize namespace conflicts:

kse_fn_t -> kse_func_t
struct thread_mailbox -> struct kse_thr_mailbox
thread_interrupt() -> kse_thr_interrupt()
kse_yield() -> kse_release()
kse_new() -> kse_create()

Add missing declaration of kse_thr_interrupt() to <sys/kse.h>.
Regenerate the various generated syscall files. Minor style fixes.

Reviewed by: julian


103965 25-Sep-2002 markm

Fix a declaration that is actually supposed to be a macro definition.

Submitted by: marius@alchemy.franken.de


103870 23-Sep-2002 alfred

use __packed.


103869 23-Sep-2002 jhb

Now that we only probe host-PCI bridges once, we no longer have to check to
see if we have been probed before by checking for a pciX bus device.


103868 23-Sep-2002 jhb

Put verbose printf's in the PCI BIOS interrupt routing code under
if (bootverbose).


103865 23-Sep-2002 jhb

Update the nexus driver for the addition of the legacy driver:
- nexus no longer has PCI bridges as direct children, so the PCI bus
ivar is no longer used and is removed.
- Don't attach default EISA, ISA, or MCA busses. Instead, if we do not
have an acpi0 device after bus_generic_probe(), add a legacy0 child
device.
- Remove machine/nexusvar.h.


103863 23-Sep-2002 jhb

Change the nexus_pcib driver (eventually to be renamed to legacy_pcib) to
hang off of the legacy driver instead of the nexus.


103862 23-Sep-2002 jhb

Add a new legacy(4) device driver for use on machines that do not have
ACPI or for when ACPI support is disabled or not present in the kernel.
Basically, the nexus device is now split into two with some parts
(such as adding default ISA, MCA, and EISA busses if they aren't found
as well as support for PCI bus device ivars) being moved to the legacy
driver.


103850 23-Sep-2002 peter

PIC_GOTOFF is OBE.


103847 23-Sep-2002 peter

use __packed, rather than __attribute__((packed)).


103834 23-Sep-2002 peter

At great personal risk, add a __packed and __aligned(x) define that
expand to __attribute__((packed)) and __attribute__((aligned(x)))
respectively. Replace the handful of gcc-ism's that use
__attribute__((aligned(16))) etc around the kernel with __aligned(16).

There are over 400 __attribute__((packed)) to deal with, that can come
later. I just want to use __packed in new code rather than add more
gcc-ism's.
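
A minimal sketch of the idea in this commit, assuming a GCC-compatible
compiler. The EX_PACKED and EX_ALIGNED names are hypothetical stand-ins
used here so the example does not clash with a system's own __packed and
__aligned definitions; the empty fallbacks for other compilers are an
assumption, not something the commit states.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the __packed / __aligned(x) defines. */
#if defined(__GNUC__)
#define EX_PACKED      __attribute__((packed))
#define EX_ALIGNED(x)  __attribute__((aligned(x)))
#else
#define EX_PACKED      /* assumption: no-op on other compilers */
#define EX_ALIGNED(x)
#endif

/* Natural layout: the compiler may pad between tag and value. */
struct ex_natural {
	uint8_t  tag;
	uint32_t value;
};

/* Packed layout: no padding, so the struct is 1 + 4 bytes. */
struct ex_packed {
	uint8_t  tag;
	uint32_t value;
} EX_PACKED;

/* Aligned layout: the size is rounded up to the 16-byte alignment. */
struct ex_aligned {
	uint8_t bytes[4];
} EX_ALIGNED(16);
```

With GCC or Clang, sizeof(struct ex_packed) is 5 and sizeof(struct
ex_aligned) is a multiple of 16; new code only has to spell the short
macro rather than the compiler-specific attribute.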


103824 23-Sep-2002 peter

Delete a whole bunch of compatibility defines that we don't use anymore.


103814 23-Sep-2002 mike

Be careful not to define GCC-specific optimizations in the non-GCC
case.


103778 22-Sep-2002 peter

Create inlines for ltr(sel), lldt(sel), lidt(addr) rather than
functions that have one instruction.


103772 22-Sep-2002 mdodd

- Move the init of %gs and pcb_gs before user_ldt_free().
- Always call load_gs()
- Trim comments.

This addresses some of the issues raised by BDE.


103770 22-Sep-2002 jake

Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicking later on. Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.


103755 21-Sep-2002 markm

A good dose of style.9. No functional change.


103753 21-Sep-2002 markm

Code tidy-up. ISOfy, turn a macro into an inline for lint(1) (perhaps
this needs to go to cpufunc.h?), de-register.


103749 21-Sep-2002 markm

Provide an inline function for the (GNUC) assembler "hlt" instruction.


103748 21-Sep-2002 markm

Wrap GCC-specific asm() code in #ifdef __GNUC__


103733 21-Sep-2002 phk

Fix a 3 year old oversight: Remove the #ifdef/#endif pair now that there
is nothing between them anymore.

Spotted by: peter.


103711 20-Sep-2002 jhb

Axe unused include.


103706 20-Sep-2002 phk

We need neither <sys/diskslice.h> nor <sys/disklabel.h> here.

Sponsored by: DARPA & NAI Labs.


103703 20-Sep-2002 phk

For reasons now lost in historical fog, the bounds_check_with_label()
function was put in i386/i386/machdep.c, from where it has been
cut and pasted to other architectures with only minor corruption.

Disklabel is really a MI format in many ways, at least it certainly
is when you operate on struct disklabel.

Put bounds_check_with_label() back in subr_disklabel.c where it belongs.

Sponsored by: DARPA & NAI Labs.


103682 20-Sep-2002 jhb

fork_trampoline() marks a trap frame.

Submitted by: bde


103681 20-Sep-2002 jhb

Use proper type for a variable used as a DDB symbol.


103680 20-Sep-2002 jhb

Trim includes.

Submitted by: bde


103679 20-Sep-2002 jhb

Various style fixes, including moving db_print_backtrace() out of the
middle of the watchpoint code.

Submitted by: bde


103649 19-Sep-2002 mdodd

This patch enables FreeBSD i686 MTRR support on Intel Pentium
4/XEON processors, which are not currently recognized.

Submitted by: Christian Zander <zander@minion.de>


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103645 19-Sep-2002 mdodd

From Christian Zander:

This patch addresses a bug that can cause a GPF in the kernel - if a
process makes use of i386_set_ldt to install a LDT entry, then loads
a corresponding segment descriptor into %gs, forks, and if the child
execs.

In this scenario, setregs executes user_ldt_free and then determines
how to reset the %gs register:

/* reset %gs as well */
if (pcb == curpcb)
load_gs(_udatasel);
else
pcb->pcb_gs = _udatasel;

This is insufficient in the fork/exec case, since pcb will be equal
to curpcb when the child execs; load_gs will reset %gs to _udatasel
but it doesn't reset pcb->pcb_gs; upon return from the system call,
cpu_switch_load_gs will thus attempt to restore %gs from pcb->pcb_gs
and trigger a GPF since all LDT entries have already been cleared.

The fix is to always reset pcb->pcb_gs to _udatasel.

Submitted by: Christian Zander <zander@minion.de>
Reviewed by: jake
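
The failure mode described above can be modelled in user space. This is
only a sketch: ex_cpu_gs stands in for the real %gs register, the
selector values are made up, and ex_reset_gs_fixed() is one way to read
"always reset pcb->pcb_gs", not the committed code.

```c
/* Hypothetical selector values, chosen only for illustration. */
#define EX_UDATASEL 0x2fu /* default user data selector */
#define EX_LDT_SEL  0x0fu /* selector pointing into a (now freed) LDT */

struct ex_pcb {
	unsigned pcb_gs; /* saved copy of %gs, restored on return */
};

static unsigned ex_cpu_gs; /* stands in for the live %gs register */

/* Old logic: updates either the register or the saved copy, never both. */
static void
ex_reset_gs_old(struct ex_pcb *pcb, const struct ex_pcb *curpcb)
{
	if (pcb == curpcb)
		ex_cpu_gs = EX_UDATASEL;
	else
		pcb->pcb_gs = EX_UDATASEL;
}

/* Fixed logic: the saved copy is always reset. */
static void
ex_reset_gs_fixed(struct ex_pcb *pcb, const struct ex_pcb *curpcb)
{
	pcb->pcb_gs = EX_UDATASEL;
	if (pcb == curpcb)
		ex_cpu_gs = EX_UDATASEL;
}
```

In the fork/exec case pcb equals curpcb, so the old routine leaves
pcb_gs holding the stale LDT selector, which is what later faults when
it is reloaded.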


103527 18-Sep-2002 iwasaki

Restore status register A of RTC at resume time.
This should fix the 'too many RTC interrupts and statclock seems
broken after resume' problem.

MFC after: 1 week


103526 18-Sep-2002 mike

Implement C99's va_copy() macro.
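
For readers unfamiliar with the macro, here is a small sketch of what
va_copy() is for: walking the same variable argument list more than
once. The function name ex_sum_twice() is hypothetical; only the
standard <stdarg.h> interface is used.

```c
#include <stdarg.h>

/*
 * Add up `count` int arguments twice: once through the original list
 * and once through a copy made with va_copy().
 */
static int
ex_sum_twice(int count, ...)
{
	va_list ap, aq;
	int i, total = 0;

	va_start(ap, count);
	va_copy(aq, ap); /* aq starts at the same position as ap */

	for (i = 0; i < count; i++)
		total += va_arg(ap, int);
	for (i = 0; i < count; i++)
		total += va_arg(aq, int);

	va_end(aq); /* every va_copy() needs its own va_end() */
	va_end(ap);
	return (total);
}
```

Without va_copy(), reusing ap after the first loop would be undefined,
since a va_list cannot portably be rewound or assigned.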


103477 17-Sep-2002 sobomax

Don't reference cpu_fxsr unless CPU_ENABLE_SSE is defined. This fixes kernel
in !CPU_ENABLE_SSE case.


103436 17-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


103409 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: bde, deischen, julian
Approved by: -arch


103408 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: deischen, julian
Approved by: -arch


103407 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: deischen, julian
Approved by: -arch


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separately and remove them from the proc structure.
The next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103346 15-Sep-2002 dwmalone

Some BIOSs are using MTRR values that are only documented under NDA
to control the mapping of things like the ACPI and APM into memory.

The problem is that starting X changes these values, so if something
was using the bits of BIOS mapped into memory (say ACPI or APM),
then next time they access this memory the machine would hang.

This patch refuses to change MTRR values it doesn't understand,
unless a new "force" option is given. This means X doesn't change
them by accident but someone can override that if they really want
to.

PR: 28418
Tested by: Christopher Masto <chris@netmonger.net>,
David Bushong <david@bushong.net>,
Santos <casd@myrealbox.com>
MFC after: 1 week


103225 11-Sep-2002 rwatson

Whitespace consistency fix from addition of IAHD_REG_PRETTY_PRINT: use
tabs not spaces.


103145 09-Sep-2002 jhb

Make sure a $PIR table header has a valid length before accepting the table
as valid.

Submitted by: Michal Mertl <mime@traveller.cz>
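
A sketch of what a length sanity check on a $PIR header might look
like. The 32-byte header and 16-byte entry sizes follow the commonly
documented PCI IRQ routing table layout and are assumptions here, as
are the function names; the exact test used in the commit is not shown
in the log.

```c
#include <stdint.h>

#define EX_PIR_HEADER_SIZE 32u /* assumed size of the fixed header */
#define EX_PIR_ENTRY_SIZE  16u /* assumed size of one routing entry */

/*
 * Return 1 if the advertised table size covers the header plus a whole
 * number of entries, 0 otherwise.
 */
static int
ex_pir_length_ok(uint16_t table_size)
{
	if (table_size < EX_PIR_HEADER_SIZE)
		return (0);
	return (((table_size - EX_PIR_HEADER_SIZE) % EX_PIR_ENTRY_SIZE) == 0);
}

/* Number of routing entries, or 0 if the length is not plausible. */
static unsigned
ex_pir_entry_count(uint16_t table_size)
{
	if (!ex_pir_length_ok(table_size))
		return (0);
	return ((table_size - EX_PIR_HEADER_SIZE) / EX_PIR_ENTRY_SIZE);
}
```

Rejecting a bad length up front keeps later code from walking past the
end of a table whose signature happened to match by accident.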


103122 09-Sep-2002 phk

#include "opt_bla.h" goes first says Bruce.


103109 09-Sep-2002 kuriyama

Use "options " rather than "options<tab>".


103102 08-Sep-2002 phk

Fix style(9) bugs.

Brucified by: bde


103081 07-Sep-2002 jmallett

Fill out two fields (si_pid, si_uid) in the siginfo structure handed back
to userland in the signal handler that were not being filled out before, but
should and can be.

This part of sendsig could be slightly refactored to use an MI interface, or
ideally, *sendsig*() would have an API change to accept a siginfo_t, which
would be filled out by an MI function in the level above sendsig, and said MI
function would make a small call into MD code to fill out the MD parts (some
of which may be bogus, such as the si_addr stuff in some places). This would
eventually make it possible for parts of the kernel sending signals to set up
a siginfo with meaningful information.

Reviewed by: mux
MFC after: 2 weeks


103076 07-Sep-2002 jmallett

Match the more modern ports and comment the filling of POSIX parts of siginfo
with 'Fill in POSIX parts'. (Diff reduction.)


103064 07-Sep-2002 peter

Automatically enable CPU_ENABLE_SSE (detect and enable SSE instructions)
if compiling with I686_CPU as a target. CPU_DISABLE_SSE will prevent
this from happening and will guarantee the code is not compiled in.

I am still not happy with this, but gcc is now generating code that uses
these instructions if you set CPUTYPE to p3/p4 or athlon-4/mp/xp or higher.


103049 07-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


103044 06-Sep-2002 jhb

Add a subclass of the PCI-PCI bridge driver that uses the PCIBIOS to
route interrupts if the child bus is described in the PCIBIOS interrupt
routing table. For child busses that are in the routing table, they do
not necessarily use a 'swizzle' on their pins on the parent bus to route
interrupts for child devices. If the child bus is an embedded device then
the pins on the child devices can be (and usually are) directly connected
either to a PIC or to an Interrupt Router. This fixes PCIBIOS interrupt
routing across PCI-PCI bridges for embedded devices.


103043 06-Sep-2002 jhb

Add a function pci_probe_route_table() that returns true if our PCI BIOS
supports interrupt routing and if the specified PCI bus is present in the
routing table.


103037 06-Sep-2002 jhb

Dump the $PIR table if booting verbose.


103025 06-Sep-2002 jhb

- Add a pci_cfgintr_valid() function to see if a given IRQ is a valid
IRQ for an entry in a PCIBIOS interrupt routing ($PIR) table.
- Change pci_cfgintr() to accept the current IRQ of a device as a fourth
argument and to use that IRQ for the device if it is valid.
- If an intpin entry in a $PIR entry has a link of 0, it means that that
intpin isn't connected to anything that can trigger an interrupt. Thus,
test the link against 0 to find invalid entries in the table instead of
implicitly relying on the irqs field to be zero. In the machines I have
looked at, intpin entries with a link of 0 often have the bits for all
possible interrupts for PCI devices set.
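
The link-versus-irqs distinction above can be sketched as follows. The
structure layout and function names are hypothetical simplifications
for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Simplified, hypothetical view of one intpin slot in a $PIR entry. */
struct ex_pir_intpin {
	uint8_t  link; /* 0 means the pin is not wired to anything */
	uint16_t irqs; /* bitmask of IRQs this link may be routed to */
};

/* A pin is usable only if its link value is non-zero. */
static int
ex_intpin_connected(const struct ex_pir_intpin *pin)
{
	return (pin->link != 0);
}

/* An IRQ is valid for a pin if the pin is connected and the bit is set. */
static int
ex_irq_valid(const struct ex_pir_intpin *pin, unsigned irq)
{
	if (!ex_intpin_connected(pin) || irq >= 16)
		return (0);
	return ((pin->irqs & (1u << irq)) != 0);
}
```

Testing link first matters because, as noted above, unconnected pins
often carry a fully populated irqs mask that would otherwise look valid.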


103023 06-Sep-2002 jhb

If we are using APIC_IO tell ACPI so it can route interrupts properly.
This still doesn't work quite right because of other APIC_IO hacks in
the i386 PCI code.


103017 06-Sep-2002 jhb

Add support for printing out the contents of a PCI BIOS $PIR interrupt
routing table on the console. Eventually it will be printed during
verbose boots.


103016 06-Sep-2002 jhb

Prefer the physical bus number of the PCI bus as the unit of the pciX
device created.


102976 05-Sep-2002 jhb

Test PCIbios.ventry against 0 to see if we found a PCIbios entry point,
not the 'entry' member. The entry point is formed from both a base and
a relative entry point. 'entry' is that relative offset. It is perfectly
valid to have an entry point with a relative offset of 0. PCIbios.ventry
is the virtual address of the entry point that takes both 'base' and
'entry' into account, thus it is the proper variable to test to see if we
have an entry point or not.
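
To make the distinction concrete, here is a small model of the three
fields. The struct, the helper names, and the way ventry is computed
from a mapped base address are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical model of the PCIbios bookkeeping described above. */
struct ex_pcibios {
	uint32_t  base;   /* physical base of the BIOS service */
	uint32_t  entry;  /* entry point offset relative to base */
	uintptr_t ventry; /* virtual address of the entry point */
};

/* Record the virtual entry point once base has been mapped at kva. */
static void
ex_set_ventry(struct ex_pcibios *p, uintptr_t kva_of_base)
{
	p->ventry = kva_of_base + p->entry;
}

/* Old test: wrongly treats a relative offset of 0 as "not found". */
static int
ex_found_old(const struct ex_pcibios *p)
{
	return (p->entry != 0);
}

/* New test: checks the address that accounts for both base and entry. */
static int
ex_found_new(const struct ex_pcibios *p)
{
	return (p->ventry != 0);
}
```

A BIOS whose entry point sits at offset 0 of its service area is
rejected by the old test and accepted by the new one.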


102974 05-Sep-2002 jhb

Move some variables to the BSS instead of explicitly zero'ing them. This
also makes all of the PCIbios variable be zero'd, not just the entry field.


102972 05-Sep-2002 obrien

Statically compile pcn(4) into the install kernel vs. using as module.
lnc(4) will attach to AMD PCnet/FAST NICs if pcn(4) does not attach.
I.e. pcn(4) gets first chance. There is a problem however in that pcn(4)
was moved out of the install kernel so that the module would be used.
This however causes bad installs if one has an AMD PCnet/FAST NIC.


102934 04-Sep-2002 phk

Change the support for AMDs ElanSC520 CPU from being a device driver to
be
options CPU_ELAN
(NB: Soekris.com users!)

It is cleaner this way. We still recognize the cpu on the host-pci bridge.


102932 04-Sep-2002 jhb

Function prototypes don't need 'extern'.


102920 04-Sep-2002 jhb

Use resource_list_print_type() instead of duplicating the code in
nexus_print_resources().


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102689 31-Aug-2002 gibbs

Enable ahd/ahc register pretty printing by default. This expedites
handling of bug reports.


102667 31-Aug-2002 bde

db_ps.c:
Don't attempt to follow null pointers for zombie processes in db_ps().

Style fix: use an explicit comparison with NULL for all null pointer
checks in db_ps() instead of for half of them.

db_interface.c:
Fixed ddb's handling of traps from within ddb on i386's only.

This was mostly fixed in rev.1.27 (by longjmp()'ing back to the top
level) but was completely broken in rev.1.48 (by not unwinding the new
state (mainly db_active) either before or after the longjmp()). This
mostly never worked for other arches, since rev.1.27 has not been ported
and lower level longjmp()'s only handle traps for memory accesses. All
cases should be handled at a lower level to provide better control and
simplify unwinding of state.

Implementation details: don't pretend to maintain db_active in a nested
way -- ddb cannot be reentered in a nested way. Use db_active instead
of the db_global_jmpbuf_valid flag and longjmp()'s return value for things
related to reentering ddb. [re]entering is still not atomic enough.


102666 31-Aug-2002 peter

Take a shot at fixing up a whole stack of style and other embarrassing
unforced errors that Bruce identified. I have not yet addressed all of
his concerns.


102603 30-Aug-2002 ache

Unbreak kernel build by printing Maxmem using %ld instead of old (now changed)
%u


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102543 28-Aug-2002 peter

OK, I have had it with losing my console because the AP's print their "I am
alive!" message right as the scsi probe messages happen. This is a bit
nasty, but it seems to work. At the point that we unlock the AP's, briefly
wait till they are all done while we hold the console on their behalf.


102412 25-Aug-2002 charnier

Replace various spellings with FALLTHROUGH, which is lint()able


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


102329 23-Aug-2002 peter

Ok, somebody please shoot me. The asm I wrote for the ranged IPI shootdown
was wrong. It only ever invalidated one page due to me getting the loop
terminator wrong. This explains the DISABLE_PG_G effect on SMP.
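
The original routine was assembly, which the log does not show. As a
rough C model of the intended behaviour, a ranged invalidation should
visit every page in [sva, eva). The names, the 4 KB page size, and the
counter used in place of a real invlpg instruction are all assumptions.

```c
#include <stdint.h>

#define EX_PAGE_SIZE 4096u /* assumed page size for the example */

static unsigned ex_pages_invalidated; /* counts simulated invlpg calls */

/* Stand-in for the privileged invlpg instruction. */
static void
ex_invlpg(uintptr_t va)
{
	(void)va;
	ex_pages_invalidated++;
}

/* Invalidate every page in the half-open range [sva, eva). */
static void
ex_invalidate_range(uintptr_t sva, uintptr_t eva)
{
	uintptr_t va;

	for (va = sva; va < eva; va += EX_PAGE_SIZE)
		ex_invlpg(va);
}
```

A loop whose termination test is wrong in the way the commit describes
would stop after the first ex_invlpg() call, leaving stale entries for
the rest of the range.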


102315 23-Aug-2002 mike

Move several MI types from <machine/_types.h> to <sys/_types.h>.
These types are unlikely to ever become very MD. They include:
clockid_t, ct_rune_t, fflags_t, intrmask_t, mbstate_t, off_t, pid_t,
rune_t, socklen_t, timer_t, wchar_t, and wint_t.

While moving them, make a few adjustments (submitted by bde):
o __ct_rune_t needs to be precisely `int', not necessarily __int32_t,
since the arg type of the ctype functions is int.
o __rune_t, __wchar_t and __wint_t inherit this via a typedef of
__ct_rune_t.
o Some minor wording changes in the comment blocks for ct_rune_t and
mbstate_t.

Submitted by: bde (partially)


102291 22-Aug-2002 archie

Replace (ab)uses of "NULL" where "0" is really meant.


102241 21-Aug-2002 archie

Don't use "NULL" when "0" is really meant.


102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien
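
The new guard pattern can be seen working in a single file by
repeating it, as would happen when two headers both need the same
type. The ex_ prefixed names are hypothetical so that the sketch does
not collide with real system types.

```c
/* Stand-in for a machine-dependent base type from <machine/_types.h>. */
typedef long ex_base_foo_t;

/* First header that needs the public type: declares it and sets the guard. */
#ifndef EX_FOO_T_DECLARED
typedef ex_base_foo_t ex_foo_t;
#define EX_FOO_T_DECLARED
#endif

/* Second header with the same block: the guard is set, so this is skipped. */
#ifndef EX_FOO_T_DECLARED
typedef ex_base_foo_t ex_foo_t;
#define EX_FOO_T_DECLARED
#endif

/* Any later code can use the type regardless of which header came first. */
static ex_foo_t
ex_double(ex_foo_t v)
{
	return (v * 2);
}
```

Unlike the older scheme, the guard macro is never undefined, so the
base type stays available to every header and include order stops
mattering.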


102179 20-Aug-2002 mux

Use the __BUS_ACCESSOR macro for NEXUS_ACCESSOR
instead of rolling our own implementation.

Reviewed by: tmm


102153 20-Aug-2002 peter

remove unit counts from atkbdc, pckbd, sc


102041 18-Aug-2002 alc

o Simplify the ptphint test in pmap_release_free_page(). In other words,
make it just like the test in _pmap_unwire_pte_hold().


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to clarify which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes are largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
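
The credential selection rule described for vn_rdwr() can be sketched
in isolation. The types and function names below are hypothetical;
only the rule itself (MAC checks use active_cred, the VOP uses
file_cred when one is given and active_cred otherwise) comes from the
commit text.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ucred. */
struct ex_ucred {
	int id;
};

/* Stand-in for NOCRED: "no file credential supplied". */
#define EX_NOCRED ((const struct ex_ucred *)NULL)

/* MAC authorization always uses the active (requesting thread) cred. */
static const struct ex_ucred *
ex_mac_cred(const struct ex_ucred *active_cred,
    const struct ex_ucred *file_cred)
{
	(void)file_cred;
	return (active_cred);
}

/* The VOP uses file_cred if provided, else falls back to active_cred. */
static const struct ex_ucred *
ex_vop_cred(const struct ex_ucred *active_cred,
    const struct ex_ucred *file_cred)
{
	return (file_cred != EX_NOCRED ? file_cred : active_cred);
}
```

Callers working on behalf of a struct file would pass fp->f_cred as
file_cred; callers with no file context pass the NOCRED equivalent and
get the previous single-credential behaviour.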


101907 15-Aug-2002 imp

pccbb->cbb


101879 14-Aug-2002 jmallett

Document why the has_f00f_bug variable is initialised rather than placed into
the BSS (so that it can be binary-patched).

Inspired by: bde


101770 13-Aug-2002 alc

o Remove an unnecessary vm_page_flash() from _pmap_unwire_pte_hold().

Reviewed by: peter


101751 12-Aug-2002 alc

o Convert three instances of vm_page_sleep_busy() into vm_page_sleep_if_busy()
with page queue locking.


101721 12-Aug-2002 iedowse

Use roundup2() to avoid a problem where pmap_growkernel was unable
to extend the kernel VM to the maximum possible address of 4G-4M.

PR: i386/22441
Submitted by: Bill Carpenter <carp@world.std.com>
Reviewed by: alc
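
The log does not show the expression that roundup2() replaced. As one
plausible illustration of this kind of bug, compare a "step to the next
boundary" computation with a power-of-two round-up on 32-bit addresses.
The macro below mirrors the usual roundup2() definition; the ex_ names,
the 4 MB step, and the "old" expression are assumptions.

```c
#include <stdint.h>

/* Round x up to a multiple of y, where y is a power of two. */
#define EX_ROUNDUP2(x, y) (((x) + ((y) - 1)) & ~((y) - 1))

#define EX_STEP ((uint32_t)4 << 20) /* 4 MB, one page table's worth */

/* Assumed old style: always advance, then truncate to a boundary. */
static uint32_t
ex_next_boundary(uint32_t addr)
{
	return ((addr + EX_STEP) & ~(EX_STEP - 1));
}

/* Round up: an already aligned address is returned unchanged. */
static uint32_t
ex_roundup(uint32_t addr)
{
	return (EX_ROUNDUP2(addr, EX_STEP));
}
```

At 4G-4M (0xffc00000) the first form wraps around to 0 in 32-bit
arithmetic, while the round-up form returns 0xffc00000, so the highest
4 MB boundary stays reachable.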


101704 11-Aug-2002 mjacob

Add support for the LSI-Logic Fusion/MP architecture.

This is an architecture that presents a thin message passing interface
to the OS. You can query as to how many ports and what kind are attached
and enable them and so on.

A less grand view is that this is just another way to package SCSI (SPI or
FC) and FC-IP into a one-driver interface set.

This driver supports the following hardware:

LSI FC909: Single channel, 1Gbps, Fibre Channel (FC-SCSI only)
LSI FC929: Dual Channel, 1-2Gbps, Fibre Channel (FC-SCSI only)
LSI 53c1020: Single Channel, Ultra4 (320M) (Untested)
LSI 53c1030: Dual Channel, Ultra4 (320M)

Currently it's in fair shape, but expect a lot of changes over the
next few weeks as it stabilizes.

Credits:

The driver is mostly from some folks from Jeff Roberson's company; I've
been slowly migrating it to broader support than it had when it came to me.

The hardware used in developing support came from:

FC909: LSI-Logic, Advansys (now Connetix)
FC929: LSI-Logic
53c1030: Antares Microsystems (they make a very fine board!)

MFC after: 3 weeks


101635 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


101588 09-Aug-2002 brooks

Make ppp(4) devices clonable and unloadable.


101459 07-Aug-2002 iwasaki

Improve stack manipulation code of ACPI wakeup routine.
The new code just overrides the stack top value with the saved return
address rather than using pop/push operations.

Submitted by: jhb


101445 07-Aug-2002 imp

Add Intersil and Symbol as vendors for 802.11 cards that the wi driver
supports.

Obtained from: NetBSD


101352 05-Aug-2002 peter

Revert rev 1.356 and 1.352 (pmap_mapdev hacks). It wasn't worth the
pain.


101349 05-Aug-2002 alc

o Introduce pmap_page_is_mapped(). Its purpose is to obsolete
the PG_MAPPED flag.


101324 04-Aug-2002 anholt

Add device agp to GENERIC, filter it out of floppy builds

Approved by: des (mentor)


101322 04-Aug-2002 peter

Fix a mistake in 1.352 - I was returning a pointer to the rounded down
address. I expect this will fix acpica.


101321 04-Aug-2002 imp

Remove commented out PCI_ENABLE_IO_MODES. It is gone now.


101294 04-Aug-2002 alc

o Request a wired page from vm_page_grab() in _pmap_allocpte().


101280 03-Aug-2002 alc

o Ask for a prezeroed page in pmap_pinit() for the page directory page.


101254 03-Aug-2002 alc

o Don't set PG_MAPPED on the page allocated and mapped in _pmap_allocpte().
(Only set this flag if the mapping has a corresponding pv list entry,
which this mapping doesn't.)


101249 03-Aug-2002 peter

Take advantage of the fact that there is a small 1MB direct mapped region
on x86 in between KERNBASE and the kernel load address. pmap_mapdev()
can return pointers to this for devices operating in the isa "hole".


101248 03-Aug-2002 peter

Take a shot at fixing a nasty bug in the pmap changes that I did. I
missed the pmap_kenter/kremove in this file, which leads to read()/write()
of /dev/mem using stale TLB entries. (gah!) Fortunately, mmap of /dev/mem
wasn't affected, so it wasn't as bad as it could have been. This throws
some light on the 'X server affects stability' thread....

Pointed out by: bde


101235 02-Aug-2002 phk

Move a prototype to the least wrong place.

Suggested by: bde


101197 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


101165 01-Aug-2002 blackend

Fix the link to the Handbook


101140 01-Aug-2002 iwasaki

Fix a bug about stack manipulation at ACPI wakeup.
This should avoid kernel panic on kernel compiled w/o
NO_CPU_COPTFLAGS.

Suggested by: optimized code by -mcpu=pentiumpro


101105 31-Jul-2002 alc

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


101054 31-Jul-2002 phk

The Elan SC520 MMCR is actually 16bit wide, so u_char is inconvenient.


100969 30-Jul-2002 iwasaki

Resolve conflicts arising from the ACPI CA 20020725 import.


100912 30-Jul-2002 alc

o Lock page queue accesses by pmap_release_free_page().


100882 29-Jul-2002 mike

Create a new header <machine/_stdint.h> for storing MD parts of
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of that
header only defining types.
(2) Defining C99 limits in <machine/limits.h> adds pollution to
<limits.h>.


100862 29-Jul-2002 alc

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


100781 28-Jul-2002 peter

Unwind the syscall_with_err_pushed tweak that jake did some time back.

OK'ed by: jake


100646 24-Jul-2002 julian

Add some locking asserts and some comments


100551 23-Jul-2002 peter

de-count pci


100464 21-Jul-2002 peter

Add explicit unit count on 'device pci' for ahc/ahd


100435 21-Jul-2002 imp

style(9)ize the whole file

Approved in concept a long time ago by: msmith


100432 21-Jul-2002 peter

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


100385 20-Jul-2002 peter

Regenerate


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100378 19-Jul-2002 alc

o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().


100374 19-Jul-2002 gallatin

Add support for probing secondary buses on the ServerWorks Grand Champion
chipset used for P4-Xeon machines

PR: kern/38894
Tested-by: "Marc G. Fournier" <scrappy@hub.org>
Submitted-by: Mark Tinguely (partially)


100327 18-Jul-2002 markm

Beautify. This has the side effect of improving portability and
making lint work cleaner.

Inspired to do this by: jhb


100321 18-Jul-2002 phk

Add initialization code for the AMD Elan sc520 which maps the MMCR
into KVM and sets the i8254 frequency to the correct value.


100310 18-Jul-2002 phk

Add an entry for the AMD Elan SC520 hostbridge. I do not believe we can
identify this gadget on the CPUID result alone, so I intend to activate
the necessary magic (i8254 frequency for instance) for it based on the
presence of the on-chip host to PCI bridge.


100275 18-Jul-2002 peter

Use pmap_kenter() rather than vtopte() and bashing the page tables
directly.


100264 17-Jul-2002 peter

Avoid trying to set PG_G on the first 4MB when we set up the 4MB page.
This solves the SMP panic for at least one system. I'd still like to know
why my xeon works though.

Tested by: bmilekic


100251 17-Jul-2002 markm

Clean up the syntax WRT semicolons at the end of function-like-macros, and protect GCCisms from non-GNU compilers and lint.


100220 17-Jul-2002 dillon

Qualify comment on machdep.cpu_idle_hlt. Turning this on on a SMP
machine will result in approximately a 4.2% loss of performance (buildworld)
and approximately a 5% reduction in power consumption (when idle). Add XXX
note on how to really make hlt work (send an IPI to wakeup HLTed cpus on
a thread-schedule event? Generate an interrupt somehow?).


100189 16-Jul-2002 jhb

Various comment and minor style fixes. No actual content changes.

Inspired by: bde


100163 16-Jul-2002 markm

Retire the perl gethints.conf in favour of an awk version. Move
the awk version to a central place for maintenance.

Submitted by: Cyrille Lefevre <cyrille.lefevre@laposte.net>


100152 15-Jul-2002 peter

The pmap_invalidate_all() here is definitely not a good idea. We are
running with interrupts disabled, other cpus locked down, and only
making a temporary local mapping that we immediately back out again.

Tested by: gallatin


100115 15-Jul-2002 jhb

makeLINT.send has been moved to sys/conf so we can build LINT on other
architectures besides i386.


100079 15-Jul-2002 markm

Wrap GNU specific code in ifdefs, and help lint out by providing
some alternative definitions.


100078 15-Jul-2002 markm

Cast to prevent "signed/unsigned comparison" warnings.


100077 15-Jul-2002 markm

Warnings and lint-assisting fixes; mark unused function parameters as
unused; wrap GNUisms (asm code) in appropriate #ifdefs.


99987 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


99932 13-Jul-2002 bde

Quick fix for high resolution kernel profiling on i386's. Use
-finstrument-functions instead of -mprofiler-epilogue. The former
works essentially the same as the latter but has a higher overhead
(about 22 more bytes per function for passing unused args to the
profiling functions).

Removed all traces of the IDENT Makefile variable, which had been
reduced to just a place for holding profiling's contribution to CFLAGS
(the IDENT that gives the kernel identity was renamed to KERN_IDENT).


99931 13-Jul-2002 peter

Two invlpg's slipped through that were not protected from I386_CPU

Pointed out by: dillon


99930 13-Jul-2002 peter

invlpg() does not work too well on i386 cpus. Add token i386 support
back in to the pmap_zero_page* stuff.


99929 13-Jul-2002 peter

Do global shootdowns when switching to/from 4MB pages. I believe we can
do a shootdown on a 4MB "page" though, but this should be safer for now.

Noticed by: tegge


99928 13-Jul-2002 peter

Bandaid for SMP. Changing APTDpde without a global shootdown is not
safe yet. We used to do a global shootdown here anyway so another day
or so shouldn't hurt.


99925 13-Jul-2002 alc

o Lock some page queue accesses, in particular, those by vm_page_unwire().


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99890 12-Jul-2002 dillon

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99862 12-Jul-2002 peter

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


99854 12-Jul-2002 alfred

Introduce syscall.master option 'COMPAT4' which allows one to wrap
syscalls for FreeBSD 4 compatibility.
Add kernel option COMPAT_FREEBSD4 to enable these syscalls.


99852 12-Jul-2002 peter

Unexpand a couple of 8-space indents that I added in rev 1.285.


99766 11-Jul-2002 peter

Bah, move the invltlb counter to C code and hook a debug sysctl onto it.


99765 11-Jul-2002 peter

s/NCPU/MAXCPU/ to try and get this to compile.


99746 10-Jul-2002 julian

fix a comment and note a problem with XXXSMP


99742 10-Jul-2002 dillon

Remove the critmode sysctl - the new method for critical_enter/exit (already
the default) is now the only method for i386.

Remove the paraphernalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().


99741 10-Jul-2002 obrien

Consistently line-up /**/ comments so they don't cause line wrappage.


99703 10-Jul-2002 julian

Include all of isa/ipl.s into exception.s as there is now nothing left in
ipl.s except doreti which really belongs in with the exceptions as it's
just the other side of the same coin. Will remove ipl.s in a separate commit.

Agreed by: several including bde@freebsd.org


99588 08-Jul-2002 markm

Comment out apm; ACPI is the modern replacement, and folks who really
need it can uncomment it. This may buy us some kernel space.

Discussed with: imp & msmith (quite a while ago)


99581 08-Jul-2002 peter

The clock is already allocated as 'fast' - no need to try and intercept a
'slow' interrupt registration and convert it into 'fast'.


99578 08-Jul-2002 peter

Cosmetic. Remove #if 0 definition of vtophys() - it predates 4MB pages.
Remove avtophys(), it isn't referenced anywhere.


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99567 08-Jul-2002 peter

s/procrunnable/kserunnable/ in a comment


99561 08-Jul-2002 peter

Fix a hideous TLB bug. pmap_unmapdev neglected to remove the device
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99544 07-Jul-2002 imp

Make NEWCARD the default pccard/cardbus system.


99537 07-Jul-2002 mux

One #include <sys/lock.h> is enough.

Submitted by: Olivier Houchard <cognet@ci0.org>


99481 06-Jul-2002 obrien

Make space for compilations.


99415 04-Jul-2002 peter

Diff reduction (microoptimization) with another WIP. Move the frame
calculation in get_ptbase() to a little later on.


99398 04-Jul-2002 julian

Don't free pages we never allocated..

My eyes opened by: Matt


99397 04-Jul-2002 julian

Slight restatement of the code and remove some unused variables.


99381 03-Jul-2002 julian

Add comments and slightly rearrange the thread stack assignment code
to try make it less obscure.


99380 03-Jul-2002 julian

Remove vestiges of old code...
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))


99149 30-Jun-2002 iwasaki

Resolve conflicts arising from the ACPI CA 20020404 import.


99129 30-Jun-2002 obrien

This is the start of the FreeBSD/x86_64 kernel.


99123 30-Jun-2002 obrien

This is the start of the FreeBSD/x86_64 kernel.


99122 30-Jun-2002 obrien

Gcc 3.1 varargs support.


99117 30-Jun-2002 mike

Since printf(3) now supports the `j' conversion specifier, use that
when printing intmax_t and uintmax_t.

Forgotten by: mike
Noticed by: bde


99106 30-Jun-2002 rwatson

Remove ALT_BREAK_TO_DEBUGGER. This was inconsistent (both in form
and function) with existing configuration choices. Arguably if
ALT_BREAK_TO_DEBUGGER was present, so should have been
BREAK_TO_DEBUGGER. Regardless, it broke the option sort order in
these kernel configuration files.

Requested by: bde


99095 29-Jun-2002 julian

Fix reverse ordering of locks. add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


99026 29-Jun-2002 julian

Add files that are new for KSE.


99013 29-Jun-2002 peter

Remove a couple of __P() stragglers.


98902 27-Jun-2002 arr

Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)

"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."

Submitted by: tegge
Approved by: tegge's patches never fail


98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


98824 25-Jun-2002 iedowse

Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:

o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().


98778 24-Jun-2002 peter

Compile in the cpu halt code even on SMP, instead just default the
sysctl (machdep.cpu_idle_hlt) to off in the SMP case. This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98728 24-Jun-2002 mini

userout -> out. These two labels are now identical.

Approved by: alfred


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98650 22-Jun-2002 mp

Add additional cpuid feature flags and put into a canonical format.

MFC after: 1 week


98627 22-Jun-2002 jmallett

Use rm -f in the clean target, as seems to be common practice, and also avoids
errors if no LINT exists.

Submitted by: dwcjr


98618 22-Jun-2002 mp

Clock frequencies reported by sysctl should be unsigned values. Discovered
when machdep.tsc_freq returned a negative number on a 2.2GHz Xeon.

Submitted by: Brian Harrison <bharrison@ironport.com>
Reviewed by: phk
MFC after: 1 week
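
A rough illustration of why a 2.2GHz frequency shows up as negative, assuming
the value is stored in 32 bits and reported through a signed 32-bit integer.
The helper names below are hypothetical and are not taken from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: view a frequency the way a signed 32-bit report
 * would.  Converting an out-of-range value to a signed type is
 * implementation-defined in C; common compilers wrap modulo 2^32.
 */
static int32_t
freq_as_signed(uint32_t hz)
{
	return ((int32_t)hz);
}

/* Hypothetical helper: true when the value cannot fit in a signed 32-bit int. */
static int
freq_overflows_signed(uint32_t hz)
{
	return (hz > (uint32_t)INT32_MAX);
}
```

With a 2,200,000,000 Hz clock the value exceeds INT32_MAX (2,147,483,647), so
only an unsigned report can represent it.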


98542 21-Jun-2002 mckusick

This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98469 20-Jun-2002 peter

Move the "- 1" into the RQB_FFS(mask) macro itself so that
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().

Reviewed by: jake (in principle)
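
The macro change above can be sketched in portable C as follows. The
ffs64_sketch() function is a hypothetical stand-in for the kernel's ffs64();
only the RQB_FFS name comes from the commit message.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for ffs64(): returns the 1-based index of the
 * lowest set bit, or 0 when no bit is set.
 */
static int
ffs64_sketch(uint64_t mask)
{
	int bit;

	if (mask == 0)
		return (0);
	for (bit = 1; (mask & 1) == 0; bit++)
		mask >>= 1;
	return (bit);
}

/*
 * New style: the "- 1" lives inside the macro, so callers receive a
 * zero-based queue index directly (-1 when the mask is empty).
 */
#define	RQB_FFS(mask)	(ffs64_sketch(mask) - 1)
```

A platform with a native zero-based bit scan could define RQB_FFS() without
the subtraction at all, which is the flexibility the commit is after.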


98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


98145 12-Jun-2002 bde

If trap() is called when ddb is active, then go directly to trap_fatal();
do not blunder around enabling interrupts and running trap handlers.
trap_pfault() will normally pass control to ddb's fault handler which
will normally do the right thing.

This bug is very old, but in old versions of FreeBSD it is probably only
serious for trap handling that involves sleeping. In -current, attempting
to examine unmapped memory while stopped at a breakpoint at mi_switch()
was always fatal.


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha where we
skip the actual syscall invocation if error != 0. This fixes a bug
where if the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


97937 06-Jun-2002 gibbs

Hook up the ahd driver.


97721 01-Jun-2002 alfred

Silence preprocessor warning, No need to use CONCAT with "," and "word".


97713 01-Jun-2002 bde

Fixed the return value of fpsetmask(). The API requires inversion of the
mask on both input and output to fpsetmask(), but this was only done for
input, so fpsetmask() returned the complement of the old mask (ANDed with
the mask bitfield).

PR: 38170
MFC after: 4 weeks
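
A minimal sketch of the inversion described above, using a simulated control
word rather than real FPU instructions. It assumes the x87 convention that a
set bit in the low six control-word bits masks (disables) an exception, while
the fpsetmask() argument and return value name the enabled exceptions. The
names sim_cw and fpsetmask_sketch() are hypothetical.

```c
#define	FP_MASK_BITS	0x3fu	/* six exception mask bits */

/* Simulated control word; 0x37f has all six exceptions masked. */
static unsigned sim_cw = 0x37fu;

/*
 * Install a new set of enabled exceptions and return the previously
 * enabled set.  The mask is inverted on the way in AND on the way out.
 */
static unsigned
fpsetmask_sketch(unsigned enable)
{
	unsigned old = sim_cw;

	sim_cw = (sim_cw & ~FP_MASK_BITS) | (~enable & FP_MASK_BITS);
	return (~old & FP_MASK_BITS);
}
```

Dropping the inversion on the return path would hand back the complement of
the old mask, which is the behaviour the commit describes fixing.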


97711 01-Jun-2002 bde

Fixed style bugs in rev.1.9.


97694 01-Jun-2002 imp

Use a common function to map the bogus intlines.
Don't require pin be non-zero before we map bogus intlines, always do it.
This fixes a number of problems on HP Omnibook computers.

Tested/Reviewed by: Brooks Davis


97564 30-May-2002 dfr

Move the definition of ElfN_Hashelt to common headers. The only platform
which has a different definition for this is alpha.


97500 29-May-2002 obrien

Do not refer to the Intel PRO/1000 by its internal name.

Requested by: pdeuskar


97473 29-May-2002 brooks

Restore the irq=0 => irq=255 hack to pci_cfgintr_search(). Just having
it in pci_cfgregread() wasn't sufficient on at least the HP Omnibook 500.

Reviewed by: imp


97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


97261 25-May-2002 jake

Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by: peter


97139 22-May-2002 jhb

Rename pause() to ia32_pause() so it doesn't conflict with the pause()
function defined in <unistd.h>. I didn't #ifdef _KERNEL it because the
mutex implementation in libpthread will probably need this.


97137 22-May-2002 obrien

Restore us back to the rev 1.324 level of having an Intel gigE driver.


97115 22-May-2002 jhb

Debug registers aren't selectors, so use saner names for the variables in
the inline functions for reading and writing the debug registers.


97114 22-May-2002 jhb

- Sort the pause() inline into the appropriate location.
- Add many missing prototypes to the non-GCC section.


97113 22-May-2002 jhb

Rename cpu_pause() to pause(). Originally I was going to make this an
MI API with empty cpu_pause() functions on other arch's, but this
functionality is definitely unique to IA-32, so I decided to leave it
as i386-only and wrap it in #ifdef's. I should have dropped the cpu_
prefix when I made that decision.

Requested by: bde


97087 21-May-2002 rwatson

Permit alternative break sequence to break to debugger in GENERIC. Breakage
of serial break on -CURRENT seems rampant for some reason, and I like
being able to get into ddb.

Reviewed by: peter


97076 21-May-2002 jhb

Add an inline function cpu_pause() for the IA32 'pause' instruction.


96929 19-May-2002 peter

Make this compile with gcc-3.1, which objects to the multi-line string.


96527 13-May-2002 bde

Fixed a semantic error. va_arg(ap, u_short) is nonsense except on i386's
with 16-bit ints, since u_short is promoted when it is passed to a
varargs function. gcc now warns about this. We always pass small
integers (this is well obfuscated), so there are no conversion problems.

Fixed a related style bug (bogus cast).
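
The promotion rule behind this fix can be shown with a small standalone
function. This is only a sketch; sum_small() is a hypothetical name and is
unrelated to the code touched by the commit.

```c
#include <stdarg.h>

/*
 * Sums `count` small integers passed through "...".  Arguments narrower
 * than int are promoted to int by the caller, so they must be fetched
 * with va_arg(ap, int) and narrowed afterwards.
 */
static unsigned
sum_small(int count, ...)
{
	va_list ap;
	unsigned total = 0;

	va_start(ap, count);
	while (count-- > 0) {
		/* Promoted type is int, not unsigned short. */
		total += (unsigned short)va_arg(ap, int);
	}
	va_end(ap);
	return (total);
}
```

Writing va_arg(ap, unsigned short) instead is undefined behaviour on any
platform where int is wider than short.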


96517 13-May-2002 bde

Fixed a syntax error (a label not followed by a statement).


96317 10-May-2002 obrien

Gcc 3.1 varargs support.


96035 04-May-2002 fenner

Restore the ability to interrupt dumps on i386, based on
the old kern_shutdown.c . Other archs might be able to
use similar code but I don't have anything to test on.


95992 03-May-2002 jmallett

Typo fix: detects -> detect.

Reviewed by: phk


95940 02-May-2002 des

Join the pissing contest: generate LINT with a single sed(1) command.
Smaller script, smaller (though equivalent) output.


95922 02-May-2002 kuriyama

Use shell script version (using awk and sed) of makeLINT.pl.


95814 30-Apr-2002 phk

Don't export timecounter structures under debug. with sysctl, they
contain no truly interesting data anymore.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95579 27-Apr-2002 alc

For what it's worth, fix the compilation of an I386_CPU-only kernel
now that certain warnings are fatal.


95571 27-Apr-2002 alc

Don't call vm_map_growstack() from trapwrite() as vm_fault() now performs
this automatically.


95536 27-Apr-2002 scottl

Add a CAM interface to the aac driver. This is useful in case you should
ever connect a SCSI Cdrom/Tape/Jukebox/Scanner/Printer/kitty-litter-scooper
to your high-end RAID controller. The interface to the arrays is still
via the block interface; this merely provides a way to circumvent the
RAID functionality and access the SCSI buses directly. Note that for
somewhat obvious reasons, hard drives are not exposed to the da driver
through this interface, though you can still talk to them via the pass
driver. Be the first on your block to low-level format unsuspecting
drives that are part of an array!

To enable this, add the 'aacp' device to your kernel config.

MFC after: 3 days


95489 26-Apr-2002 phk

Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95375 24-Apr-2002 imp

o Work around bugs in the powerof2 macro: It thinks that 0 is a power of
2, but that's not the case. This fixes the case where there were slots
in the PIR table that had no bits set, but we assumed they did and used
strange results as a result.
o Map invalid INTLINE registers to 255 in pci_cfgreg.c. This should allow
us to remove the bogus checks in MI code for non-255 values.

I put these changes out for review a while ago, but no one responded
to them, so into current they go.

This should help us work better on machines that don't route
interrupts in the traditional way.

MFC After: 4286 millifortnights
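
The powerof2 behaviour mentioned in the first bullet can be reproduced in a
few lines. The macro body below is the usual <sys/param.h> form; the
is_single_irq() wrapper is a hypothetical illustration of the kind of guard
needed, not the code from the commit.

```c
/* Usual definition: true for powers of two, but also true for 0. */
#ifndef powerof2
#define	powerof2(x)	((((x) - 1) & (x)) == 0)
#endif

/*
 * Hypothetical guard: treat a bitmask as naming exactly one IRQ only
 * when it is non-zero and has a single bit set.
 */
static int
is_single_irq(unsigned mask)
{
	return (mask != 0 && powerof2(mask));
}
```

For a mask of 0 the subtraction wraps to all ones, the AND yields 0, and the
macro reports true, so an empty slot looks like a valid single-bit mask.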


95374 24-Apr-2002 imp

Fix a PNPID in a comment

Submitted by: David Xu


95320 23-Apr-2002 phk

Don't free(9) a pointer which has been modified.

Chapeau de pointe: mux


95195 21-Apr-2002 markm

Stylify (mainly line up macro EOL-continuation \'s), and add a dummy
alternative for lint.


95076 19-Apr-2002 alfred

Clean up:

Comment run_filter() to explain what it does.

Remove chatty comments.

void busdma_swi() { } -> void busdma_swi(void) { }


94980 18-Apr-2002 rwatson

Since WITNESS doesn't just do mutexes, remove "mutex" from the WITNESS
comment in GENERIC config files of appropriate platforms. For whatever
reason, powerpc didn't use WITNESS in GENERIC.


94977 18-Apr-2002 alc

o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed
due to conditions that suggest the possible need for stack growth.
This has two beneficial effects: (1) we can
now remove calls to vm_map_growstack() from the MD trap handlers and (2)
simple page faults are faster because we no longer unnecessarily perform
vm_map_growstack() on every page fault.
o Remove vm_map_growstack() from the i386's trap_pfault().
o Remove the acquisition and release of Giant from i386's trap_pfault().
(vm_fault() still acquires it.)


94967 17-Apr-2002 tegge

Fix typo in adjusted panic message.

Submitted by: cokane


94962 17-Apr-2002 tegge

Update io_apic_ints array properly when revoking an irq mapping.
Adjust panic message.

Submitted by: David Xu <bsddiy@yahoo.com>


94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibility.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94683 14-Apr-2002 dwmalone

Make the MTRR code a bit more defensive - this should help people
trying to run X on some Athlon systems where the BIOS does odd things
(mine's an ASUS A7A266, but it seems to also help on other systems).

Here's a description of the problem and my fix:

The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.

This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.

To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.

I'd like to merge this before the 4.6 code freeze, so if people
can test this with XFree 4 that would be very useful.

PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks


94386 10-Apr-2002 dwmalone

Move do_cpuid into the correct place in this file and make
the indentation more like the other multi-line assembly in
this file.

Someone who understands gcc constraints could update the
constraints for do_cpuid.


94383 10-Apr-2002 alc

o In osigreturn(), restore all of the registers in one place.
o Recent changes to osigreturn() and sigreturn() have made them MPSAFE. Add
a comment to this effect.

Submitted by: bde (bullet #1)
Reviewed by: jhb (bullet #2)


94380 10-Apr-2002 dfr

Initial support for executing IA-32 binaries. This will not compile
without a few patches for the rest of the kernel to allow the image
activator to override exec_copyout_strings and setregs.

None of the syscall argument translation has been done. Possibly, this
translation layer can be shared with any platform that wants to support
running ILP32 binaries on an LP64 host (e.g. sparc32 binaries?)


94275 09-Apr-2002 phk

GC various bits and pieces of USERCONFIG from all over the place.


94197 08-Apr-2002 bde

Removed ispc98 sysctl completely. Applications should understand that
ispc98 isn't set if its sysctl doesn't exist. At least make(1) already
understands this.

Approved by: nyan


94151 07-Apr-2002 phk

GC the "dumplo" variable, which is no longer used.

A lot of sys/*/*/machdep.c seems not to be.


93945 06-Apr-2002 nyan

Move ICU_* defines into icu.h.


93944 06-Apr-2002 nyan

Remove pc98 code.


93823 04-Apr-2002 dillon

Embed a struct vmmeter in the per-cpu structure and add a macro,
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occasional inaccuracy. Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scalability.

Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.

The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu. Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!
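
The per-cpu counter scheme above might look roughly like this in plain C. The
sk_ names, the two-field structure and the fixed CPU count are assumptions
made for the sketch; the real struct vmmeter and PCPU_LAZY_INC() are not
reproduced here.

```c
#include <stddef.h>

#define	SK_MAXCPU	4	/* assumed CPU count for the sketch */

/* Cut-down stand-in for struct vmmeter. */
struct sk_vmmeter {
	unsigned v_swtch;
	unsigned v_intr;
};

static struct sk_vmmeter sk_global;		/* global counters */
static struct sk_vmmeter sk_pcpu[SK_MAXCPU];	/* one copy per cpu */

/* Unlocked increment of a per-cpu field; occasional inaccuracy is accepted. */
#define	SK_PCPU_LAZY_INC(cpu, field)	(sk_pcpu[(cpu)].field++)

/* Aggregate one field, named by its offset, over the global and per-cpu copies. */
static unsigned
sk_vcnt(size_t off)
{
	unsigned total;
	int i;

	total = *(const unsigned *)((const char *)&sk_global + off);
	for (i = 0; i < SK_MAXCPU; i++)
		total += *(const unsigned *)((const char *)&sk_pcpu[i] + off);
	return (total);
}
```

A reader such as a sysctl handler would call
sk_vcnt(offsetof(struct sk_vmmeter, v_swtch)) to obtain the combined value,
so writers never touch a shared cache line.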


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93794 04-Apr-2002 brian

Back out the previous commit.

In the i386 case, options BOOTP requires options NFS_ROOT as well as
options NFSCLIENT. With *both* the NFS options, a bootpc_init()
prototype is brought in by nfsclient/nfsdiskless.h.

In the ia64 case, it just doesn't work and my change just pushes it
further away from working.

Suggested to be wrong by: bde


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93785 04-Apr-2002 brian

Pre-declare bootpc_init() so that options BOOTP doesn't break the
build in ia64 and i386 due to -Werror.


93731 03-Apr-2002 jhb

First round at trying to split up NOTES into MI and MD portions.
Unfortunately, this level doesn't really provide enough granularity. We
probably need several MI NOTES type files for things that are shared by
several architectures but not by all. For example, the PCI options could
live in a NOTES.pci.

This also updates the Makefile for i386 to generate LINT. The only changes
in the generated LINT are the order of various options.

Suggestions for improvement welcome.


93719 03-Apr-2002 ru

Dike out a highly insecure UCONSOLE option.
TIOCCONS must be able to VOP_ACCESS() /dev/console to succeed.

Obtained from: OpenBSD


93717 03-Apr-2002 marcel

Make the kernel dump header endianness invariant by always dumping
in dump byte order (=network byte order). Swap blocksize and dumptime
to avoid extraneous padding on 64-bit architectures. Use CTASSERT
instead of runtime checks to make sure the header is 512 bytes large.
Various style(9) fixes.

Reviewed by: phk, bde, mike
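
The two techniques named above (a compile-time size check and a fixed dump
byte order) can be sketched as follows. This is an illustration under
stated assumptions: the field names, field sizes and the helper
dump_be32() are hypothetical and do not reproduce the real kernel dump
header; the negative-array-size typedef merely stands in for what a
CTASSERT-style macro does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 512-byte dump header; all field names are illustrative. */
struct dumphdr_sketch {
	char		magic[20];
	uint32_t	version;	/* stored in dump (big-endian) order */
	uint64_t	dumplength;	/* 64-bit fields kept adjacent ... */
	uint64_t	dumptime;	/* ... so no padding is inserted */
	uint32_t	blocksize;
	char		panicstring[468];
};

/* Compile-time check: the array size is -1 (an error) if the size is wrong. */
typedef char dumphdr_size_check_sketch[
    (sizeof(struct dumphdr_sketch) == 512) ? 1 : -1];

/* Return a value whose in-memory bytes are the big-endian form of 'host'. */
static uint32_t
dump_be32(uint32_t host)
{
	unsigned char b[4];
	uint32_t out;

	b[0] = (unsigned char)(host >> 24);
	b[1] = (unsigned char)(host >> 16);
	b[2] = (unsigned char)(host >> 8);
	b[3] = (unsigned char)host;
	memcpy(&out, b, sizeof(out));
	return (out);
}
```

Because the size check fails at compile time, a header that drifts away
from 512 bytes never reaches a running kernel.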


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly committed syntactical cleanups made to files
that were still under active development. Backout improperly committed program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather than in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecore's features to do the job, but implements none
of the options yet.

I would appreciate it if a userland hacker could help me out getting savecore
to do what we want it to do from a user's point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93461 31-Mar-2002 alc

Implement i386's (o)sigreturn() like the alpha's: Use copyin() to read
the osigcontext or ucontext_t rather than useracc() followed by direct user-
space memory accesses. This reduces (o)sigreturn()'s execution time by 5-
50%.

Submitted by: bde


93340 28-Mar-2002 jhb

GC #if 0'd assembly mutex micro operations. If someone wants to bring
these back later they can get them from the attic. Also, GC some stale
macros to acquire and release sleep mutexes in assembly.


93334 28-Mar-2002 nyan

Remove unneeded pc98 hack.


93312 28-Mar-2002 obrien

style(9)

Approved by: jake


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb
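
To show why a flag suits per-zone lock names better than a name list, here
is a small sketch. The struct, the flag value LO_DUPOK_SKETCH, the list
contents and both helper functions are assumptions for illustration and
are not the subr_witness code.

```c
#include <string.h>

#define LO_DUPOK_SKETCH	0x1	/* hypothetical "duplicates allowed" flag */

struct lock_sketch {
	const char	*name;
	int		 flags;
};

/* Old approach: a fixed list of lock names that may be acquired twice. */
static const char *dup_list_sketch[] = {
	"process lock", "process group", NULL
};

static int
dup_ok_by_name(const struct lock_sketch *l)
{
	const char **dup;

	for (dup = dup_list_sketch; *dup != NULL; dup++)
		if (strcmp(l->name, *dup) == 0)
			return (1);
	return (0);
}

/* New approach: one bit test, independent of how the lock is named. */
static int
dup_ok_by_flag(const struct lock_sketch *l)
{
	return ((l->flags & LO_DUPOK_SKETCH) != 0);
}
```

A lock whose name is generated per zone can never appear in a static name
list, but it can carry the flag.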


93265 27-Mar-2002 dillon

Tab-out the backslashes in icu_vector.s to make it more readable and to
match it up with apic_vector.s.


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93024 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93023 23-Mar-2002 nsouch

Major rework of the iicbus/smbus framework:

- VIA chipset SMBus controllers added
- alpm driver updated
- Support for dynamic modules added
- bktr FreeBSD smbus updated but not tested
- cleanup


93018 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93017 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93005 23-Mar-2002 takawata

Add bios area range check (lower side).


92998 23-Mar-2002 obrien

ASM versions of __FBSDID.


92890 21-Mar-2002 alc

o Use the MI vm_map_growstack() instead of grow_stack() in trap_pfault()
and trapwrite().
o On i386/pc98, remove the (now) unused grow_stack().


92860 21-Mar-2002 imp

Fix abuses of cpu_critical_{enter,exit} by converting to
intr_{disable,restore} as well as providing an implementation of
intr_{disable,restore}.

Reviewed by: jake, rwatson, jhb


92846 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decrement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removes td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92819 20-Mar-2002 imp

Fix minor style(9) violation in de__Ping


92770 20-Mar-2002 alfred

Remove __P.


92765 20-Mar-2002 alfred

Remove __P.


92761 20-Mar-2002 alfred

Remove __P.


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92548 18-Mar-2002 alc

Eliminate grow_stack() from (o)sendsig(). If the stack needs to grow,
copyout() will page fault and perform grow_stack() from trap_pfault().
These calls to grow_stack() accomplish nothing.

Reviewed by: bde


92518 17-Mar-2002 des

s/options\t\t/options \t/


92470 17-Mar-2002 alc

o Stop calling useracc() in (o)sendsig() now that we use copyout()
to copy the sigframe to the user's stack. Useracc() takes a non-trivial
amount of time. Eliminating it speeds up signal delivery by 15% or more.
o Update some comments.

Submitted by: bde


92458 16-Mar-2002 imp

Don't call the BIOS if the interrupt appears to be already routed. Some
older PCI BIOSes hate this and this leads to panics when it is done. Also,
assume that a uniquely routed interrupt is already routed. This also
seems to help some older laptops with feeble BIOSes cope.


92383 16-Mar-2002 des

Move the definition of PT_[GS]ET{,DB,FP}REGS from the MD ptrace.h to the
MI ptrace.h, since all platforms define them. Keep the MD ptrace.h around
for FIX_SSTEP (which is currently only needed on Alpha).


92018 10-Mar-2002 luigi

Export a (machine dependent) kernel variable bootdev as
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).

NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".

Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.

MFC-after: 5 days


91978 10-Mar-2002 alc

Condition the compilation of trapwrite() on I386_CPU.


91893 08-Mar-2002 phk

#include <machine/smp.h> in the SMP case.
don't include <sys/smp.h> at all.

Fallout from: probably something jake did.
Hint by: jhb


91778 07-Mar-2002 jake

Add needed includes of machine/smp.h, remove nested include in sys/smp.h
so that inlines in machine/smp.h can use variables declared in sys/smp.h.


91673 05-Mar-2002 jeff

Add a new variable mp_maxid. This is used so that per cpu data structures may
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.

Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.

Reviewed by: jake
Approved by: jake
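
The difference between a CPU count and a maximum CPU id can be sketched as
below. The presence table and the three helper names are hypothetical;
the sketch only shows why an array indexed by cpu id must be sized from
the highest id rather than from the number of CPUs when ids are sparse.

```c
#include <stdlib.h>

/* Number of CPUs present (the role mp_ncpus plays in the text above). */
static int
count_cpus_sketch(const int *present, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		if (present[i])
			count++;
	return (count);
}

/* Highest CPU id present (the role mp_maxid plays), or -1 if none. */
static int
max_cpu_id_sketch(const int *present, int n)
{
	int i, maxid = -1;

	for (i = 0; i < n; i++)
		if (present[i])
			maxid = i;
	return (maxid);
}

/* Allocate a zeroed per-CPU array that is safe to index by any valid id. */
static long *
alloc_percpu_sketch(int maxid)
{
	return (calloc((size_t)maxid + 1, sizeof(long)));
}
```

With CPUs at ids 0, 2 and 5, the count is 3 but the array needs 6 slots.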


91640 04-Mar-2002 iwasaki

Add generalized power profile code.
This allows other power-management systems (APM for now) to
generate power profile change events (i.e. AC-line status changes), so that
other kernel components, not only the ACPI components, can be notified
of the events.

- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add call-back function for Crusoe LongRun controlling on power
profile changes as an example


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but
I misread Bruce's patch.

Requested by: bde


91497 28-Feb-2002 markm

Make it a bit clearer where this file is to be used and where it
should not be. (Comments only)

Inspired by: bde


91473 28-Feb-2002 arr

- trap -> trap() in panic() string.
- Translate the message into some sort of understandable English.
- Fix a couple of nearby style nits.

Submitted by: bde


91471 28-Feb-2002 silby

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Luckily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


91469 28-Feb-2002 bmilekic

Make MPLOCKED work again in asm files and stringify it explicitly
where necessary.

Reviewed by: jake


91460 28-Feb-2002 peter

Fix warnings.. bootpc_init() and related.


91429 27-Feb-2002 jhb

Back out part of KSE/M2 that snuck in under the radar: changing the
prototype of bzero() on the i386 to have a volatile first argument.

Requested by: bde, jake


91422 27-Feb-2002 arr

- Insert a space in the panic() string in order to more clearly show the
message.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91394 27-Feb-2002 tmm

Add the following functions/macros to support byte order conversions and
device drivers for bus systems with other endiannesses than the CPU (using
interfaces compatible with NetBSD):

- bswap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianness
differs from the CPU endianness).

htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.

Make use of the new functions in a few places where local implementations
of the same functionality existed.

Reviewed by: mike, bde
Tested on alpha by: mike
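
A generic (non-optimized) version of the conversions listed above might
look like the following sketch. The _sketch suffixes and the run-time
endianness probe are assumptions for illustration; the real headers would
normally select the byte order at compile time.

```c
#include <stdint.h>

/* Generic byte swaps, usable where no optimized version exists. */
static uint16_t
bswap16_sketch(uint16_t x)
{
	return ((uint16_t)((x << 8) | (x >> 8)));
}

static uint32_t
bswap32_sketch(uint32_t x)
{
	return ((x << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | (x >> 24));
}

/* Run-time probe: true if the least significant byte is stored first. */
static int
host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return (*(const unsigned char *)&probe == 1);
}

/* Host to little-endian and back: swap only on big-endian hosts. */
static uint16_t
htole16_sketch(uint16_t x)
{
	return (host_is_little_endian() ? x : bswap16_sketch(x));
}

static uint16_t
le16toh_sketch(uint16_t x)
{
	return (host_is_little_endian() ? x : bswap16_sketch(x));
}

/* Host to network (big-endian) order, built on the swap primitive. */
static uint32_t
htonl_sketch(uint32_t x)
{
	return (host_is_little_endian() ? bswap32_sketch(x) : x);
}
```

The host/bus conversions are symmetric, so le16toh_sketch() and
htole16_sketch() share the same body.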


91368 27-Feb-2002 peter

Re-fix a pointer/integer warning.


91367 27-Feb-2002 peter

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


91358 27-Feb-2002 peter

Bandaid for the Uniprocessor kernel exploding. This makes a UP kernel
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.


91353 27-Feb-2002 alfred

clarify panic message


91344 27-Feb-2002 peter

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


91341 27-Feb-2002 dillon

didn't quite undo the last reversion. This gets it.


91329 26-Feb-2002 dillon

revert compatibility fix temporarily (thought it would not break anything
leaving it in).


91328 26-Feb-2002 dillon

revert last commit temporarily due to whining on the lists.


91322 26-Feb-2002 dillon

Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.


91315 26-Feb-2002 dillon

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather than hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables,
it is not currently necessary to lock bus cycles modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather than defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
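
The nesting counter and interrupt deferral described in this commit can be
modelled in a few lines. This is a single-threaded userland simulation
under stated assumptions: the _sketch names, the handled-mask bookkeeping
and interrupt_arrives_sketch() are invented for illustration, and the real
i386 code does this work in the interrupt vector assembly.

```c
static int td_critnest_sketch;		/* critical section nesting depth */
static unsigned int ipending_sketch;	/* mask of deferred interrupts */
static int int_pending_sketch;		/* nonzero if anything was deferred */
static unsigned int handled_sketch;	/* records which handlers have run */

static void
run_handler_sketch(int irq)
{
	handled_sketch |= 1u << irq;
}

/* Entering a critical section only bumps the counter. */
static void
critical_enter_sketch(void)
{
	td_critnest_sketch++;
}

/* An interrupt is deferred when it arrives inside a critical section. */
static void
interrupt_arrives_sketch(int irq)
{
	if (td_critnest_sketch > 0) {
		ipending_sketch |= 1u << irq;
		int_pending_sketch = 1;
	} else
		run_handler_sketch(irq);
}

/* Leaving the outermost section runs whatever was deferred. */
static void
critical_exit_sketch(void)
{
	unsigned int mask;
	int irq;

	if (--td_critnest_sketch == 0 && int_pending_sketch) {
		mask = ipending_sketch;
		ipending_sketch = 0;
		int_pending_sketch = 0;
		for (irq = 0; irq < 32; irq++)
			if (mask & (1u << irq))
				run_handler_sketch(irq);
	}
}
```

Only the exit that brings the counter back to zero pays for the deferred
work; nested enter/exit pairs cost one increment and one decrement.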


91262 26-Feb-2002 peter

Fix a warning. useracc() should take a const pointer argument.


91260 25-Feb-2002 peter

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen world build
speedups of a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


91250 25-Feb-2002 peter

Tidy up some warnings


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


90998 20-Feb-2002 peter

Pass me the pointy hat please. Be sure to return a value in a non-void
function. I've been running with this buried in the mountains of compiler
output for about a month on my desktop.


90960 20-Feb-2002 cjc

Fix typos in some comments.

PR: i386/35114
Submitted by: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>


90947 20-Feb-2002 peter

Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t
etc.


90849 18-Feb-2002 nyan

Add stubs for bus_space_unmap() and bus_space_free(). They are needed to
release a bus_space_handle allocated by bus_space_subregion().


90776 17-Feb-2002 deischen

Use struct __ucontext in prototypes and associated functions instead of
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.

While I'm here, also hide osigcontext types from userland; suggested
by bde.

Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>


90770 17-Feb-2002 nyan

Correct typo.


90763 17-Feb-2002 nyan

Move the bus_space_subregion function from the puc driver to the bus_space
stuff.

Reviewed by: jhay


90762 17-Feb-2002 nyan

- Split the routine to initialize a bus_space_handle into the separate
function.
- Only access a bus_space_handle if the resource type is SYS_RES_MEMORY or
SYS_RES_IOPORT.
- Add the bus_space_subregion supports.


90748 17-Feb-2002 julian

If the credential on an incoming thread is correct, don't bother
reacquiring it. In the same vein, don't bother dropping the thread cred
when going to userland. We are guaranteed to need it when we come back
(which we are guaranteed to do).

Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)


90718 16-Feb-2002 bde

Don't leave garbage in parts of fpregs in the fxsr case. All callers
(procfs and ptrace) supply kernel stack garbage, so kernel context was
leaked to userland.

Reviewed by: des
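
The class of fix described above (clear the whole output structure before
filling only part of it) can be sketched as follows. The struct layout,
field sizes and function name are hypothetical and are not the real
fpregs definition.

```c
#include <string.h>

/* Hypothetical register dump; only env and acc are ever filled in. */
struct fpregs_sketch {
	unsigned char env[28];
	unsigned char acc[80];
	unsigned char spare[64];	/* never written by the fill path */
};

/*
 * Zero the caller's buffer first so that fields this function does not
 * set cannot carry stale (for example, kernel stack) contents outward.
 */
static void
fill_fpregs_sketch(struct fpregs_sketch *out, const unsigned char *env,
    const unsigned char *acc)
{
	memset(out, 0, sizeof(*out));
	memcpy(out->env, env, sizeof(out->env));
	memcpy(out->acc, acc, sizeof(out->acc));
}
```

Without the memset(), whatever the caller's buffer previously held would
survive in the spare area and be copied out with the rest.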


90633 13-Feb-2002 bde

Don't confuse a struct with its first member. This fixes:
./@/i386/i386/machdep.c: In function `init386':
./@/i386/i386/machdep.c:1700: warning: assignment from incompatible pointer type


90629 13-Feb-2002 alfred

Re-enable WITNESS for GENERIC. Since the 5.x branch is mostly about
SMP we'd like as much feedback as possible from users about possible
locking problems as early as possible.

To negate most of the performance impact I've also enabled
WITNESS_SKIPSPIN. I've done this as we've been running WITNESS
over the spinlock code for a while without incident and it goes a
long way to making the performance problems of WITNESS much more
bearable.

Users who should be running current should know about turning WITNESS
off for performance reasons.

That said and done, WITNESS could/should be made into a tuneable,
but we'll leave that as an exercise to those that want to disable
it without a kernel recompile.


90598 13-Feb-2002 rwatson

Remove WITNESS from GENERIC by default: as we grow more locks, this gets
slower, and may be impeding adoption of -CURRENT by developers. We
recommend turning on WITNESS by default on crash boxes, and when doing
locking development. It will probably get turned on by default for a week
or two following any major locking commits, also.

Approved by: all and sundry (jhb, phk, ...)


90590 12-Feb-2002 dwmalone

Add an option CPU_ATHLON_SSE_HACK which attempts to enable the SSE
feature bit on newer Athlon CPUs if the BIOS has forgotten to enable
it.

This patch was constructed using some info made available by John
Clemens at http://www.deater.net/john/PavilionN5430.html

Reviewed by: -audit
MFC after: 3 weeks


90589 12-Feb-2002 dwmalone

Move do_cpuid() from identcpu.c into cpufunc.h.


90562 12-Feb-2002 alc

Remove an unused (but initialized) variable from vmapbuf().


90515 11-Feb-2002 bde

Garbage-collect the "LOCORE" version of MPLOCKED.


90468 10-Feb-2002 kato

Cosmetic changes:
- Collected i486 identification codes in one place like
586 and 686.
- Merged two cases (0x470 and 0x490) for `Enhanced Am486DX4
Write-Back.'
- Replaced `unknown' into `Unknown'.

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


90466 10-Feb-2002 nyan

Add needed include.


90425 09-Feb-2002 kato

Recognize VIA C3 Samuel 2.

MFC after: 3 days


90411 08-Feb-2002 jhb

Apparently during the KSE M2 commit bzero() on the i386 was changed so that
its first parameter was volatile. Catch i486_bzero() and i586_bzero()'s
prototypes up to this to quiet warnings.


90410 08-Feb-2002 jhb

Don't grab the ICU lock while reading the current pending interrupts and
current masked interrupts from the AT PIC.

Requested by: bde


90372 07-Feb-2002 peter

Attempt to patch up some style bugs introduced in the previous commit


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
This is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there
but remove all the code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


90134 03-Feb-2002 markm

Make the style a little bit more consistent by removing parameter
names from some prototypes. (Other prototypes here already have
these removed).


90132 03-Feb-2002 bde

Use osigreturn(2) instead of sigreturn(2) plus broken magic for returning
from old signal handlers. This is simpler and faster, and fixes (new)
sigreturn(2) when %eip in the new signal context happens to match the
magic value (0x1d516). 0x1d516 is below the default ELF text section,
so this probably never broke anything in practice.

locore.s:
In addition, don't build the signal trampoline for old signal handlers
when it is not used.

alpha:
Not fixed, but seems to be even less broken in practice due to more
advanced magic. A false match occurs for register #32 in mc_regs[].
Since there is no hardware register #32, a false match is only possible
for direct calls to sigreturn(2) that happen to have the magic number
in the spare mc_regs[32] field.


90128 03-Feb-2002 bde

Improve the change in the previous commit: use a stub for osigreturn()
when it is not really used instead of unconditionalizing all of it.


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub for ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up ia64 stub, since there was no stub before.


90024 31-Jan-2002 bde

Finish revs.1.23 and 1.24 so that MCOUNT_ENTER really actually compiles
for SMP in the plain profiling case. It seems to work too.

This error was not detected by LINT because LINT only compiles the
GUPROF profiling case, which is a superset of the plain profiling
case for !SMP but which is so broken for SMP that the buggy code is
not compiled.


89990 30-Jan-2002 bde

Backed out the main part of revs.1.14-16. Don't disable interrupts in
the packet transfer routines, since rev.1.468 of machdep.c does this
better. I'm surprised that disabling interrupts helped much. Disabling
them in the packet receive routine is too late.

Fixed some minor style bugs in rev.1.14.


89989 30-Jan-2002 bde

Backed out the last vestiges of rev.1.51. Don't enter a critical
region in Debugger(), since rev.1.468 of machdep.c does this better.
Other cosmetic backouts.


89988 30-Jan-2002 bde

Cleaned up the 0ldSiG magic check before removing it. Just use fuword()
to fetch the magic word instead of useracc() plus a direct access.
This is more efficient as well as simpler and less incorrect:
- it was inefficient because useracc() takes much longer than just
accessing the data using a correct access method, at least on i386's.
- it was incorrect because direct access is incorrect unless the address
has been mapped. This and nearby direct accesses are mostly handled
better for other arches because they have to be (direct accesses don't
work).
- using magic in sigreturn is still fundamentally broken because false
matches are possible. On i386's, a false match occurs when %eip in a
new signal context happens to equal the magic value. This is not
handled better for other arches.


89980 30-Jan-2002 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.

This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.


89979 30-Jan-2002 bde

Removed unused includes. In particular, don't include <isa/isavar.h> since
its only effect is to break the optionality of the isa option.

Sorted includes.


89631 22-Jan-2002 peter

List bit 18 (reserved, apparently present on thunderbird cpus)
and bit 19 (athlon XP/MP rev 0x662 and later) for amd_features.

Submitted by: dwcjr


89580 20-Jan-2002 msmith

Add the 'iir' driver, for the Intel Integrated RAID controllers and
prior ICP Vortex models. This driver was developed by Achim Leubner
of Intel (previously with ICP Vortex) and Boji Kannanthanam of Intel.

Submitted by: "Kannanthanam, Boji T" <boji.t.kannanthanam@intel.com>
MFC after: 2 weeks


89577 20-Jan-2002 imp

The Libretto L series has no $PIR table, but does have a _PIR table.
This typo keeps us from properly routing an interrupt for CardBus
bridges on this machine. So, now we look for $PIR and then _PIR to
cope. With these changes, the Libretto L1 now works properly.
Evidently, the idea comes from a patch in the Japanese version of
RedHat (or against a Japanese version of Red Hat), but my Japanese
isn't good enough to know for sure.

Reported by: Hiroyuki Aizu-san <eyes@navi.org>

# This may be an MFC candidate, but I'm not yet sure.
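
The lookup order described above ($PIR first, then _PIR) could be sketched
roughly as follows. This is not the committed code: the helper names, the
in-memory buffer standing in for the BIOS area, and the 16-byte scan step
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan a copy of the BIOS area for a 4-byte
 * signature on 16-byte boundaries (the step size is an assumption).
 * Returns the offset of the first match, or -1 if none is found.
 */
static long
find_sig(const unsigned char *buf, size_t len, const char *sig)
{
	size_t off;

	assert(buf != NULL && strlen(sig) == 4);
	for (off = 0; off + 4 <= len; off += 16)
		if (memcmp(buf + off, sig, 4) == 0)
			return ((long)off);
	return (-1);
}

/* Prefer the standard "$PIR" signature; fall back to "_PIR". */
static long
find_pir_table(const unsigned char *buf, size_t len)
{
	long off;

	off = find_sig(buf, len, "$PIR");
	if (off == -1)
		off = find_sig(buf, len, "_PIR");
	return (off);
}
```

Searching the whole range for $PIR before trying _PIR keeps the behaviour
unchanged on machines that have the standard table.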


89489 18-Jan-2002 peter

Avoid __func__ string concatenation


89466 17-Jan-2002 bde

Changed the type of pcb_flags from u_char to u_int and adjusted things.
This removes the only atomic operation on a char type in the entire
kernel.


89412 16-Jan-2002 peter

Change <b28> to HTT (Hyperthreading technology). If this flag is set then
cpuid with %eax=1 will return a logical cpu count in bits 16-23 of %ebx.
Bit 29 is actually 'TM' according to AP-485. This signifies the presence
of the thermal control circuit (which I believe can slow the clock down
to reduce core temperature).
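
As a rough illustration of the bit layout described above, the macros below
decode values that are assumed to have been read already with cpuid
%eax=1. The macro names are hypothetical, and treating a clear HTT bit as
"one logical cpu" is an assumption of this sketch.

```c
/* Feature flag bits named in the message above. */
#define	SK_CPUID_HTT	0x10000000u	/* bit 28: Hyperthreading */
#define	SK_CPUID_TM	0x20000000u	/* bit 29: thermal control circuit */

/*
 * Logical cpu count from bits 16-23 of %ebx; only meaningful when the
 * HTT flag is set, so report 1 otherwise (assumption of this sketch).
 */
#define	SK_LOGICAL_CPUS(features, ebx)					\
	(((features) & SK_CPUID_HTT) ? (((ebx) >> 16) & 0xffu) : 1u)
```

For example, SK_LOGICAL_CPUS(SK_CPUID_HTT, 0x00020000u) evaluates to 2.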


89410 16-Jan-2002 peter

Ensure that we set all the %cr0 bits to a known state for the AP's before
they make it through to userland. This should fix the p5-smp problem
without affecting the other cpus (eg: cyrix, see initcpu.c and the special
cache handling for these cpu types).


89195 10-Jan-2002 bde

Clear the single-step flag for signal handlers. This fixes bogus trace
traps on the first instruction of signal handlers.

In trap.c:syscall(), fake a trace trap if the single-step flag was set
on entry to the kernel, not if it will be set on exit from the kernel.
This fixes bogus trace traps after the last instruction of signal handlers.

gdb-4.18 (the version in FreeBSD) still has problems with the program in
the PR. These seem to be due to bugs in gdb and not in FreeBSD, and are
fixed in gdb-5.1 (the distribution version).

PR: 33262
Tested by: k Macy <kip_macy@yahoo.com>
MFC after: 1 day


89179 10-Jan-2002 wes

Fix typo in function name.

Reviewed by: peter@
Obtained from: mux@sneakerz.org


89175 10-Jan-2002 deischen

Use a spare slot in the machine context for a flags word to indicate
whether the machine context is valid and whether the FPU state is
valid (saved).

Mark the machine context valid before copying it out when sending a
signal.

Approved by: -arch


89156 09-Jan-2002 takawata

Fix S3 breakage.
Now AcpiEnterSleep() is light enough, so flushing cache
before the function is not too early.


89054 08-Jan-2002 msmith

Staticise devclasses and some unnecessarily global variables.


88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex release and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


88838 03-Jan-2002 peter

Allow a specific setting for pv entries. This avoids the need to guess
(or calculate by hand) the effect of interactions between shpgperproc,
physical ram size, maxproc, maxdsiz, etc.


88744 31-Dec-2001 dillon

Grrr. The tlb code is strewn over 3 files and I misread it. Revert
the last change (it was a NOP), and remove the XXX comments that no longer
apply.


88742 31-Dec-2001 dillon

You know those 'XXX what about SMP' comments in pmap_kenter()? Well,
they were right. Fix both kenter() and kremove() for SMP by ensuring that
the tlb is flushed on other cpu's. This will directly solve random-corruption
panic issues in -stable when it is MFC'd. Better to be safe than sorry, we
can optimize this later.

Original Suspicion by: peter
Maybe MFC: immediately on re's permission


88719 30-Dec-2001 phk

GC an alternate trap_pfault() which has rotted away behind an "#ifdef notyet"
since 21-Mar-95 .


88376 21-Dec-2001 tmm

Use the new resource_list_print_type() function.
Pass the bus device to isa_init() (this is needed for the sparc64
version).


88322 20-Dec-2001 jhb

Introduce a standard name for the lock protecting an interrupt controller
and its associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.


88245 20-Dec-2001 peter

Replace a bunch of:
    for (pv = TAILQ_FIRST(&m->md.pv_list);
        pv;
        pv = TAILQ_NEXT(pv, pv_list)) {
with:
    TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
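
A small stand-alone sketch of the two equivalent traversals, assuming a
<sys/queue.h> that provides TAILQ_FOREACH. The cut-down struct pv_entry and
the counting functions are made up for illustration and are not the pmap
code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Cut-down stand-in for the pmap structure; only the linkage is kept. */
struct pv_entry {
	TAILQ_ENTRY(pv_entry) pv_list;
};
TAILQ_HEAD(pv_head, pv_entry);

/* Open-coded traversal, in the style of the code being replaced. */
static int
count_open_coded(struct pv_head *head)
{
	struct pv_entry *pv;
	int n = 0;

	assert(head != NULL);
	for (pv = TAILQ_FIRST(head);
	    pv;
	    pv = TAILQ_NEXT(pv, pv_list))
		n++;
	return (n);
}

/* The same traversal written with the queue(3) iteration macro. */
static int
count_foreach(struct pv_head *head)
{
	struct pv_entry *pv;
	int n = 0;

	assert(head != NULL);
	TAILQ_FOREACH(pv, head, pv_list)
		n++;
	return (n);
}
```

Both functions visit the entries in the same order; the macro only removes
the repeated boilerplate.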


88240 20-Dec-2001 peter

Fix some whitespace nits, and a minor error that I made in some unused
#ifdef DEBUG code (VM_MAXUSER_ADDRESS vs UPT_MAX_ADDRESS).


88152 18-Dec-2001 jhb

Axe stale extern for a non-existent variable.


88146 18-Dec-2001 julian

In a couple of places, we recalculated addresses we already had in local
pointer variables.


88118 18-Dec-2001 jhb

Various assembly fixes mostly in the form of using the "+" modifier for
output operands to mark them as both input and output rather than listing
operands twice.

Reviewed by: bde


88117 18-Dec-2001 jhb

Allow the ATOMIC_ASM() macro to pass in the constraints on the V parameter
since the char versions need to use either ax, bx, cx, or dx.

Submitted by: Peter Jeremy (mostly)
Recommended by: bde


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().
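
The nesting scheme in the list above can be modelled in userland roughly as
follows. This is a sketch under stated assumptions, not the kernel code: the
thread is passed explicitly instead of using curthread, the td_critnest
field name is assumed, and a plain int stands in for the saved MD interrupt
state.

```c
#include <assert.h>

typedef int critical_t;		/* saved MD state; an int for illustration */

struct thread {
	int		td_critnest;	/* nesting count (name assumed) */
	critical_t	td_savecrit;	/* state saved at nesting level 0 */
};

/* Stand-in for the CPU's interrupt-enable flag. */
static int intr_enabled = 1;

/* MD layer: disable "interrupts" and return the previous state. */
static critical_t
cpu_critical_enter(void)
{
	critical_t saved = intr_enabled;

	intr_enabled = 0;
	return (saved);
}

/* MD layer: restore the state returned by cpu_critical_enter(). */
static void
cpu_critical_exit(critical_t saved)
{
	intr_enabled = saved;
}

/* MI wrapper: only the outermost entry saves the MD state. */
static void
critical_enter(struct thread *td)
{
	if (td->td_critnest == 0)
		td->td_savecrit = cpu_critical_enter();
	td->td_critnest++;
}

/* MI wrapper: only the outermost exit restores the MD state. */
static void
critical_exit(struct thread *td)
{
	assert(td->td_critnest > 0);
	if (--td->td_critnest == 0)
		cpu_critical_exit(td->td_savecrit);
}
```

Because the saved state lives in the thread rather than in a lock, nested
enter/exit pairs can be released in any order without restoring the wrong
state early.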

Tested on: i386, alpha


88085 17-Dec-2001 jhb

Small cleanups to the SMP code:
- Axe inlvtlb_ok as it was completely redundant with smp_active.
- Remove references to non-existent variable and non-existent file
in i386/include/smp.h.
- Don't perform initializations local to each CPU while holding the
ap boot lock on i386 while an AP bootstraps itself.
- Reorganize the AP startup code some to unify the latter half of the
functions to bring an AP up. Eventually this might be broken out into
a MI function in subr_smp.c.


87902 14-Dec-2001 luigi

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


87894 14-Dec-2001 iedowse

Enable UFS_DIRHASH in the GENERIC kernel.

Suggested by: silby
Reviewed by: dillon
MFC after: 5 days


87886 14-Dec-2001 nyan

Fixed to draw mouse cursor. The syscons driver for PC98 uses different
attributes from i386.

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)
MFC after: 3 days


87721 12-Dec-2001 jhb

Axe an unneeded PCPU_SET(spinlocks, NULL) that I missed earlier.


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.
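
A minimal userland sketch of the "MD fields via a macro" layout described in
the first item above. The PCPU_MD_FIELDS name, the pc_ member prefix, the
pcpup pointer and pcpu_init_sketch() are assumptions for illustration; the
real kernel locates the per-CPU data by MD means.

```c
#include <assert.h>
#include <stddef.h>

/* MD part: what a machine/pcpu.h might contribute (made-up field). */
#define	PCPU_MD_FIELDS							\
	int	pc_my_md_field

/* MI part: shared fields, with the MD fields expanded inline. */
struct pcpu {
	unsigned int	pc_cpuid;
	PCPU_MD_FIELDS;
};

/* Stand-in for the MD mechanism that finds the current CPU's data. */
static struct pcpu *pcpup;

/* MI and MD members are reached through the same accessor. */
#define	PCPU_GET(member)	(pcpup->pc_ ## member)
#define	PCPU_SET(member, val)	(pcpup->pc_ ## member = (val))

/* Simplified take on pcpu_init(): fill in the MI fields only. */
static void
pcpu_init_sketch(struct pcpu *pc, unsigned int cpuid)
{
	assert(pc != NULL);
	pc->pc_cpuid = cpuid;
}
```

With a nested struct mdpcpu the accessor would need the extra md. prefix;
expanding the macro inside struct pcpu is what keeps PCPU_GET(my_md_field)
short.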

Tested on: alpha, i386
Reviewed by: peter, jake


87637 11-Dec-2001 peter

Delete some leftover code from a bygone age. We don't have an array of
IdlePTDS anymore and don't do the PTD[MPPTDI] swapping etc.


87620 10-Dec-2001 guido

Add new boot flag to i386 boot: -p.
This flag adds a pausing utility. When run with -p, during the kernel
probing phase, the kernel will pause after each line of output.
This pausing can be ended with the '.' key, and is automatically
suspended when entering ddb.

This flag comes in handy on systems without a serial port that either hang
during booting or reset.
Reviewed by: (partly by jlemon)
MFC after: 1 week


87603 10-Dec-2001 murray

Add identification string for AMD-761 host to PCI bridge.

PR: kern/32255


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


87373 05-Dec-2001 mckusick

Update pathnames for creation of tags file.


87340 04-Dec-2001 des

PROCFS requires PSEUDOFS. I forgot that GENERIC didn't have PSEUDOFS yet.


87311 04-Dec-2001 jhb

Add a missing open paren to a macro that's been broken (and apparently
unused) since rev 1.1 so it is at least correct.

Submitted by: Maxime Henrion <mux@qualys.com>


87122 30-Nov-2001 peter

cpuid bit 30 is 'IA64', for when you're running in i386 mode on an ia64
cpu. (This is for either userland apps running in i386 mode on an ia64
OS, or when the cpu is in i386 legacy mode running an i386 OS).


86921 26-Nov-2001 imp

MFS: I was confused. This code wasn't in -current after all.

Merge in the irq 0 detection. Add comment about why.

If we have irq 0, ignore it like we do irq 255. Some BIOS writers aren't
careful like they should be.


86554 18-Nov-2001 iwasaki

Yet another verbose printing cleanup. Remove debug_wakeup flag and
check common verbose flag instead.


86486 17-Nov-2001 peter

Fix the non-KSTACK_GUARD case. It has been broken since the KSE
commit. ptek was not being initialized.


86485 17-Nov-2001 peter

Start bringing i386/pmap.c into line with cleanups that were done to
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track so
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.


86443 16-Nov-2001 peter

Oops, I accidentally merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


86439 16-Nov-2001 peter

Converge/fix some debug code (#if 0'ed on alpha, but whatever)
- use NPTEPG/NPDEPG instead of magic 1024 (important for PAE)
- use pt_entry_t instead of unsigned (important for PAE)
- use vm_offset_t instead of unsigned for va's (important for x86-64)


86430 15-Nov-2001 sobomax

Allow bit 21 of EFLAGS register (PSL_ID) to be changed in user mode without
ill effects. This should fix problems threaded programs are having with
auto-detecting CPU type.

Reported by: Joe Clarke <marcus@marcuscom.com>
Tested by: Joe Clarke <marcus@marcuscom.com>
Reviewed by: jhb
MFC after: 1 week


86408 15-Nov-2001 jhb

- Don't enable interrupts in trap() if we trapped while holding a spin
lock as this usually makes the problem worse.
- If we get a page fault while holding a spin lock, treat it as a fatal
trap and don't even bother calling into the VM since calling into the
VM will panic when trying to lock Giant before we can get a useful
message anyways.


86303 12-Nov-2001 jhb

Use newer constraints for atomic_cmpset().

Requested by: bde


86301 12-Nov-2001 jhb

Use newer constraints for inline assembly for an operand that is both an
input and an output by using the '+' modifier rather than listing the
operand in both the input and output sections.
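
For illustration, a GCC-style inline assembly sketch of the '+' modifier on
a read-write operand. The incr_int() function is hypothetical and not taken
from the tree; the assembly path is guarded for x86 and falls back to plain
C elsewhere.

```c
#include <assert.h>
#include <stddef.h>

static inline void
incr_int(volatile int *p)
{
	assert(p != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	/*
	 * The older style listed the operand twice, roughly:
	 *	: "=m" (*p) : "0" (*p)
	 * With '+', one entry marks *p as both read and written.
	 */
	__asm__ __volatile__("incl %0" : "+m" (*p) : : "cc");
#else
	(*p)++;		/* portable fallback for other compilers/arches */
#endif
}
```

Calling incr_int() on an int holding 41 leaves 42 in it on either path.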

Reviewed by: bde


86262 11-Nov-2001 iwasaki

Add two minor changes.
- clean up wakeup routing fixup code by using macros.
- allocate pte object temporarily for kernel thread to avoid kernel
panic by events from sleep button or lid switch.


86134 06-Nov-2001 obrien

Fix tab damage in rev 1.326.


86133 06-Nov-2001 iwasaki

Add S4BIOS sleep (BIOS hibernation) and DSDT overriding support.
- Add S4BIOS sleep implementation. This will work well if MIB
hw.acpi.s4bios is set (and of course BIOS supports it and hibernation
is enabled correctly).
- Add DSDT overriding support which is submitted by takawata originally.
If loader tunable acpi_dsdt_load="YES" and DSDT file is set to
acpi_dsdt_name (default DSDT file name is /boot/acpi_dsdt.aml),
ACPI CA core loads DSDT from given file rather than BIOS memory block.
DSDT file can be generated by iasl in ports/devel/acpicatools/.
- Add new files so that we can add our proposed additional code to Intel
ACPI CA into these files temporarily. They will be removed when
similar code is added into ACPI CA officially.


85892 02-Nov-2001 mike

o Add new header <sys/stdint.h>.
o Make <stdint.h> a symbolic link to <sys/stdint.h>.
o Move most of <sys/inttypes.h> into <sys/stdint.h>, as per C99.
o Remove <sys/inttypes.h>.
o Adjust includes in sys/types.h and boot/efi/include/ia64/efibind.h
to reflect new location of integer types in <sys/stdint.h>.
o Remove previously symbolically linked <inttypes.h>, instead create a
new file.
o Add MD headers <machine/_inttypes.h> from NetBSD.
o Include <sys/stdint.h> in <inttypes.h>, as required by C99; and
include <machine/_inttypes.h> in <inttypes.h>, to fill in the
remaining requirements for <inttypes.h>.
o Add additional integer types in <machine/ansi.h> and
<machine/limits.h> which are included via <sys/stdint.h>.

Partially obtained from: NetBSD
Tested on: alpha, i386
Discussed on: freebsd-standards@bostonradio.org
Reviewed by: bde, fenner, obrien, wollman


85835 01-Nov-2001 iwasaki

Some fixes for the recent apm module changes.
- Now the apm loadable module can inform other kernel components of its
existence (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
- Exchange priority of SI_SUB_CPU and SI_SUB_KLD for above purpose.
- Add simple arbitration mechanism for APM vs. ACPI. This prevents
the kernel from enabling both of them.
- Remove obsolete `#ifdef DEV_APM' related code.
- Add abstracted interface for power management operations. Public apm(4)
functions, such as apm_suspend(), should be replaced by new interfaces.
Currently only power_pm_suspend (successor of apm_suspend) is implemented.

Reviewed by: peter, arch@ and audit@


85806 01-Nov-2001 peter

Skip PG_UNMANAGED pages when we're shooting everything down to try and
reclaim pv_entries. PG_UNMANAGED pages don't have pv_entries to reclaim.

Reported by: David Xu <davidx@viasoft.com.cn>


85793 31-Oct-2001 mjacob

Remove previous revision. smp_started back in subr_smp where it belongs.


85788 31-Oct-2001 mjacob

Make the actual volatile int smp_started live *somewhere*. This is
a temporary fix so that we can compile kernels. I waited 30 minutes
for a response from the person who would likely know, but any longer
is too long to wait with breakage at ToT.


85786 31-Oct-2001 rwatson

Spell deivces as devices.


85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


85761 31-Oct-2001 msmith

Don't try to probe the PnP BIOS if ACPI is active.


85733 30-Oct-2001 green

Add kmupetext(), a function that expands the range of memory covered
by the profiler on a running system. This is not done sparsely, as
memory is cheaper than processor speed and each gprof mcount() and
mexitcount() operation is already very expensive.

Obtained from: NAI Labs CBOSS project
Funded by: DARPA


85715 30-Oct-2001 imp

Move device lnc to isa section, since it no longer uses the compat shims.
Add comment about lnc.
Remove probe order comment from isa_compat.c. That appears to no longer
be the case.


85711 30-Oct-2001 jhb

Fix a typo in comment and #ifdef fixes: GRAP_PRIO -> GRAB_PRIO so that
x86 SMP kernels actually boot again to single user mode.

Pointy hat to: jhb
Noticed by: jlemon


85695 29-Oct-2001 bde

Don't set CR0_NE in cpu_setregs() for the SMP case, since setting it
is npx.c's job and setting it here breaks the edit-time option of not
setting it in npx.c. (It is not set in the right places for the SMP
case, but always setting it here is harmless because there isn't even
an edit-time option to not set it.)


85627 28-Oct-2001 jhb

- More whitespace and comment cleanups.
- Remove unused sw1a label. A breakpoint can be set in choosethread() for
the same effect.

Reviewed by: bde
Submitted by: bde (partly)


85556 26-Oct-2001 iwasaki

Add APM compatibility feature to ACPI.
This emulates APM device node interface APIs (mainly ioctl) and
provides APM services for the applications. The goal is to support
most of APM applications without any changes.
Implemented ioctls in this commit are:
- APMIO_SUSPEND (mapped ACPI S3 as default but changeable by sysctl)
- APMIO_STANDBY (mapped ACPI S1 as default but changeable by sysctl)
- APMIO_GETINFO and APMIO_GETINFO_OLD
- APMIO_GETPWSTATUS

With above, many APM applications which get batteries, ac-line
info. and transition the system into suspend/standby mode (such as
wmapm, xbatt) should work with ACPI enabled kernel (if ACPI works well :-)

Reviewed by: arch@, audit@ and some guys


85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


85491 25-Oct-2001 jhb

Currently no code does a CROSSJUMP() to sw1a, so we don't need a
CROSSJUMPTARGET() for it.

Submitted by: bde


85490 25-Oct-2001 jhb

Use %ecx instead of %ebx for the scratch register while updating %dr7 since
%ecx isn't a call-saved register and thus we don't have to save and restore
it.

Submitted by: bde


85489 25-Oct-2001 jhb

- Fix typo in comment from previous revision.
- Fix a bug in the LDT changes where the wrong argument was passed to
set_user_ldt() from cpu_switch(). The bug was passing a pointer to the
ldt, but set_user_ldt() takes a pointer to the process' mdproc structure.

Submitted by: bde


85487 25-Oct-2001 jhb

Whitespace, comment, and string fixes.

Submitted by: bde (mostly)


85457 25-Oct-2001 jlemon

Add PCI_ENABLE_IO_MODES option, for BIOSen that neglect this.

Submitted by: Andrew R. Reiter arr@watson.org


85452 25-Oct-2001 luigi

Backout 1.61 -- both intrcnt and intrnames are already exported
via sysctl under "hw".


85449 25-Oct-2001 jhb

Split the per-process Local Descriptor Table out of the PCB and into
struct mdproc.

Submitted by: Andrew R. Reiter <arr@watson.org>
Silence on: -current


85419 24-Oct-2001 jhb

- Clean up the comments slightly here to make them more readable.
- Set the type and trapframe number for the F00F workaround since type
can be used later by sv_transtrap(). Debuggers might also want to look
at the type in the trapframe.

Submitted by: bde (mostly)


85384 23-Oct-2001 jhb

Set the code and signal for the F00F hack fault directly instead of
changing the code in the trapframe and looping back to the top of trap
again.

Tested by: cjc


85373 23-Oct-2001 jlemon

Implement multiple low-level console support.


85294 21-Oct-2001 des

[partially forced commit due to pilot error in earlier commit attempt]

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85271 21-Oct-2001 bde

MFi386:
- sys/pc98/pc98/npx.c 1.87 (2001/09/15; author: imp)
I don't think pc98 has acpi at all, so ifdef the acpi attachments for
now.

This completes merging sys/pc98/pc98/npx.c into sys/i386/isa/npx.c so
that the former can be removed.


85270 21-Oct-2001 bde

MFpc98: fundamental differences. The magic numbers for the i/o port
and the irq are different for pc98, and are not very well handled (we
use a historical mess of hard-coded values, values from header files
and values from hints).


85268 21-Oct-2001 bde

MFpc98: all changes in sys/pc98/pc98/npx.c related to FPU_ERROR_BROKEN.

- 1.58 (2000/09/01; author: kato)
Fixed FPU_ERROR_BROKEN code. It had old-isa code.
- 1.33 (1998/03/09; author: kato)
Make FPU_ERROR_BROKEN a new-style option.
- 1.7 (1996/10/09; author: asami)
Make sure FPU is recognized for non-Intel CPUs.

The log for rev.1.7 should have said something like:
Added FPU_ERROR_BROKEN option. This forces a successful probe for
exception 16, so that hardware with a broken FPU error signal can sort
of work.


85255 20-Oct-2001 mjacob

Remove wx.


85204 20-Oct-2001 obrien

Drop support for x87 emulation. Any CPU one would dare to run 5-CURRENT
on would have built-in FP support.


85035 16-Oct-2001 mjacob

Make SCSI changer and SES devices standard in generic kernels.

Reviewed by: ken@kdm.org


85029 16-Oct-2001 bde

Deleted most of npxprobe(), and merged npxprobe1() back into npxprobe().
Use the normal interrupt handler (npx_intr()) instead of a special
probe-time interrupt handler, although this causes problems due to
the bus_teardown_intr() not actually even tearing down the interrupt
(these problems were avoided by doing interrupt attachment for the
special interrupt handler directly). Fixed minor bitrot in comments.

The reason for the npxprobe()/npxprobe1() split mostly went away at
about the same time it was made (in 1992 or 1993 just before the
beginning of history). 386BSD ran all probes with interrupts completely
masked, and I didn't want to disturb this when I added an irq probe
to npxprobe(). An irq (not necessarily npx) must be acked for at least
external npx's to take the cpu out of the wait state that it enters
when an npx error occurs, so the probe must be done with a suitable
irq unmasked. npxprobe() went to great lengths to unmask precisely
the npx irq.

Running probes with all interrupts masked was never really needed in
FreeBSD, since FreeBSD always masked interrupts well enough using
splhigh(), but it wasn't until rev.1.48 (1995/12/12) of autoconf.c
that all probes were run with CPU interrupts enabled. This permits
npxprobe() to probe its irq using normal interrupt resources. Note
that most drivers still can't depend on this. It depends on the
interrupt handler being fast and the irq not being shared.


85028 16-Oct-2001 bde

Commit my old fixes for cosmetic bugs in npxprobe() so that they aren't
lost when the buggy code goes away completely:
- don't assume that the npx irq number is >= 8. Rev.1.73 only reversed
part of the hard-coding of it to 13 in rev.1.66.
- backed out the part of rev.1.84 that added a highly confused comment
about an enable_intr() being "highly bogus". The whole reason for
existence of npxprobe() (separate from the main probe, npxprobe1())
is to handle the complications to make this enable_intr() safe.
- backed out the part of rev.1.94 that modified npxprobe(). It mainly
broke the enable_intr() to restore_intr(). Restoring the interrupt
state in a nested way is precisely what is not wanted here. It was
harmless in practice because npxprobe() is called with interrupts
enabled, so restoring the interrupt state enables interrupts. Most
of npxprobe() is a no-op for the same reason...


85009 15-Oct-2001 tegge

Explicitly initialize the fpu when SSE is enabled since this no
longer happens as a side effect of calling npxsave.

Reviewed by: peter, bde


84935 14-Oct-2001 tegge

Change vmapbuf() to use pmap_qenter() and vunmapbuf() to use pmap_qremove().

This significantly reduces the number of TLB shootdowns caused by
vmapbuf/vunmapbuf when performing many large reads from raw disk devices.

Reviewed by: dillon


84934 14-Oct-2001 tegge

Reduce the number of TLB shootdowns caused by a call to pmap_qenter()
from number of pages mapped to 1.

Reviewed by: dillon


84850 12-Oct-2001 jdp

Correct the input/output/clobber specifications for the cpuid
instruction. Stefan Keller <dres@earth.serd.org> noticed that CPU
identification was broken when compiled with -O2, and tracked it
down to the asm statement, which was storing values into memory
without specifying that memory was modified. He submitted a patch
which added "memory" as a clobber, but I refined it further to
arrive at this version.

MFC after: 3 days
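
For illustration only, a minimal sketch of a cpuid wrapper whose operand
lists describe everything the instruction touches. The name cpuid_sketch
and the non-x86 fallback are assumptions made for this example, not the
committed code. Because all four result registers are declared as outputs,
the compiler can see what is written and no blanket "memory" clobber
is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wrapper: run cpuid for the given leaf (subleaf 0) and
 * store %eax, %ebx, %ecx, %edx in regs[0..3].
 * Returns 1 if the instruction was executed, 0 on non-x86 builds.
 */
static int
cpuid_sketch(uint32_t leaf, uint32_t regs[4])
{
	assert(regs != NULL);
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("cpuid"
	    : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
	    : "0" (leaf), "2" ((uint32_t)0));
	return (1);
#else
	/* Not an x86 target: report "not executed" with zeroed results. */
	regs[0] = regs[1] = regs[2] = regs[3] = 0;
	return (0);
#endif
}
```

With only a memory clobber and no register outputs, an optimizing compiler
may still assume the registers are unchanged, which is the kind of breakage
that tends to appear only at -O2.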


84815 11-Oct-2001 jhb

Oops, these already included sys/lock.h, they just did so after
sys/mutex.h which is too late.


84812 11-Oct-2001 jhb

Add missing includes of sys/ktr.h.


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


84733 09-Oct-2001 iedowse

Remove the Xresume* labels from the i386 interrupt handlers; the
code in ipl.s and icu_ipl.s that used them was removed when the
interrupt thread system was committed. Debuggers also knew about
Xresume* because these labels hide the real names of the interrupt
handlers (Xintr*), and debuggers need to special-case interrupt
handlers to get the interrupt frame.

Both gdb and ddb will now use the Xintr* and Xfastintr* symbols to
detect interrupt frames. Fast interrupt frames were never identified
correctly before, so this fixes the problem of the running stack
frame getting lost in a ddb or gdb trace generated from a fast
interrupt - e.g. when debugging a simple infinite loop in the kernel
using a serial console, the frame containing the loop would never
appear in a gdb or ddb trace.

Reviewed by: jhb, bde


84721 09-Oct-2001 robert

Remove an unneeded variable declaration and statement.

Approved by: jake


84679 08-Oct-2001 jhb

Allow atomic ops to be somewhat safely used in userland. We always use
lock prefixes in the userland case so that the binaries will work on both
SMP and UP systems.
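
A rough sketch of the idea, assuming a macro that expands to the lock
prefix unless the code is built for a uniprocessor kernel. The names
MPLOCKED_SKETCH and atomic_add_int_sketch are made up for this example,
and the non-x86 branch uses a compiler builtin purely so the sketch
builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Only a UP kernel build may drop the prefix; userland always keeps it. */
#if defined(_KERNEL) && !defined(SMP)
#define	MPLOCKED_SKETCH	""
#else
#define	MPLOCKED_SKETCH	"lock ; "
#endif

/* Atomically add v to *p. */
static void
atomic_add_int_sketch(volatile unsigned int *p, unsigned int v)
{
	assert(p != NULL);
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(MPLOCKED_SKETCH "addl %1,%0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc");
#else
	__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

The prefix costs a little on UP hardware, but a userland binary cannot know
at compile time which kind of machine it will run on.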


84624 07-Oct-2001 luigi

Export interrupt statistics via sysctl.

MFC-after: 3 days


84615 07-Oct-2001 nyan

Rewrite the pc98 bus_space stuff.

The type of bus_space_tag_t is now a pointer to bus_space_tag structure,
and the bus_space_tag structure saves pointers to functions for direct
access and relocate access.

Added bsh_bam member to the bus_space_handle structure, it saves access
method either direct access or relocate access which is called by
bus_space_* functions.

Added the mecia device support. If the bs_da and bs_ra in bus tag are set
NEPC_io_space_tag and NEPC_mem_space_tag respectively, new bus_space stuff
changes the register of mecia automatically for 16bit access.

Obtained from: NetBSD/pc98


84593 06-Oct-2001 nyan

- Moved the bus_dma declarations from bus_{at386,pc98}.h into bus_dma.h.
(bus_dma.h is repo-copied from bus_at386.h)
- Added '#include <machine/bus_dma.h>' into bus.h for backward compatibility.


84572 06-Oct-2001 peter

Fix a warning. (unused p if not INVARIANTS)


84553 05-Oct-2001 dfr

In in_cksumdata, len must be a signed type.
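
The log does not show the loop in question, so the following is only a
guess at the kind of pattern involved: a loop that subtracts the block
size first and tests for a negative result afterwards. The function name
and the 4-byte step are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum 32-bit words from a buffer of len bytes (len a multiple of 4).
 * The loop ends when len goes negative, which requires a signed type:
 * with an unsigned len the test ">= 0" would always be true.
 */
static uint64_t
sum_words_sketch(const uint32_t *p, int len)
{
	uint64_t sum = 0;

	assert(p != NULL);
	while ((len -= 4) >= 0)
		sum += *p++;
	return (sum);
}
```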


84381 02-Oct-2001 mjacob

Fix problem where a user buffer outside of the area being tested
will be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


84044 27-Sep-2001 jhb

Disable the check in icu_setup() to see if a handler was already used as
the current interrupt thread routines will guarantee the condition this is
checking for at a higher level but inthand_add() and inthand_remove() as
they currently exist don't satisfy this condition. (Which does need to be
fixed but which will take a bit more work.) This fixes shared interrupts.


84003 27-Sep-2001 jlemon

Return EINVAL if the passed intr is out of bounds.

PR: 30857
Submitted by: David Xu <davidx@viasoft.com.cn>
MFC: 1 week


83972 26-Sep-2001 rwatson

o Modify i386_set_ioperm() to use securelevel_gt() instead of
direct securelevel variable checks.

Obtained from: TrustedBSD Project


83971 26-Sep-2001 rwatson

o Modify device open access control for /dev/mem and friends to use
securelevel_gt() instead of direct securelevel variable checks.

Obtained from: TrustedBSD Project


83936 25-Sep-2001 brooks

The faith(4) device is no longer a count device so don't specify a count.


83872 24-Sep-2001 obrien

+ Fix misplacement of `txp'
+ Document our -CURRENT debugging bits


83827 22-Sep-2001 jedgar

Update NFS_ROOT comments to reflect the NFSCLIENT option
instead of the deprecated NFS option.

Reviewed by: peter


83757 21-Sep-2001 peter

Introduce a new option, KVA_SPACE, which can be used to reconfigure
the size of the kernel virtual address space relatively painlessly.
Userland will adapt via the exported kernbase symbol. Increasing
this causes the user part of address space to reduce.


83660 19-Sep-2001 peter

Reserve an extra 16 bytes in case we have to grow the trapframe into
a vm86trapframe for switching to vm86 [unlikely] while exiting.
I lost this when doing the pcb move that went in with the KSE commit.

Reviewed by: jake


83659 19-Sep-2001 peter

Fix a mistake I made with the pcb movement relative to the stack in the
KSE patch. We need to leave the 16 bytes here for enabling the trapframe
to be converted to a vm86trapframe if we're switching *to* a vm86 context.


83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


83643 18-Sep-2001 jhb

- If we ever do the per-cpu KTR stuff, the index won't be volatile as it
will be private to each CPU.
- Re-style(9) the globaldata structures. There really needs to be a MI
struct pcpu that has a MD struct mdpcpu member at some point.


83640 18-Sep-2001 jhb

Whitespace fixes.


83506 15-Sep-2001 dfr

Fill out some gaps in ia64 DDB support. This involves generalising DDB's
breakpoint handling slightly to cope with the fact that ia64 instructions
are not located on byte boundaries.


83428 14-Sep-2001 imp

s/thread'/thread's/


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83275 10-Sep-2001 peter

gcc-3 has objections about the bluetrap6 and bluetrap13 inline asm
functions. Apparently multi-line string asm arguments are deprecated.


83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


83181 07-Sep-2001 msmith

Now that this code is MD, we don't need the i386 ifdefs.


83163 06-Sep-2001 jhb

Call sendsig() with the proc lock held and return with it held.


83093 05-Sep-2001 jlemon

Remove superfluous statement.


83051 05-Sep-2001 yokota

Rework the ISA PnP driver pnp and the PnP resource parser to fix
the following bugs.

- When constructing a resource configuration, respect the order
in which resource descriptors are read, in order to establish
the correct mapping between the descriptors and configuration
registers.
"Plug and Play ISA Specification, Version 1.0a", Sec 4.6.1, May 5,
1994. "Clarifications to the Plug and Play ISA Specification,
Version 1.0a", Sec 6.2.1, Dec. 10, 1994.

- Do not ignore null (empty) descriptors; they are valid descriptors
acting as filler.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a",
Sec 6.2.1.

- Correctly set up logical device configuration registers for null
resources.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a"

- Handle null resources properly in the resource allocator for the
ISA bus.


83047 05-Sep-2001 obrien

style(9) the structure definitions.


82971 04-Sep-2001 iwasaki

Reenable RTC interrupts after wakeup. Some laptops have a problem
with system statistics monitoring tools (such as systat, vmstat...)
because of stopping RTC interrupts generation.
Restore all the timers (RTC and i8254) atomically.

Reviewed by: bde
MFC after: 1 week


82957 04-Sep-2001 peter

Mostly cosmetic. Move various variables from .s files to .c files so that
gdb generates debug info for them.


82939 04-Sep-2001 peter

Zap #if 0'ed map init code that got moved to the MI area.
Convert the powerpc tree to use the common code.


82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


82845 03-Sep-2001 yokota

Fix the argument specifier for the PnP BIOS function 2
(PNP_SET_DEVNODE). The second argument is not a segment:offset
pointer, but a 16 bit short.

MFC after: 4 weeks


82624 31-Aug-2001 peter

Do a style cleanup pass for the pmap_{new,dispose,etc}_proc() functions
to get them closer to the KSE tree. I will do the other $machine/pmap.c
files shortly.


82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is preparatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
 * MPSAFE
 */
This comment means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


82555 30-Aug-2001 msmith

Add ACPI attachments.


82465 28-Aug-2001 imp

It turns out that while Toshiba laptops don't want to route interrupts
multiple times, others do. The last strategy, which was to assume
that already routed interrupts were good and just return them doesn't
work for some laptops. So, instead, we have a new strategy: we notice
that we have an interrupt that's already routed. We go ahead and try
to route it, none the less. We will assume that it is correctly
routed, even if the route fails. We still assume that other failures
in the bios32 call are because the interrupt is NOT routed.

Note: some laptops do not support the bios32 interface to PCI BIOS and
we need to call it via the INT 2A interface. That is another windmill
to tilt at later.

Also correct a minor typo and minor whitespace nits.

Strong MFC candidate.


82441 27-Aug-2001 imp

MFS: IRQ ordering, PRVERB and more whining in pcibios_get_version on failure.
Check return value from bios32.

[[ Yes, I was bad and committed this to stable first. I should have done
the commit in the other order. ]]


82394 27-Aug-2001 peter

There is nothing more embarrassing than having three goes at correcting
typos in the same paragraph. s/in in/in/

Submitted by: iedowse


82393 27-Aug-2001 peter

Enable hardwiring of things like tunables from embedded environments
that do not start from loader(8).


82366 26-Aug-2001 peter

I missed a typo in the last commit: s/whach/which/

Submitted by: bde


82318 25-Aug-2001 peter

Argh! Revert accidental commit.


82313 25-Aug-2001 peter

vm_page_zero_idle() is no longer MD.


82310 25-Aug-2001 julian

Add another comment.
check for 'teh's this time..


82309 25-Aug-2001 peter

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


82308 25-Aug-2001 peter

s/teh/the/


82307 25-Aug-2001 julian

Add an explanatory note that would have saved me an hour or two
of confusion had it been there when I started reading the code..


82281 24-Aug-2001 jhb

Axe a commented, unused #define related to the old giant lock.


82279 24-Aug-2001 jhb

Remove references to the old giant kernel lock in various comments.


82262 24-Aug-2001 peter

Export the actual KERNBASE to the symbol table. We can use nlist() to get
this without having to second guess it in userland.


82261 24-Aug-2001 peter

Move cpu_fxsr definition to C code (so debug info is generated) and where
it is easily #ifdef'ed so that we don't miss unintentional references to it.


82165 23-Aug-2001 peter

Fix a comment error that was fixed in the pc98 version. hw.maxmem is
really hw.physmem.


82157 23-Aug-2001 peter

Don't add UPAGES to the %cs segment limit. There is nothing there except
page tables.


82154 23-Aug-2001 peter

Don't compile in SSE fxsave/fxrstor instructions if CPU_ENABLE_SSE isn't
active.


82140 22-Aug-2001 iwasaki

Move CR4.PGE enabling code after paging is enabled via CR0.PG based on
the description (2.5. CONTROL REGISTERS) of Intel developer's manual at:
ftp://download.intel.com/design/PentiumII/manuals/24319202.pdf

Reviewed by: peter, bde, tlambert2@mindspring.com
Pointed-out by: "Shin'ya Kumabuchi" <kumabu@t3.rim.or.jp>
MFC after: 1 week


82127 22-Aug-2001 dillon

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independent area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


82121 22-Aug-2001 peter

Introduce two new sysctl's.. vm.kvm_size and vm.kvm_free. These are
purely informational and can give some advance indications of tuning
problems. These are i386 only for now as it seems that the i386 is
the only one suffering kvm pressure.


82118 21-Aug-2001 jhb

Push down Giant some in trap_pfault() so we don't grab Giant around
trap_fatal() to make restarting from panic's slightly easier. Before if
one did 'w 0 0' in ddb, the longjmp in ddb inside of trap_fatal() would
result in Giant being held (or recursed one level deeper) which led to
problems later on. You can now drop to the debugger, do 'w 0 0', and
continue w/o a problem.


82035 21-Aug-2001 imp

The general consensus on irc was that pci bios for config registers
and such was just a bad idea and one that users should be forced to
enable if they want it. This patch introduces a hw.pci.enable_pcibios
tunable for those people. This does not impact the pcibios interrupt
routing at all.

Approved by: peter, msmith


82031 21-Aug-2001 dillon

Fix bug in physmem_est calculation - the kernel_map size was not being
converted into pages.

Fix bug in maxbcache calculation, nbuf must be tested against maxbcache
rather than physmem_est.

Obtained from: bde
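
To make the unit mismatch concrete, here is a small sketch. It assumes
the estimate is the smaller of physical memory (already in pages) and
the kernel map size (in bytes, so it must be divided by the page size
first). The function and parameter names are invented for this example.

```c
#include <assert.h>

/*
 * Return an estimate, in pages, of the memory usable for sizing:
 * the smaller of physmem_pages and the kernel map size in pages.
 * Comparing pages against raw bytes would nearly always pick
 * physmem_pages, which is the class of bug described above.
 */
static unsigned long
physmem_est_sketch(unsigned long physmem_pages, unsigned long kmap_bytes,
    unsigned long page_size)
{
	unsigned long kmap_pages;

	assert(page_size != 0);
	kmap_pages = kmap_bytes / page_size;
	return (physmem_pages < kmap_pages ? physmem_pages : kmap_pages);
}
```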


82026 21-Aug-2001 peter

Detect a certain type of PCIBIOS brain damage. For some reason,
some bios vendors took it upon themselves to "censor" the
host->pci bridges from PCIBIOS callers, even when the caller
explicitly asks for them. This includes certain Compaq machines
(eg: DL360) and some laptops.

If we detect this, shut down pcibios and revert to using IO
port bashing.

Under -current, acpica does a better job anyway.


82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


81933 20-Aug-2001 dillon

Limit the amount of KVM reserved for the buffer cache and for swap-meta
information. The default limits only affect machines with > 1GB of ram
and can be overridden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.


81879 18-Aug-2001 peter

There is nothing special that requires SSE to be only on 686 class cpus.
This enables 586-only SMP kernels to compile again.

Problem reported by: Jacek Jedrzejczak <jacol@ids.gda.pl>


81763 16-Aug-2001 obrien

style(9) and make consistent across platforms


81711 15-Aug-2001 wpaul

Teach bus_dmamem_free() about contigfree(). This is a bit of a hack,
but it's better than the buggy behavior we have now. If we contigmalloc()
buffers in bus_dmamem_alloc(), then we must contigfree() them in
bus_dmamem_free(). Trying to free() them is wrong, and will cause
a panic (at least, it does on the alpha.)

I tripped over this when trying to kldunload my busdma-ified if_rl
driver.


81704 15-Aug-2001 jhb

Whitespace fixes to make this mostly fit in 80 columns.


81584 13-Aug-2001 bde

Use interrupt gates instead of trap gates for breakpoint and trace
traps, so that ddb can keep control (almost) no matter how it is
entered. This breaks time-critical interrupts while the system is
stopped in ddb, but I haven't noticed any significant problems except
that applications become confused about the time. Lost time will be
adjusted for later. Anyway, the half-baked disabling of interrupts in
Debugger() gives the same problems for the usual way of entering ddb.


81583 13-Aug-2001 bde

Removed the BPTTRAP() macro and its use. It was intended for restoring
bug for bug compatibility to ddb trap handlers after fixing the debugger
trap gates to be interrupt gates, but the fix was never committed. Now
I want the fix to apply to ddb.


81544 12-Aug-2001 iwasaki

Fix some trivial bugs.
- fix segment limit mis-calculation for GCODE_SEL, GDATA_SEL, GPRIV_SEL,
LUCODE_SEL and LUDATA_SEL.
- move `loader(8) metadata' related printf() after cninit().
- use atop macro (address to pages) for segment limit calculation
instead of i386_btop macro (bytes to pages).
- fix style bugs for the declarations of ints.

Reviewed by: bde, msmith (and arch & audit ML)


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


81168 05-Aug-2001 nate

- Removed comment about ThinkPad keyboards from the PCVT line. Any ThinkPad
that needs this probably won't run -current, as it's at least 5 years old.


80700 31-Jul-2001 jake

Use a machine dependent type, Elf_Hashelt, for the elements of the elf
dynamic symbol table buckets and chains. The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.

Discussed with: dfr, jdp


80431 27-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


80426 26-Jul-2001 peter

MASK_FPU_SW didn't do what it was expected to do.


80421 26-Jul-2001 peter

Call the early tunable setup functions as soon as kern_envp is available.
Some things depend on hz being set not long after this.


80399 26-Jul-2001 bmilekic

- Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.

This allows us to properly boot with certain CPUs deactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob


80219 23-Jul-2001 wpaul

You were knocked senseless by the Boomerang, spun around by the Cyclone,
blown over by the Hurricane and had a house dropped on you by the Tornado.
Now it's time to have your parade rained on by... the Typhoon!

This commit adds driver support for 3Com 3cR990 10/100 ethernet
adapters based on the Typhoon I and Typhoon II chipsets. This is actually
a port of the OpenBSD driver with many hacks by me.

No Virginia, there isn't any support for the hardware crypto yet. However
there is support for TCP/IP checksum offload and VLANs.

Special thanks go to Jason Wright, Aaron Campbell and Theo de Raadt for
squeezing enough info out of 3Com to get this written, and for doing
most of the hard work.

Manual page is included. Compiled as a module and included in GENERIC.


80160 22-Jul-2001 iwasaki

Don't do sleep state transition if specified sleep state is not
supported by the system.


80078 21-Jul-2001 msmith

Convert from acpi_strerror() to AcpiFormatException()

Fix dangling include of the dear departed acpi_ecreg.h


80071 21-Jul-2001 msmith

Update the OSD module to match the ACPI CA 20010717 import.

Submitted by: "Grover, Andrew" <andrew.grover@intel.com> (OsdHardware.c)


80028 20-Jul-2001 takawata

Add ACPI S2-S4BIOS Suspend/Resume code.
Some problems may remain.

Reviewed by: iwasaki


79893 19-Jul-2001 bsd

swtch.s: During context save, use the correct bit mask for clearing
the non-reserved bits of dr7.

During context restore, load dr7 in such a way as to not
disturb reserved bits.

machdep.c: Don't explicitly disallow the setting of the reserved bits
in dr7 since we now keep from setting them when we load dr7
from the PCB.

This allows one to write back the dr7 value obtained from
the system without triggering an EINVAL (one of the
reserved bits always seems to be set after taking a trace
trap).

MFC after: 7 days
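
The restore step amounts to a masked merge, sketched below in C rather
than assembly. The mask value 0x0000fc00 (bits 10-15) is an assumption
about which dr7 bits are treated as reserved, and the names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reserved-bit mask for dr7; not taken from this log. */
#define	DR7_RESERVED_SKETCH	0x0000fc00u

/*
 * Build the value to load into dr7: reserved bits come from the
 * current hardware register, all other bits from the saved PCB copy.
 */
static uint32_t
dr7_merge_sketch(uint32_t hw_dr7, uint32_t pcb_dr7)
{
	uint32_t merged;

	merged = (hw_dr7 & DR7_RESERVED_SKETCH) |
	    (pcb_dr7 & ~DR7_RESERVED_SKETCH);
	/* Reserved bits must be exactly what the hardware reported. */
	assert((merged & DR7_RESERVED_SKETCH) ==
	    (hw_dr7 & DR7_RESERVED_SKETCH));
	return (merged);
}
```

Since reserved bits in the saved copy are ignored on load, a value read
back from the system can be written again unchanged without being rejected.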


79885 19-Jul-2001 kris

Quiet a variable format-string warning.

MFC after: 1 week


79824 17-Jul-2001 tegge

The per-cpu temporary buffers are not needed since the pcb_save areas have
the proper alignment. Change dummy variable in npxinit from stack to bss
to ensure proper alignment.

Reviewed by: bde


79781 16-Jul-2001 tegge

Use PCPU_GET(cpuid) instead of curproc->p_oncpu.
Reviewed by: peter


79734 14-Jul-2001 jhb

Fix MCOUNT_ENTER() so it actually compiles in the profiling case.

Pointy hat to: me
Submitted by: Danny J. Zerkel <dzerkel@columbus.rr.com>


79663 13-Jul-2001 dd

`pcn' supports AMD Am79C97x cards, not Am79C79x cards.

PR: 28946
Submitted by: Ryuichiro Imura <imura@ryu16.org>


79662 13-Jul-2001 sobomax

Unbroke kernel if I686_CPU is not defined.


79630 12-Jul-2001 peter

The #define for pcb_savefpu seems to do more harm than good.


79628 12-Jul-2001 peter

Fix another missed pcb_savefpu reference (inside NPX_DEBUG)


79623 12-Jul-2001 peter

Forgot this fix from another tree. make enable_sse() a real prototype.


79611 12-Jul-2001 peter

Move init_sse() out of the "GenuineIntel" section, my AthlonMP system
has it, for example, and it works fine.


79609 12-Jul-2001 peter

Activate SSE/SIMD. This is the extra context switching support that
we are required to do if we let user processes use the extra 128 bit
registers etc.

This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html

I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.


79573 11-Jul-2001 bsd

Add 'hwatch' and 'dhwatch' ddb commands analogous to 'watch' and
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.

No objection by: audit@, hackers@

MFC after: 2 weeks


79418 08-Jul-2001 julian

A set of changes to reduce the number of include files the kernel
takes from /usr/include. I cannot check them on alpha.. (will try beast)

Briefly looked at by: Warner Losh <imp@harmony.village.org>


79265 05-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependent sections to a
machine-independent source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and acquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


79153 03-Jul-2001 tmm

Make the code to read the kernel message buffer via sysctl machine-
independent and rename the corresponding sysctls from machdep.msgbuf and
machdep.msgbuf_clear (i386 only) to kern.msgbuf and kern.msgbuf_clear.


79137 03-Jul-2001 iwasaki

Add Transmeta Crusoe LongRun support.

Submitted by: Tamotsu HATTORI <athlete@kta.att.ne.jp>
Reviewed by: arch@ folks
MFC after: 1 week


79124 03-Jul-2001 jhb

Quiet warning by removing ast() prototype.

Forgotten by: jhb (me)


79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


79106 02-Jul-2001 brooks

gif(4) and stf(4) modernization:

- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.

PR: kern/27983
Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


79008 30-Jun-2001 imp

Repo copy i8237.h to dev/ic so we can get rid of some of the final vestiges
of includes of i386 files from non-i386 ports.


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78981 29-Jun-2001 imp

Remove cruft from old bus.


78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


78946 29-Jun-2001 jhb

Grab Giant around trap_pfault() for now.


78908 28-Jun-2001 jhb

Get kernel profiling on SMP systems closer to working by replacing the
mcount spin mutex with a very simple non-recursive spinlock implemented
using atomic operations.
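
As a generic illustration of such a lock (not the kernel's code, which
predates C11), here is a minimal non-recursive spinlock built on the
standard atomic_flag type. All names ending in _sketch are invented for
this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct simple_spin_sketch {
	atomic_flag locked;	/* set while some CPU holds the lock */
};

static void
spin_init_sketch(struct simple_spin_sketch *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->locked);
}

/* Try once; returns true if the lock was acquired. */
static bool
spin_trylock_sketch(struct simple_spin_sketch *l)
{
	return (!atomic_flag_test_and_set_explicit(&l->locked,
	    memory_order_acquire));
}

/* Spin until acquired. Not recursive: re-locking by the owner hangs. */
static void
spin_lock_sketch(struct simple_spin_sketch *l)
{
	while (!spin_trylock_sketch(l))
		;	/* busy-wait */
}

static void
spin_unlock_sketch(struct simple_spin_sketch *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock this small has no owner tracking or debugging hooks, which suits a
profiling hook that must not itself call into instrumented code.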


78903 28-Jun-2001 bsd

Provide access to the IA32 hardware debug registers from the ddb
kernel debugger. Proper use of these registers allows setting
hardware watchpoints for use in kernel debugging.

MFC after: 2 weeks


78798 26-Jun-2001 kato

Recognize FC-PGA2 Pentium III (Tualatin).


78760 25-Jun-2001 dfr

Add code to detect Transmeta Crusoe cpus.


78636 22-Jun-2001 jhb

- Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by: tegge (signal race), bde (addupc_task a while back)


78631 22-Jun-2001 peter

Make the hw.physmem and hw.usermem variables unsigned so that they don't
come up as negative on machines with >2GB ram.
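
A small sketch of the effect, assuming the values were held in a 32-bit
signed integer: 3 GB has its top bit set, so the same bit pattern reads
as -1073741824 when interpreted as signed. The helper name is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit byte count as a signed 32-bit value. */
static int32_t
as_signed32_sketch(uint32_t bytes)
{
	int32_t s;

	assert(sizeof(s) == sizeof(bytes));
	memcpy(&s, &bytes, sizeof(s));
	return (s);
}
```

For example, as_signed32_sketch(3u << 30) is negative on a two's
complement machine, while as_signed32_sketch(1u << 30) is still positive.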


78427 18-Jun-2001 jhb

Initialize mutexes needed early on all in the same place so that the
startup routine more closely matches that of alpha and ia64. At some
point the common mutexes shared across all platforms probably should move
into sys/kern_mutex.c.


78426 18-Jun-2001 jhb

- Add support for decoding syscall names. (Brought over from the new alpha
trace code that was brought over from NetBSD.)
- Check for "syscall_with_err_pushed" as the label prior to a syscall trap
frame rather than "Xlcall_syscall" and "Xint0x80_syscall". We don't
have a valid trapframe during the short range of code that those two
symbols now cover.
- Simplify db_next_frame() to avoid duplicating the code for the different
trap frame types.
- Don't try to trace a swapped-out process. (Brought over from NetBSD via
the new alpha trace code.)


78425 18-Jun-2001 jhb

Include sys/pcpu.h to get the prototype for globaldata_register() to quiet
a warning.


78391 17-Jun-2001 nyan

Don't assume that resource type is ioport and rid equal 0.


78353 16-Jun-2001 alex

Fix "alignemnt" typo.


78260 15-Jun-2001 peter

Fix warnings:
908: warning: long unsigned int format, unsigned int arg (arg 3)
887: warning: `timezero' defined but not used


78135 12-Jun-2001 peter

Hints overhaul:
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.


77931 09-Jun-2001 obrien

Fix style of defines.


77796 06-Jun-2001 jhb

Don't hold sched_lock across addupc_task().

Reported by: David Taylor <davidt@yadt.co.uk>
Submitted by: bde


77626 02-Jun-2001 phk

Properly wrap mtx_intr_enable() macro in "do $bla while (0)"
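
The reason for the wrapper can be shown with a small sketch. The macro and
variables below are hypothetical stand-ins, not the real mtx_intr_enable();
the point is only that a multi-statement macro must expand to exactly one
statement to be safe inside an unbraced if/else.

```c
#include <assert.h>

static int intr_state;     /* hypothetical stand-in for the interrupt state */
static int restore_calls;  /* counts how often the macro body ran */

/* Two statements wrapped so that the macro behaves like one statement. */
#define INTR_RESTORE_SKETCH(flag) do {  \
    intr_state = (flag);                \
    restore_calls++;                    \
} while (0)

static int restore_demo(int cond)
{
    intr_state = 0;
    restore_calls = 0;
    if (cond)
        INTR_RESTORE_SKETCH(1);  /* the trailing ';' completes the do-while */
    else
        intr_state = -1;         /* with bare braces this 'else' would not parse */
    return intr_state;
}

/* Usage check: both branches behave as a single statement would. */
static void restore_self_check(void)
{
    assert(restore_demo(1) == 1 && restore_calls == 1);
    assert(restore_demo(0) == -1 && restore_calls == 0);
}
```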


77582 01-Jun-2001 tmm

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs
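
Deriving the array length from the two boundary symbols can be sketched as
follows. This is an illustration only: the array size and the helper name
are assumptions, and in the kernel intrcnt and eintrcnt are symbols marking
the start and end of the table rather than ordinary pointer variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counter table. */
static unsigned long counters_sketch[24];
static unsigned long *intrcnt_sketch = counters_sketch;
static unsigned long *eintrcnt_sketch = counters_sketch + 24;

/* Number of counters, computed from the boundaries instead of a constant. */
static size_t intrcnt_len_sketch(void)
{
    assert(eintrcnt_sketch >= intrcnt_sketch);
    return (size_t)(eintrcnt_sketch - intrcnt_sketch);
}
```

Because the length follows the boundaries, adding a counter to the table
needs no matching update to a separate count constant.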

Requested by: bde
Reviewed by: bde (some time ago)


77502 30-May-2001 jhb

Quiet warnings by adding a prototype for set_user_ldt_rv() and making it
conditional on #ifdef SMP.


77486 30-May-2001 jhb

We can't grab the sched_lock in set_user_ldt() because when it is called
from cpu_switch(), curproc has been changed, but the sched_lock owner will
not be updated until we return to mi_switch(), thus we deadlock against
ourselves. As a workaround, push the acquire and release of sched_lock out
to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in
set_user_ldt for the same reason.

Sleuthing by: tmm
Tested by: tmm, dougb


77455 30-May-2001 mjacob

move wx to be part of miibus requiring chipsets


77414 29-May-2001 phk

Remove MFS options from all example kernel configs.


77214 26-May-2001 jkh

Remove pcm hints here now that it's gone from GENERIC.

Reminded-by: bde


77187 25-May-2001 jkh

Take pcm (audio) back out of GENERIC; there appears to be some
consensus, most notably among the maintainers, that it's better
loaded as a module.

Finally-pushed-over-the-edge-by-the-anguished-cries-of: rwatson


77097 23-May-2001 jhb

Don't acquire Giant just to call trap_fatal(), we are about to panic
anyway so we'd rather see the printf's than block if the system is
hosed.


77082 23-May-2001 alfred

pmap_mapdev needs the vm_mtx, acquire it if not already locked


77081 23-May-2001 alfred

lock vm while playing with pmap


77015 22-May-2001 bde

Convert npx interrupts into traps instead of vice versa. This is much
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).

Submitted by: original (pre-SMPng) version by luoqi


76947 22-May-2001 jhb

Remove a few more spl's I missed earlier.

Reported by: Michael Harnois <mdharnois@home.com>
Pointy hat: me


76941 21-May-2001 jhb

Sort includes.


76939 21-May-2001 jhb

Axe unneeded spl()'s.


76906 20-May-2001 bde

Throw away the complications in npxsave() and their infrastructure.
npxsave() went to great lengths to execute fnsave with interrupts
enabled in case executing it froze the CPU. This case can't happen,
at least for Intel CPU/NPX's. Spurious IRQ13's don't imply spurious
freezes. Anyway, the complications were usually no-ops because IRQ13
is not used on i486's and newer CPUs, and because SMPng broke them in
rev.1.84. Forcible enabling of interrupts was changed to
write_eflags(old_eflags), but since SMPng usually calls npxsave() from
cpu_switch() with interrupts disabled, write_eflags() usually just
kept interrupts disabled.


76905 20-May-2001 bde

Use a critical region to protect almost everything in npxinit().
npxinit() didn't have the usual race because it doesn't save to curpcb,
but it may have had a worse form of it since it uses the npx when it
doesn't "own" it. I'm not sure if locking prevented this. npxinit()
is normally called with the proc lock but not sched_lock.

Use a critical region to protect pushing of curproc's npx state to
curpcb in npxexit(). Not doing so was harmless since it at worst
saved a wrong state to a dying pcb.


76903 20-May-2001 bde

Use a critical region to protect saving of the npx state in savectx().
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worst reset the state.

savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76770 17-May-2001 jhb

- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT.
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
after the display of the copyright message instead of doing it by hand in
three MD places.


76768 17-May-2001 jhb

- Axe the IMEN_BITS and APIC_IMEN_BITS constants.
- Add back in a definition of NHWI which is preferred over ICU_LEN.

Submitted by: bde


76650 15-May-2001 jhb

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


76645 15-May-2001 jhb

Move the definition of HWI_MASK to the i386/isa/icu.h header right next to
the definition of ICU_LEN.


76642 15-May-2001 jhb

- Use ICU_LEN rather than NHWI for the size of the array of ithreads.
- Remove unneeded include of sys/ipl.h.


76554 13-May-2001 phk

Convert DEVFS from an "opt-in" to an "opt-out" option.

If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on July 1st so that we can start reaping the full
benefits of having it.


76546 13-May-2001 bde

Use a critical region to protect pushing of the parent's npx state to the
pcb for fork(). It was possible for the state to be saved twice when an
interrupt handler saved it concurrently. This corrupted (reset) the state
because fnsave has the (in)convenient side effect of doing an implicit
fninit. Mundane null pointer bugs were not possible, because we save to
an "arbitrary" process's pcb and not to the "right" place (npxproc).
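
The double-save corruption described above can be modelled in a few lines.
This is a sketch under stated assumptions: the structure and function are
hypothetical and track only the control word, and the only hardware facts
used are that fnsave reinitializes the unit after storing its state and
that the control word after fninit is 0x037F.

```c
#include <assert.h>
#include <stdint.h>

#define FPU_INIT_CW 0x037Fu   /* x87 control word after fninit */

/* Hypothetical one-field model of the npx state. */
struct fpu_sketch {
    uint16_t cw;
};

/* Model of fnsave: store the live state, then reset the unit. */
static void fnsave_sketch(struct fpu_sketch *fpu, struct fpu_sketch *save)
{
    *save = *fpu;
    fpu->cw = FPU_INIT_CW;
}

/* Usage check: a second save to the same area overwrites the real state. */
static void fnsave_self_check(void)
{
    struct fpu_sketch fpu = { 0x027Fu };   /* a non-default control word */
    struct fpu_sketch area = { 0 };

    fnsave_sketch(&fpu, &area);
    assert(area.cw == 0x027Fu);            /* first save is correct */
    fnsave_sketch(&fpu, &area);
    assert(area.cw == FPU_INIT_CW);        /* second save stored the reset state */
}
```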

Push the parent's %gs to the pcb for fork(). Changes to %gs before
fork() were not preserved in the child unless an accidental context
switch did the pushing. Updated the list of pcb contents which is
supposed to inhibit bugs like this. pcb_dr*, pcb_gs and pcb_ext were
missing. Copying is correct for pcb_dr*, and pcb_ext is already
handled specially (although XXX'ly).

Reducing the savectx() call to an npxsave() call in rev.1.80 was a
mistake. The above bugs are duplicated in many places, including in
savectx() itself.

The arbitrariness of the parent process pointer for the fork()
subroutines, the pcb pointer for savectx(), and the save87 pointer
for npxsave(), is illusory. These functions don't work "right" unless
the pointers are precisely curproc, curpcb, and the address of npxproc's
save87 area, respectively, although the special context in which they
are called allows savectx(&dumppcb) to sort of work and npxsave(&dummy)
to work. cpu_fork() just doesn't work unless the parent process
pointer is curproc, or the caller has pushed %gs to the pcb, or %gs
happens to already be in the pcb.


76525 12-May-2001 deischen

Revert part of last commit. Instead of using %fs for KSD/TSD, we'll
follow Linux' convention and use %gs. This adds back the setting of
%fs to a sane value in sendsig(). The value of %gs remains preserved
to whatever it was in user context.


76494 11-May-2001 jhb

Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.


76456 11-May-2001 msmith

Un-swap irq/link byte values so that printf works.


76440 10-May-2001 jhb

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake


76434 10-May-2001 jhb

- Use sched_lock and critical regions to ensure that LDT updates are thread
safe from preemption and concurrent access to the LDT.
- Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.

Reviewed by: silence on -hackers


76298 06-May-2001 deischen

When setting up the frame to invoke a signal handler, preserve the
%fs and %gs registers instead of setting them to known sane values.
%fs is going to be used for thread/KSE specific data by the new
threads library; we'll want it to be valid inside of signal handlers.

According to bde, Linux preserves the state of %fs and %gs when setting
up signal handlers, so there is precedent for doing this.

The same changes should be made in the Linux emulator, but when made,
they seem to break (at least one version of) the IBM JDK for Linux
(reported by drew).

Approved by: bde


76205 02-May-2001 bde

Fixed panics in npx exception handling. When using IRQ13 exception
handling, SMPng always switches the npx context away from curproc
before calling the handler, so the handler always panicked. When using
exception 16 exception handling, SMPng sometimes switches the npx
context away from curproc before calling the handler, so the handler
sometimes panicked. Also, we didn't lock the context while using it,
so we sometimes didn't detect the switch and then panicked in a less
controlled way.

Just lock the context while using it, and return without doing anything
except clearing the busy latch if the context is not for curproc. This
fixes the exception 16 case and makes the IRQ13 case harmless. In both
cases, the instruction that caused the exception is restarted and the
exception repeats. In the exception 16 case, we soon get an exception
that can be handled without doing anything special. In the IRQ13 case,
we get an easy to kill hung process.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


76089 28-Apr-2001 jhb

Add in a missing call to forward_hardclock() in the SMP case.

Submitted by: bde


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delivered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwarded_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
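
The split between the main clock function and its per-process part can be
sketched as below. All names, the CPU count, and the loop that stands in
for the IPI are assumptions made for illustration; in the real code each
other processor runs the per-process part itself when the IPI arrives.

```c
#include <assert.h>

#define NCPU_SKETCH 4   /* hypothetical CPU count */

struct cpu_sketch {
    unsigned long proc_ticks;   /* ticks charged to the process on this CPU */
};

static struct cpu_sketch cpus_sketch[NCPU_SKETCH];
static unsigned long global_ticks_sketch;   /* advanced once per clock tick */

/* Per-process part: charge the tick to whatever runs on this CPU. */
static void hardclock_process_sketch(struct cpu_sketch *cpu)
{
    cpu->proc_ticks++;
}

/* Main clock function, run on the CPU that took the clock interrupt. */
static void hardclock_sketch(int self)
{
    assert(self >= 0 && self < NCPU_SKETCH);
    global_ticks_sketch++;                       /* system-wide work, once */
    hardclock_process_sketch(&cpus_sketch[self]);

    /* Stand-in for forwarding the tick to the other processors. */
    for (int i = 0; i < NCPU_SKETCH; i++)
        if (i != self)
            hardclock_process_sketch(&cpus_sketch[i]);
}
```

On a UP system the loop does nothing, so the behaviour matches the
pre-split code, which is the property the log entry describes.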

Reviewed by: jake, peter
Looked over by: eivind


76031 26-Apr-2001 jake

Remove a leading underscore that prevented I386_CPU kernels from
compiling.

Submitted by: Alexander N. Kabaev <ak03@gte.com>
PR: kern/26858


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


75724 20-Apr-2001 jhb

Make the ap_boot_mtx mutex static.


75723 20-Apr-2001 jhb

Split up the db_printf's for 'show pcpu' so that we only output at most one
line for each db_printf(). Also, just use spaces to line the columns up
rather than trying to be fancy with tabs.


75677 18-Apr-2001 imp

Back out 1.103. It wasn't approved by the owner of the file and
introduced style bugs.

Submitted by: bde


75570 17-Apr-2001 jhb

Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
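
A userland sketch of the idea, using C11 atomics in place of the kernel's
atomic_cmpset(). The NOCPU_SKETCH value, the function name, and the choice
to let the owning CPU re-enter are assumptions for illustration; the log
does not say how the real code treats a second panic on the same CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU_SKETCH (-1)   /* hypothetical "no CPU has panicked yet" value */

static atomic_int panic_cpu_sketch = NOCPU_SKETCH;

/* Returns true if this CPU may run the panic path: it either claimed
 * the variable first or already owns it. */
static bool panic_claim_sketch(int cpu)
{
    int expected = NOCPU_SKETCH;

    if (atomic_compare_exchange_strong(&panic_cpu_sketch, &expected, cpu))
        return true;          /* this CPU won the race */
    return expected == cpu;   /* on failure, expected holds the current owner */
}

/* Usage check: the first caller wins, later callers from other CPUs lose. */
static void panic_self_check(void)
{
    assert(panic_claim_sketch(2));
    assert(!panic_claim_sketch(0));
    assert(panic_claim_sketch(2));
    assert(atomic_load(&panic_cpu_sketch) == 2);
}
```

A single compare-and-set needs no lock bookkeeping, so it cannot itself
trip over lock-checking code the way a mutex acquisition could.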


75528 15-Apr-2001 obrien

Turn on kernel debugging support (DDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS)
by default while SMPng is still being developed.

Submitted by: jhb


75488 13-Apr-2001 jhb

People are still having problems with i586_* on UP machines and SMP
machines, so just hack it to disable them for now until it can be fixed.

Inspired by hair pulling of: asmodai


75421 11-Apr-2001 jhb

Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by: bde


75397 10-Apr-2001 jhb

Remove constants defining the bitmasks of the old giant kernel lock.


75396 10-Apr-2001 jhb

Remove the old APIC I/O higher level IPI API in favor of the newer MI
API for IPI's that isn't tied to the Intel APIC. MD code can still use
the apic_ipi() function or dink with the apic directly if needed to send
MD IPI's.


75393 10-Apr-2001 jhb

Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by: peter (the idea)


75392 10-Apr-2001 jhb

Add an MI API for sending IPI's. I used the same API present on the alpha
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.


75357 09-Apr-2001 jhb

- One can now specify the decimal pid of a process to trace as a parameter.
Since pid's are not in the kernel address space, this doesn't conflict
with the functionality of specifying an arbitrary frame pointer to the
trace command.
- If the first function of a backtrace maps to fork_trampoline, then this
is a newly fork'd process that has not been executed yet, so just print
out the first frame and then return for that case.
- Lower the default count from 65535 to 1024. ddb doesn't trace into
userland, and if the stack gets hosed and starts looping it's less
annoying.


75274 06-Apr-2001 jhb

Add a new ddb command 'show pcpu' which lists some of the per-cpu data.
Specifically, the cpuid, curproc, curpcb, npxproc, and idleproc members.
Also, if witness is compiled into the kernel, then a list of all the spin
locks held by this CPU is displayed. By default the information for the
current CPU is displayed, but a decimal cpu id may be specified as a
parameter to obtain information on a specific CPU.


75256 06-Apr-2001 jhb

Axe the per-cpu variable witness_spin_check as it was replaced by the
per-cpu spinlocks list.


75141 03-Apr-2001 imp

De __P() while I'm here. Done as a separate commit since it is just
stylistic.

# Yes, this breaks K&R, but this file already used so many gcc extensions
# keeping K&R support seemed too anachronistic for me.

Didn't fix the bug where functions that can only be used in the kernel
are exported to userland.


75139 03-Apr-2001 imp

Make this file C++ safe. It defines many useful functions (inb, outb)
that people use from userland in C++ programs. I've had this in my
tree for ages and just got bit by it not being in the real tree again.

This is a MFC candidate.


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatibility,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we no longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
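
The relationship between lock classes and lock objects might look roughly
like the sketch below. Every structure, field, and flag name here is
hypothetical and does not reproduce the definitions in sys/lock.h; the
sketch shows only the shape described above: a class shared by all locks
of one type, and a per-lock object carrying a name and an initialized flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LC_SKETCH_SPIN  0x1u   /* hypothetical class flags */
#define LC_SKETCH_SLEEP 0x2u
#define LO_SKETCH_INIT  0x1u   /* hypothetical per-object "initialized" flag */

/* Properties shared by every lock of one type. */
struct lock_class_sketch {
    const char *name;
    unsigned flags;
};

/* Properties of one individual lock. */
struct lock_object_sketch {
    const struct lock_class_sketch *cls;
    const char *name;
    unsigned flags;
};

/* The three classes the log entry lists. */
static const struct lock_class_sketch class_spin_mtx  = { "spin mutex",  LC_SKETCH_SPIN };
static const struct lock_class_sketch class_sleep_mtx = { "sleep mutex", LC_SKETCH_SLEEP };
static const struct lock_class_sketch class_sx        = { "sx",          LC_SKETCH_SLEEP };

static void lock_init_sketch(struct lock_object_sketch *lo,
    const struct lock_class_sketch *cls, const char *name)
{
    assert(cls != NULL && name != NULL);
    lo->cls = cls;
    lo->name = name;
    lo->flags = LO_SKETCH_INIT;
}

/* Analogue of a mtx_initialized()-style query: only inspects the flag. */
static bool lock_initialized_sketch(const struct lock_object_sketch *lo)
{
    return (lo->flags & LO_SKETCH_INIT) != 0;
}
```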


74903 28-Mar-2001 jhb

Switch from save/disable/restore_intr() to critical_enter/exit().


74902 28-Mar-2001 jhb

Catch up to the mtx_saveintr -> mtx_savecrit change.


74900 28-Mar-2001 jhb

- Switch from using save/disable/restore_intr to using critical_enter/exit
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.


74897 28-Mar-2001 jhb

- Add the new critical_t type used to save state inside of critical
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.
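
The save-and-restore pattern behind critical_enter() and critical_exit()
can be sketched as follows. The type and function names carry a _sketch
suffix because they are illustrative stand-ins; a plain variable models
the interrupt-enable state that the real functions save and restore.

```c
#include <assert.h>

typedef unsigned critical_sketch_t;   /* hypothetical saved-state type */

static unsigned intr_enabled_sketch = 1;   /* models the CPU interrupt flag */

/* Save the current state, then disable interrupts. */
static critical_sketch_t critical_enter_sketch(void)
{
    critical_sketch_t saved = intr_enabled_sketch;

    intr_enabled_sketch = 0;
    return saved;
}

/* Restore whatever state was in effect on entry. */
static void critical_exit_sketch(critical_sketch_t saved)
{
    intr_enabled_sketch = saved;
}

/* Usage check: nested sections re-enable interrupts only at the outermost exit. */
static void critical_self_check(void)
{
    critical_sketch_t outer = critical_enter_sketch();
    critical_sketch_t inner = critical_enter_sketch();

    critical_exit_sketch(inner);
    assert(intr_enabled_sketch == 0);
    critical_exit_sketch(outer);
    assert(intr_enabled_sketch == 1);
}
```

Returning the saved state from the enter call is what lets sections nest:
an inner exit restores "disabled" rather than blindly enabling.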


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74738 24-Mar-2001 obrien

Fix a problem where we were switching npxproc from underneath processes
running in process context in order to run interrupt handlers. This
caused a big smashing of the stack on AMD K6, K5 and Intel Pentium (ie, P5)
processors because we are using npxproc as a flag to indicate whether
the state has been pushed onto the stack.

Submitted by: bde


74670 23-Mar-2001 tmm

Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and
hw.intrcnt).

Approved by: rwatson


74430 19-Mar-2001 des

Show the bzero() bandwidth in kBps instead of Bps; use u_int32_t instead
of long and int64_t; and print the result as an unsigned long. This should
make the output from the bzero() test more readable, and avoid printing a
negative bandwidth. Note that this doesn't change the decision process,
since that is based on time elapsed, not on computed bandwidth.


74337 16-Mar-2001 sos

Remove the now defunct ATA_ENABLE* options

Spotted by: phk


74283 15-Mar-2001 peter

Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>


74182 12-Mar-2001 jlemon

Move the fxp driver so it is under the miibus section.


74016 09-Mar-2001 jhb

Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks. Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock. At
least on the i386. To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks. Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes. Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.
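
The counter-based check can be sketched in a few lines. The structure and
helper names are hypothetical; only the p_spinlocks field name and the
initial values (1 for a child, 0 for proc0) come from the description
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced process structure. */
struct proc_sketch {
    int p_spinlocks;   /* number of spin locks currently held */
};

static void spin_acquired_sketch(struct proc_sketch *p)
{
    p->p_spinlocks++;
}

static void spin_released_sketch(struct proc_sketch *p)
{
    assert(p->p_spinlocks > 0);
    p->p_spinlocks--;
}

/* Blocking on a mutex is legal only when no spin lock is held. */
static bool legal_to_block_sketch(const struct proc_sketch *p)
{
    return p->p_spinlocks == 0;
}

/* A child starts with the sched lock held, so its count begins at 1. */
static struct proc_sketch new_child_sketch(void)
{
    struct proc_sketch p = { .p_spinlocks = 1 };

    return p;
}
```

Unlike the interrupts-disabled test, the counter stays accurate when
interrupts are disabled for some other reason.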

Consulting from: cp


73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


73933 07-Mar-2001 gsutter

Spelling and capitalization fixes.

Reviewed by: gshapiro, jake, jhb, rwatson (all within 30 seconds)


73931 07-Mar-2001 jhb

- Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


73903 07-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


73586 05-Mar-2001 jhb

Don't enable interrupts before calling sched_ithd for threaded interrupts.

Tested by: obrien


73374 03-Mar-2001 imp

Add support for Dlink DL10022 to the ed driver. This is a mii part
bolted to a ne-2000 chip. This is necessary for the NetGear FA-410TX
and other cards.

This also requires you add mii to your kernel if you have an ed driver
configured.

This code will result in a couple of timeout messages for ed on the
impacted cards. Additional work will be needed, but this does work
right now, and many people need these cards.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>


73314 02-Mar-2001 mdodd

version 1.7 made some changes to correct problems identified by compiling
with egcs-1.1.1. bus_space_write_multi_2() had an extra operation that
should have been removed.

Remove it.

This fixes the panic when bus_space_write_multi_2() is used.

Obtained from: jake


73017 25-Feb-2001 peter

Make the kernel actually compile and link under a.out, using
gcc -aout -mno-underscores. The bioscall.s tweak is not an a.out
requirement really, but to work around the bugs in the antique version of
gas that is used for a.out. Makefile hacks are all that is needed to
get an a.out kernel. There is no telling if it will work though.
This is little more than an academic curiosity anyway since all it is
good for is situations where the boot code is hard wired, eg: rom
bootstraps (such as the gnat box).

GENERIC:
...
size -aout kernel ; chmod 755 kernel
text data bss dec hex
3051520 368640 198688 3618848 373820


73013 25-Feb-2001 peter

Always use the ELF naming after the demise of asnames.h.


73011 25-Feb-2001 jake

Remove the leading underscore from all symbols defined in x86 asm
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.

Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.

moral support from: peter, jhb


73007 25-Feb-2001 peter

Drop the 'count' from the aha device specs


73001 25-Feb-2001 jake

- Rename the lcall system call handler from Xsyscall to Xlcall_syscall
to be more like Xint0x80_syscall and less like c function syscall().
- Reduce code duplication between the int0x80 and lcall handlers by
shuffling the eflags into the right place, saving the sizeof the
instruction in tf_err and jumping into the common int0x80 code.

Reviewed by: peter


72930 23-Feb-2001 peter

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


72917 22-Feb-2001 jhb

The p_md.md_regs member of proc is used in signal handling to reference
the original trapframe of the syscall, trap, or interrupt that entered
the kernel. Before SMPng, ast's were handled via a pseudo trap at the
end of doreti. With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures. Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updated in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode. The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.


72911 22-Feb-2001 jhb

- Change ast() to take a pointer to a trapframe like other architectures.
- Don't use an atomic operation to update cnt.v_soft in ast(). This is
the only place the variable is written to, and sched_lock is always
held when it is written, so it is already protected and the mutex release
of sched_lock asserts a memory barrier that ensures the value will be
updated in a timely fashion.


72900 22-Feb-2001 jhb

- Use TRAPF_PC() on the alpha to access the PC in the trap frame.
- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().

Reported by: bde (2)


72897 22-Feb-2001 jhb

GC unused and now obsolete assertion macros.


72759 20-Feb-2001 jhb

- Add a new ithread_schedule() function to do the bulk of the work of
scheduling an interrupt thread to run when needed. This has the side
effect of enabling support for entropy gathering from interrupts on
all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
to use ithread_schedule() for most of their processing when scheduling
an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
enabled. I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
on every interrupt thread whose pri was SWI_CLOCK, set the flag
explicitly for clk_ithd's proc during start_softintr().


72746 20-Feb-2001 jhb

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.


72700 19-Feb-2001 bde

Removed all traces of T_ASTFLT (except for gaps where it was). It became
unused except in dead code when ast() was split off from trap().


72683 19-Feb-2001 bde

Changed the aston() family to operate on a specified process instead of
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).

Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.


72678 19-Feb-2001 bde

Fixed style bugs in clock.c rev.1.164 and cpu.h rev.1.52-1.53 -- declare
tsc_present in the right places (together with other variables of the
same linkage), and don't use messy ifdefs just to avoid exporting it in
some cases.


72668 18-Feb-2001 markm

Allow the superuser to prevent all interrupt harvesting on
her system.


72376 12-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propagate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
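
A minimal userland sketch of the unified run queue idea described above,
not the committed kernel code: one array of 64 queues with a status
bitmap, where a priority selects its queue by division. Every model_*
name, the 0..255 priority range, and the "lower value runs first"
ordering are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NQS  64                       /* run queues, per the entry above */
#define MODEL_NPRI 256                      /* assumed priority range 0..255 */
#define MODEL_PPQ  (MODEL_NPRI / MODEL_NQS) /* priorities sharing one queue */

struct model_proc {
	int pri;                  /* assumed: lower value runs first */
	struct model_proc *next;
};

struct model_runq {
	uint64_t status;          /* bit q set means queue q is non-empty */
	struct model_proc *head[MODEL_NQS];
	struct model_proc *tail[MODEL_NQS];
};

/* Append p to the tail of the queue selected by its priority. */
void
model_runq_add(struct model_runq *rq, struct model_proc *p)
{
	int q = p->pri / MODEL_PPQ;

	p->next = NULL;
	if (rq->head[q] == NULL)
		rq->head[q] = p;
	else
		rq->tail[q]->next = p;
	rq->tail[q] = p;
	rq->status |= (uint64_t)1 << q;
}

/* Remove and return the first process on the lowest-numbered busy queue. */
struct model_proc *
model_runq_choose(struct model_runq *rq)
{
	for (int q = 0; q < MODEL_NQS; q++) {
		struct model_proc *p;

		if ((rq->status & ((uint64_t)1 << q)) == 0)
			continue;
		p = rq->head[q];
		rq->head[q] = p->next;
		if (rq->head[q] == NULL) {
			rq->tail[q] = NULL;
			rq->status &= ~((uint64_t)1 << q);
		}
		return p;
	}
	return NULL;
}
```

Because the queue is passed in as a parameter, the same two functions
could serve a global queue or one queue per CPU, which matches the
intent stated in the entry.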


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72334 10-Feb-2001 jake

Clear the reschedule flag after finding it set in userret(). This
used to be in cpu_switch(), but I don't see any difference between
doing it here.


72278 10-Feb-2001 jhb

Re-enable preemption on interrupts. My last commit accidentally reverted
it as I was playing with some other ways of doing kernel preemption.


72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accessed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
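
A hedged sketch of what "process attributes backed by flags in p_sflag"
can look like. The bit values, the struct, and all model_* names are
assumptions; the macros here take an explicit process pointer for
clarity, and the real code also needs appropriate locking that is
omitted.

```c
/* Assumed bit values; the real PS_* constants may differ. */
#define MODEL_PS_ASTPENDING  0x0001
#define MODEL_PS_NEEDRESCHED 0x0002

struct model_sproc {
	int p_sflag;              /* scheduler-related flags for one process */
};

/* AST state lives in the process, not in a per-CPU variable. */
#define model_aston(p)         ((p)->p_sflag |= MODEL_PS_ASTPENDING)
#define model_astoff(p)        ((p)->p_sflag &= ~MODEL_PS_ASTPENDING)
#define model_astpending(p)    (((p)->p_sflag & MODEL_PS_ASTPENDING) != 0)

/* Reschedule requests are tracked the same way. */
#define model_need_resched(p)  ((p)->p_sflag |= MODEL_PS_NEEDRESCHED)
#define model_clear_resched(p) ((p)->p_sflag &= ~MODEL_PS_NEEDRESCHED)
#define model_resched_wanted(p) \
	(((p)->p_sflag & MODEL_PS_NEEDRESCHED) != 0)
```

Keeping the bits with the process means a pending AST follows the
process if it is preempted, instead of staying behind on the CPU where
it was posted.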


72274 10-Feb-2001 jhb

Add a macro mtx_intr_enable() to alter a spin lock such that interrupts
will be enabled when it is released.


72240 09-Feb-2001 jhb

Catch up to changes to inthand_add().


72239 09-Feb-2001 jhb

Use the MI ithread helper functions in the x86 interrupt code.


72238 09-Feb-2001 jhb

- Catch up to the new swi API changes:
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).


72226 09-Feb-2001 jhb

Move the initialization of the proc lock for proc0 very early into the MD
startup code.


72225 09-Feb-2001 jhb

Woops, remove an obsolete reference to gd_cpu_lockid.


72220 09-Feb-2001 jhb

Remove unused forward_irq counters.


72219 09-Feb-2001 jhb

Axe gd_cpu_lockid as it is no longer used.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

Similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
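
To illustrate the spin lock case described above (recursion handled
inline, a function call only when we actually need to spin), here is a
userland model built on C11 atomics. It is a sketch under stated
assumptions, not the kernel implementation: every model_* name and the
owner-token scheme are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_UNOWNED ((uintptr_t)0)

struct model_mtx {
	_Atomic uintptr_t owner;     /* MODEL_UNOWNED or the owner's token */
	unsigned recurse;            /* depth, only touched by the owner */
	atomic_uint slow_calls;      /* trips through the out-of-line path */
};

/* Out-of-line slow path: spin until the lock becomes free. */
void
model_lock_spin_hard(struct model_mtx *m, uintptr_t self)
{
	uintptr_t expected;

	atomic_fetch_add(&m->slow_calls, 1);
	do {
		expected = MODEL_UNOWNED;
	} while (!atomic_compare_exchange_weak_explicit(&m->owner,
	    &expected, self, memory_order_acquire, memory_order_relaxed));
}

/* Inlined fast path: uncontested acquire and recursion avoid a call. */
static inline void
model_lock_spin(struct model_mtx *m, uintptr_t self)
{
	uintptr_t expected = MODEL_UNOWNED;

	if (atomic_compare_exchange_strong_explicit(&m->owner, &expected,
	    self, memory_order_acquire, memory_order_relaxed))
		return;                  /* lock was free */
	if (expected == self) {
		m->recurse++;            /* we already own it: recurse inline */
		return;
	}
	model_lock_spin_hard(m, self);   /* contested: call out and spin */
}

/* Unwind one level of recursion, or release the lock at depth zero. */
static inline void
model_unlock_spin(struct model_mtx *m)
{
	if (m->recurse > 0) {
		m->recurse--;
		return;
	}
	atomic_store_explicit(&m->owner, MODEL_UNOWNED,
	    memory_order_release);
}
```

In terms of the interface change itself, a caller that used to write
mtx_enter(lock, MTX_SPIN) now writes mtx_lock_spin(lock), and one that
wrote mtx_enter(lock, MTX_DEF) now writes mtx_lock(lock).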


72182 08-Feb-2001 msmith

Free the memory we get from devclass_get_devices and device_get_children.

Submitted by: wpaul


72148 08-Feb-2001 jhb

Don't enable interrupts for a kernel breakpoint or trace trap. Otherwise,
this negates the explicit disabling of interrupts when entering the
debugger in Debugger().


72145 07-Feb-2001 jhb

When SMPng was first committed, we removed 'cpl' from the interrupt
frame. Teach ddb about this as there is one less word for it to skip
over when finding a trapframe on the interrupt frame stack.


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the English language.


72011 04-Feb-2001 peter

Clean up some leftovers from the root mount cleanup that was done some
time ago. FFS_ROOT and CD9660_ROOT are obsolete.


71983 04-Feb-2001 dillon

This commit represents work mainly submitted by Tor and slightly modified
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundary case fix.

Make the buffer cache use a system map for buffer cache KVM rather than a
normal map.

Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundary case when the buffer cache size limit is reached that
could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge


71890 01-Feb-2001 jake

Implement preemptive scheduling of hardware interrupt threads.

- If possible, context switch to the thread directly in sched_ithd(),
rather than triggering a delayed ast reschedule.

- Disable interrupts while restoring fpu state in the trap handler,
in order to ensure that we are not preempted in the middle, which
could cause migration to another cpu.

Reviewed by: peter
Tested by: peter (alpha)


71880 31-Jan-2001 peter

Remove count for NSIO. The only places it was used were incorrect.
(alpha-gdbstub.c got sync'ed up a bit with the i386 version)


71818 30-Jan-2001 peter

Remove some leftovers from the CMAP* stuff in globaldata and the
BSP and AP startup.


71817 30-Jan-2001 peter

Remove unused GD_CPU_LOCKID, GD_OTHER_CPUS, PS_IDLESTACK and
PS_IDLESTACK_TOP


71816 30-Jan-2001 jhb

Remove unnecessary locking to protect the p_upages_obj and p_addr
pointers.


71797 29-Jan-2001 peter

Convert mca (microchannel bus support) from something that we count
(bogus) to something that we test for the presence of.


71785 29-Jan-2001 peter

Send "#if NISA > 0" to the bit-bucket and replace it with an option.
These were compile-time "is the isa code present?" tests and not
'how many isa busses' tests.


71779 29-Jan-2001 peter

Remove stray #include "isa.h"


71739 28-Jan-2001 jake

Clear intr_nesting_level when an interrupt thread has no more
handlers and wants to exit, so it doesn't panic in exit1()
which malloc()s with M_WAITOK.

Reported by: Bob Bishop <rb@gid.co.uk>


71728 28-Jan-2001 bmilekic

Move the setting of curproc to idleproc up earlier in ap_init(). The
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.

Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.

Solved by: jake (the almighty) :-)


71727 28-Jan-2001 tegge

Defer assignment of low level interrupt handlers for PCI interrupts
described in the MP table until something asks for the interrupt number
later on.


71710 27-Jan-2001 phk

Turn DEVFS on by default.

You may need to turn this off if you use vinum. Apart from that I know of
no reason not to run with DEVFS.


71665 26-Jan-2001 jake

Push Giant down into the trap handlers that need it, instead of
acquiring it unconditionally.

Reviewed by: jhb


71647 25-Jan-2001 jhb

Whitespace fix: convert code indented 6 spaces to use tabs instead.


71604 24-Jan-2001 jhb

- Change fork_exit() to take a pointer to a trapframe as its 3rd argument
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71533 24-Jan-2001 jhb

Remove the Xforward_irq IPI.


71532 24-Jan-2001 jhb

- Remove all the #if 0'd code that used to implement IRQ forwarding.
- Remove #if 0'd lazy interrupt mask.


71530 24-Jan-2001 jhb

- Proc locking.
- P_OWEUPC -> PS_OWEUPC.
- Remove obsolete prototype for MD fork_return().


71528 24-Jan-2001 jhb

Setup the return values for a child process in the trapframe when we setup
the rest of the trapframe instead of doing it in fork_return().


71527 24-Jan-2001 jhb

- Kill the have_giant parameter to userret() along with all instances of
that name as a variable. Use mtx_owned(&Giant) where appropriate
instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enabling interrupts during trap and why this may be
bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
returning from fork. The child hasn't returned from fork through syscall
in a while.
- Remove fork_return() as it has been superseded by the MI version.


71526 24-Jan-2001 jhb

- Proc locking.
- P_INMEM -> PS_INMEM.


71525 24-Jan-2001 jhb

- Relocate portions of this file to get it into an order closer to that of
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions. (Brought over
from the alpha port)


71524 24-Jan-2001 jhb

- Proc locking.
- Setup proc0.p_heldmtx, proc0.contested, and curproc earlier so that we
can use mutexes.
- Initialize sched_lock and Giant earlier and enter Giant during init386.
- Use suser(9) instead of checking cr_uid directly.


71522 24-Jan-2001 jhb

Call fork_exit() now instead of futzing around in assembly during a fork
return.


71352 21-Jan-2001 jasone

Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by: peter


71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


71321 21-Jan-2001 peter

Remove APIC_INTR_DIAGNOSTIC - this has been disabled for some time now.
Remove some leftovers of removed SMP options.


71320 21-Jan-2001 jasone

Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.


71318 21-Jan-2001 jake

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


71294 20-Jan-2001 jake

Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict
with the flag of the same name.


71292 20-Jan-2001 jake

Simplify the i386 asm MTX_{ENTER,EXIT} macros to just call the
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.


71287 20-Jan-2001 jake

- Make npx_intr INTR_MPSAFE and move acquiring Giant into the
function itself.
- Remove a hack to allow acquiring Giant from the npx asm trap
vector.


71262 19-Jan-2001 peter

Convert apm from a bogus 'count' into a plain option. Clean out some
other cruft from the files.alpha and files.ia64 that were related to this.


71261 19-Jan-2001 peter

Zap unused #include "apm.h"


71257 19-Jan-2001 peter

Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h


71255 19-Jan-2001 peter

At great personal risk to my sanity, turn off COMPAT_OLDISA and the
two drivers that depend on it - ie and le. The compat code has not been
disabled.


71249 19-Jan-2001 jhb

Add in a space that got lost in the previous commit in some debugging code
so that '&' becomes a binary operator and not a unary operator.


71247 19-Jan-2001 peter

EEK! I missed a couple of places with the 24->32 interrupt change.


71245 19-Jan-2001 peter

Remove reference to splz_unpend - it is long gone.


71244 19-Jan-2001 peter

Catch a few alternative names for the syscall entry frame, eg: post-ELF
and int $0x80 entry methods.


71243 19-Jan-2001 peter

apic_itrace_splz[] is unused


71237 19-Jan-2001 peter

Fix a warning due to missing prototype.


71236 19-Jan-2001 peter

Fix a warning (the prototypes probably shouldn't be so over-zealously
#ifdef'ed though)


71228 19-Jan-2001 bmilekic

Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb


71211 18-Jan-2001 jhb

Protect p_stat and p_oncpu with sched_lock in forward_signal().


71141 17-Jan-2001 jhb

- Sort of lie and say that %eax is an output only and not an input for the
non-386 atomic_load_acq(). %eax is an input since its value is used in
the cmpxchg instruction, but we don't care what value it is, so setting
it to a specific value is just wasteful. Thus, it is being used without
being initialized as the warning stated, but it is ok for it to be used
because its value isn't important. Thus, we are only sort of lying when
we say it is an output only operand.
- Add "cc" to the clobber list for atomic_load_acq() since the cmpxchgl
changes ZF.


71098 16-Jan-2001 peter

Stop doing runtime checking on i386 cpus for cpu class. The cpu is
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.

Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)


71092 16-Jan-2001 jhb

Argh, disable the micro-ops again. I didn't test these adequately and
managed to lock up one of my machines in world again.

Pointy-hat to: me


71091 16-Jan-2001 jhb

- Use "+a" instead of "=&a" for several constraints. This should fix
compiling errors where gcc would run out of registers.
- Add "cc" to the list of clobbers for micro-ops where we perform
instructions that alter %eflags.
- Use xchgl instead of cmpxchgl to release a spin lock. This could allow
for more efficient register allocation as we no longer mandate that %eax
be used.
- Reenable the optimized mutex micro-ops in the non-i386 case.


71090 16-Jan-2001 jhb

Free the intrhand name when free'ing a intrhand.

Submitted by: bde


71085 16-Jan-2001 jhb

- Fix atomic_load_* and atomic_store_* to generate functions for atomic.c
that modules can call.
- Remove the old gcc <= 2.8 versions of the atomic ops.
- Resort the order of some things in the file so that there is only
one #ifdef for KLD_MODULE, and so that all WANT_FUNCTIONS stuff is
moved to the bottom of the file.
- Remove ATOMIC_ACQ_REL() and just use explicit macros instead.


71050 15-Jan-2001 peter

Implement an optimization for INTREN/INTRDIS that bde pointed out last
time I tinkered around here. Since INTREN is called from the interrupt
critical path now, it should not be too expensive. In this case, we
look at the bits being changed to decide which 8 bit IO port to write to
rather than unconditionally writing to both. I could probably have gone
further and only done the write if the bits actually changed, but that
seemed overkill for the usual case in interrupt threads.

[an outb is rather expensive when it has to cross the ISA bus]
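
A rough model of the optimization described above, assuming the classic
pair of 8259 PICs with a 16-bit mask split across two 8-bit mask
registers (ports 0x21 and 0xA1). The model_* names are hypothetical and
model_outb() only records the write instead of touching hardware.

```c
#include <stdint.h>

#define MODEL_ICU1_IMR 0x21   /* master 8259 interrupt mask register */
#define MODEL_ICU2_IMR 0xA1   /* slave 8259 interrupt mask register */

uint16_t model_imen = 0xffff; /* a set bit means that IRQ is masked */
unsigned model_outb_count;    /* number of simulated port writes */
uint16_t model_last_port;
uint8_t  model_last_val;

/* Stand-in for outb(): record the write so it can be inspected. */
void
model_outb(uint16_t port, uint8_t val)
{
	model_outb_count++;
	model_last_port = port;
	model_last_val = val;
}

/* Write only the mask register(s) covering the bits being changed. */
static void
model_write_changed(uint16_t bits)
{
	if (bits & 0x00ff)
		model_outb(MODEL_ICU1_IMR, (uint8_t)(model_imen & 0xff));
	if (bits & 0xff00)
		model_outb(MODEL_ICU2_IMR, (uint8_t)(model_imen >> 8));
}

/* Unmask the IRQs named in bits. */
void
model_intren(uint16_t bits)
{
	model_imen &= (uint16_t)~bits;
	model_write_changed(bits);
}

/* Mask the IRQs named in bits. */
void
model_intrdis(uint16_t bits)
{
	model_imen |= bits;
	model_write_changed(bits);
}
```

Enabling a single IRQ therefore costs one port write rather than two.
As the entry notes, the write is still issued even when the mask value
did not actually change.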


71037 14-Jan-2001 markm

Remove NOBLOCKRANDOM as a compile-time option. Instead, provide
exactly the same functionality via a sysctl, making this feature
a run-time option.

The default is 1(ON), which means that /dev/random device will
NOT block at startup.

setting kern.random.sys.seeded to 0(OFF) will cause /dev/random
to block until the next reseed, at which stage the sysctl
will be changed back to 1(ON).

While I'm here, clean up the sysctls, and make them dynamic.
Reviewed by: des
Tested on Alpha by: obrien


71026 14-Jan-2001 jhb

Argh, remove a local customization that snuck in here.

Noticed by: jasone


71025 14-Jan-2001 jhb

Remove I386_CPU from GENERIC. Support for the 386 seriously pessimizes
performance on other x86 processors. Custom kernels can still be built
that will run on the 386.


71024 14-Jan-2001 jhb

Revert the previous revision now that atomic_store_rel_ptr() actually
works.


71023 14-Jan-2001 jhb

Fix the atomic_load_acq() and atomic_store_rel() functions to properly
implement memory fences for the 486+. The 386 still uses versions w/o
memory fences as all operations on the 386 are not program ordered.
The 386 versions are not MP safe.
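
For readers unfamiliar with the acquire/release pairing that
atomic_load_acq() and atomic_store_rel() provide, the following sketch
shows the same idea using standard C11 atomics rather than the FreeBSD
macros. The mailbox structure and the model_* names are hypothetical.

```c
#include <stdatomic.h>

struct model_mailbox {
	int payload;          /* plain data, published via the flag below */
	atomic_int ready;     /* 0 = empty, 1 = payload is valid */
};

/* Writer: store the data, then set the flag with release semantics. */
void
model_publish(struct model_mailbox *mb, int value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/*
 * Reader: load the flag with acquire semantics.  If it is set, the
 * earlier payload store is guaranteed to be visible.
 * Returns 1 and fills *out on success, 0 if nothing was published.
 */
int
model_try_consume(struct model_mailbox *mb, int *out)
{
	if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
		return 0;
	*out = mb->payload;
	return 1;
}
```

Without the release/acquire ordering, a reader on another CPU could
observe the flag before the payload, which is the class of problem the
memory fences in this entry are meant to rule out.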


71005 14-Jan-2001 jhb

Work around the broken atomic_store_rel_ptr() on the i386 arch by just
using atomic_cmpset_rel_ptr() instead for _release_lock_quick(). When
atomic_store_rel_ptr() is functional and MP safe, then this can be
reverted.


70954 12-Jan-2001 jake

Change return ??? to return -1 in some #if 0'ed code.


70953 12-Jan-2001 bmilekic

Remove declaration of airq variable from outer block. There were two
declarations of a variable of the same name. The one in the outer block
was unused and probably just slipped in at one point or another. This
silences a compiler warning.


70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


70950 12-Jan-2001 bmilekic

Remove useless include of sys/mbuf.h (no longer useful since the
mbuf subsystem init was moved to a better place).


70928 11-Jan-2001 jake

- Remove compatibility macros for accessing per-cpu variables.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.


70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other than curproc.


70798 08-Jan-2001 jake

Fix a warning. The type of globaldata.gd_prvspace has changed.


70723 06-Jan-2001 jake

Implement accessors for per-cpu variables which don't depend on the
symbols in globals.s.

PCPU_GET(name) returns the value of the per-cpu variable
PCPU_PTR(name) returns a pointer to the per-cpu variable
PCPU_SET(name, val) sets the value of the per-cpu variable

In general these are not yet used, compatibility macros remain.

Unifdef SMP struct globaldata, this makes variables such as cpuid
available for UP as well.

Rebuilding modules is probably a good idea, but I believe old
modules will still work, as most of the old infrastructure
remains.
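
The shape of the three accessors can be modelled in userland with token
pasting onto the gd_ field prefix. This is only a sketch: the kernel
macros address the per-cpu area through a segment register rather than
a plain global, and the struct fields and MODEL_* names below are
assumptions.

```c
/* Stand-in for struct globaldata with two illustrative fields. */
struct model_globaldata {
	int   gd_cpuid;
	void *gd_curproc;
};

/* One instance is enough for a single-CPU model. */
struct model_globaldata model_gd0;

/* name is the field name without its gd_ prefix. */
#define MODEL_PCPU_GET(name)      (model_gd0.gd_##name)
#define MODEL_PCPU_PTR(name)      (&model_gd0.gd_##name)
#define MODEL_PCPU_SET(name, val) (model_gd0.gd_##name = (val))
```

A caller would write MODEL_PCPU_SET(cpuid, 3) and later read the value
back with MODEL_PCPU_GET(cpuid), never naming the underlying storage
directly, which is what lets the storage mechanism change underneath.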


70714 06-Jan-2001 jake

Use %fs to access per-cpu variables in uni-processor kernels the same
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.

- always load %fs with the private segment on entry to the kernel
- change the type of the self referential pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assembly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70223 20-Dec-2000 paul

Re-enable the lnc driver in GENERIC.


70034 14-Dec-2000 jhb

Remove the "machine dependent" KTR trace buffer ddb commands. The code was
exactly the same on all platforms.


70006 14-Dec-2000 jake

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


69987 13-Dec-2000 jhb

If we fail to emulate a vm86 trap in kernel mode, then we use
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().


69972 13-Dec-2000 tanimura

- If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by: dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by: alfred, dillon
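
The halving strategy in the first item can be sketched as a small
helper. The function name and parameters are hypothetical; the sizes
are in bytes and the entry size is assumed to be non-zero.

```c
#include <stddef.h>

/*
 * Halve the number of swblock entries until the metadata fits in the
 * available KVM.  Returns the entry count to use, or 0 if not even a
 * single entry fits, in which case the caller would reject swapon(2).
 */
size_t
model_fit_swblocks(size_t n_swblocks, size_t swblock_size,
    size_t kvm_avail)
{
	/* Compare by division to avoid overflow in n * size. */
	size_t max_entries = kvm_avail / swblock_size;

	while (n_swblocks > 0 && n_swblocks > max_entries)
		n_swblocks /= 2;
	return n_swblocks;
}
```

Halving converges quickly but can undershoot the real limit by almost a
factor of two, which is consistent with the entry calling this a
temporary fix.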


69971 13-Dec-2000 jake

Introduce a new potentially cleaner interface for accessing per-cpu
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.


69952 13-Dec-2000 msmith

Remove the COMPAT_OLDPCI option, it's going away.

Turn 'lnc' off in GENERIC for the moment, pending its update to newbus.


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
passed to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69919 12-Dec-2000 jhb

Add in symbols needed in the WITNESS_ENTER and WITNESS_EXIT macros in
i386/include/mutex.h.


69892 12-Dec-2000 jhb

Fix the assembly mutex macros to call the appropriate witness functions if
the witness code is compiled in. Without this, the witness code doesn't
notice that sched_lock is released by fork_trampoline() and thus gets all
confused about spin lock order later on.


69881 12-Dec-2000 jake

- Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.


69855 11-Dec-2000 phk

Remove DDB, it leaked in here with another commit.

Submitted by: bde


69783 08-Dec-2000 msmith

Next phase in the PCI subsystem cleanup.

- Move PCI core code to dev/pci.
- Split bridge code out into separate modules.
- Remove the descriptive strings from the bridge drivers. If you
want to know what a device is, use pciconf. Add support for
broadly identifying devices based on class/subclass, and for
parsing a preloaded device identification database so that if
you want to waste the memory, you can identify *anything* we know
about.
- Remove machine-dependent code from the core PCI code. APIC interrupt
mapping is performed by shadowing the intline register in
machine-dependent code.
- Bring interrupt routing support to the Alpha
(although many platforms don't yet support routing or mapping
interrupts entirely correctly). This resulted in spamming
<sys/bus.h> into more places than it really should have gone.
- Put sys/dev on the kernel/modules include path. This avoids
having to change *all* the pci*.h includes.


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69779 08-Dec-2000 jake

Revert the previous change I made to cpu_switch. It doesn't help as
much as I thought it would and according to bde was a pessimization.


69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


69773 08-Dec-2000 jake

Fix a jump to the wrong label, <sigh>. Put a period at the end of a
sentence in a comment.

Submitted by: bde


69770 08-Dec-2000 jhb

Argh, revert the clobber changes. Since %ecx and %edx aren't call safe,
calling the C functions mtx_enter_hard() and mtx_exit_hard() clobbers them.
Note that %eax is also not call safe, but it is already clobbered due to
cmpxchg. However, now we are back to not compiling again, so these macros
are still left disabled for now.


69743 08-Dec-2000 jake

Change the calling conventions of the MTX_ENTER macro to match
that of MTX_EXIT. Don't assume that the reg parameter to MTX_ENTER
holds curproc, load it explicitly. Put semi-colons at the end of
the macros to be more consistent and so it's harder to forget them
when these change.


69740 08-Dec-2000 jhb

Well, the previous commit wasn't entirely correct either. For now, just
disable the optimized mutex micro-operations for the non-I386_CPU case
and fall back to the C stubs that call the atomic_foo() inlines.


69728 07-Dec-2000 phk

Move extern tsc_present outside function to quell a warning.


69704 07-Dec-2000 iwasaki

Create a pmtimer device instance for GENERIC and NEWCARD kernels by default.

Submitted by: Masayuki FUKUI <fukui@sonic.nm.fujitsu.co.jp>


69690 07-Dec-2000 jhb

Fix broken register restraints that needlessly clobbered registers %ecx
and %edx resulting in gcc not having enough registers left to work with.


69658 06-Dec-2000 peter

This is kind of a nasty hack, but it appears to solve the Compaq DL360
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and adds a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.


69651 06-Dec-2000 peter

Move io_apic_{read,write} from apic_ipl.s (where they do not belong) into
mpapic.c. This gives us the benefit of C type checking. These functions
are not called in any critical paths and are not used by the interrupt
routines.


69646 06-Dec-2000 peter

GC unused assembler function apic_eoi()


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69578 04-Dec-2000 peter

Cleanup some leftover lint from the old interrupt system.
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.

Reviewed by: jhb, jakeb


69570 04-Dec-2000 jake

(1) Allow a stray lock prefix to be compiled out with the
MPLOCKED macro
(2) Use decimal 12 rather than hex 0xc in an addl
(3) Implement MTX_ENTER for the I386_CPU case
(4) Use semi-colons between instructions to allow MTX_ENTER
and MTX_ENTER_WITH_RECURSION to be assembled
(5) Use incl instead of incw to increment the recursion count
(6) 10 is not a valid label, use 7, 8 and 9 rather than 8, 9 and 10
(7) Sort numeric labels

Submitted by: bde (2, 4, and 5)


69536 03-Dec-2000 jake

Change cpu_switch to explicitly popl the callers program counter and
pushl that of the new process, rather than doing a movl (%esp) and
assuming that the stack has been set up right. This makes the initial
stack setup slightly more sane, and will make it easier to stick
an interrupted process onto the run queue without its knowing.


69521 02-Dec-2000 markm

Namespace cleanup. Remove some #includes in favour of an explicit
declaration.

Asked for by: bde


69440 01-Dec-2000 jhb

Fix this slightly better by using NON_GPROF_RET instead of duplicating
hard-coded asm.

Suggested by: bde


69431 01-Dec-2000 jake

Change doreti to take a trapframe instead of an intrframe.
Remove associated pushes of dummy units to convert frame.

Reviewed by: jhb


69406 30-Nov-2000 jhb

Revert the previous change to this file. We have to hardcode in the opcode
for return because we do Evil Things(tm) with a 'ret' macro in
asmacros.h.

Noticed by: markm


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and caused a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


69377 30-Nov-2000 peter

Increase NKPT from 17 to 30. This fixes the 4GB ram boot panic on both
-current and RELENG_4 with GENERIC.

NKPT is the number of initial bootstrap page table pages we create for
the kernel during startup. Once VM is up, we resize it as needed, but
with 4G ram, the size of the vm_page_t structures was pushing it over
the limit. The fact that trimmed down kernels boot on 4G ram machines
suggests that we were pretty close to the edge.

The "30" is arbitrary, but smaller than the 'nkpt' variable on all
machines that I checked.
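
As a rough illustration of what the bump buys, the sketch below computes how
much kernel virtual address space a given number of bootstrap page table
pages can map. It assumes the classic non-PAE i386 layout (4 KB pages, 1024
four-byte entries per page table page, so 4 MB mapped per page table page).
The constants and the function name are illustrative and are not taken from
the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed non-PAE i386 geometry; names are illustrative only. */
#define SKETCH_PAGE_SIZE   4096u   /* bytes per page */
#define SKETCH_PTES_PER_PT 1024u   /* 4-byte entries in one page table page */

/* Bytes of virtual address space mapped by nkpt page table pages. */
uint64_t
nkpt_coverage_bytes(unsigned nkpt)
{
    /* A page directory has 1024 slots, so more than that makes no sense. */
    assert(nkpt <= 1024u);
    return (uint64_t)nkpt * SKETCH_PTES_PER_PT * SKETCH_PAGE_SIZE;
}
```

Under these assumptions 17 pages cover 68 MB and 30 pages cover 120 MB.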


69334 28-Nov-2000 jhb

Don't wait forever for CPUs to stop or restart. Instead, give up after a
timeout. If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart. This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.
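
The bounded wait described here might be sketched as follows. This is a
user-space illustration using C11 atomics, not the kernel's code: the
function name, the bitmask representation of stopped CPUs, and the spin
budget are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Spin until every CPU in `wanted` has set its bit in `*stopped`, or
 * until the spin budget runs out. Returns a bitmask of the CPUs that
 * failed to respond (0 on success). All names are hypothetical.
 */
unsigned
wait_for_cpus(atomic_uint *stopped, unsigned wanted, unsigned long max_spins)
{
    assert(stopped != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        if ((atomic_load(stopped) & wanted) == wanted)
            return 0;
    }
    /* Timed out: report which CPUs never acknowledged. */
    return wanted & ~atomic_load(stopped);
}
```

A caller could print the returned mask as the "map of which CPUs failed"
and then carry on instead of hanging.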


69333 28-Nov-2000 jhb

Use atomic ops to close a race condition on the in_Debugger variable used
to only allow 1 CPU at a time to (non-recursively) enter the debugger.
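
A minimal sketch of the idea, assuming a flag that is 0 when the debugger is
free and 1 when a CPU holds it. It uses C11 <stdatomic.h> in place of the
kernel's own atomic primitives, and the names below are hypothetical. A
plain "test, then set" sequence would let two CPUs both see 0 and both
enter; a single compare-and-swap makes the check and the claim one step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the in_Debugger flag: 0 = free, 1 = held. */
static atomic_int in_debugger_sketch;

/* Try to claim the debugger; only one caller can win the 0 -> 1 swap. */
bool
debugger_try_enter(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&in_debugger_sketch, &expected, 1);
}

/* Release the claim so another CPU may enter later. */
void
debugger_leave(void)
{
    int was = atomic_exchange(&in_debugger_sketch, 0);
    assert(was == 1);   /* leaving without having entered is a bug */
    (void)was;
}
```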


69147 25-Nov-2000 jlemon

Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not. Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake


69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


69006 21-Nov-2000 markm

Assembler fixes.

Fix opcodes that were typed as ".byte 0xNN, 0xMM" when an older
assembler could not recognise the newer Pentium instructions.
Reviewed by: jhb


69003 21-Nov-2000 markm

Add a consistent API to a feature that most modern CPUs have; a fast
counter register in-CPU.

This is to be used as a fast "timer", where linearity is more important
than time, and multiple lines in the linearity caused by multiple CPUs
in an SMP machine is not a problem.

This adds no code whatsoever to the FreeBSD kernel until it is actually
used, and then as a single-instruction inline routine (except for the
80386 and 80486 where it is some more inline code around nanotime(9)).

Reviewed by: bde, kris, jhb


69001 21-Nov-2000 jhb

Stop handcoding a couple of instructions since gas 2.10 can properly
assemble 16-bit code.

Noticed by: markm


68889 19-Nov-2000 jake

- Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
registering a callout.

Reviewed by: -smp@, jlemon


68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


68860 17-Nov-2000 jhb

- Change extra sanity checks in cpu_switch() to be conditional on INVARIANTS
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.


68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


68787 15-Nov-2000 jhb

Assert that Giant is not owned during the main loop of ithd_loop().


68757 15-Nov-2000 imp

Add pmtimer device, necessary for proper time keeping when apm or
other power management devices are enabled.


68737 14-Nov-2000 jhb

Always enable interrupts during fork_trampoline() after releasing the
sched_lock. This is needed for kernel threads that are created before
interrupts are enabled, such as kthreads created by kld's at
SI_SUB_KLD (for example, the random kthread).

Tested by: phk


68697 14-Nov-2000 jkh

Proper capitalization of PCMCIA (and avoid matching pcm)


68696 14-Nov-2000 jkh

In the year 2000, I think it's perfectly reasonable to include audio
support by default in GENERIC.


68684 13-Nov-2000 jhb

Fix a bug with handling of the saved interrupt state for spin mutexes in
the MTX_EXIT_WITH_RECURSION() assembly macro (currently unused).

Submitted by: bde


68676 13-Nov-2000 nyan

Initialize bus_space_handle_t with zero (for PC-98).


68520 09-Nov-2000 marcel

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch     MINSIGSTKSZ
----     -----------
alpha    4096
i386     2048
ia64     12288

Reviewed by: mjacob
Suggested by: bde


68490 08-Nov-2000 asmodai

Fix some further English grammar and typos.


68489 08-Nov-2000 asmodai

Fix typos: UPGRADE_CPU_HW_CACHE -> CPU_UPGRADE_HW_CACHE


68485 08-Nov-2000 msmith

Hack to work around a probe which will lock up at least some i450GX-based
systems.

From the PR:

When 'probe.slot' is PCI_SLOTMAX (== 31) and 'probe.func' is 7,
call to 'pci_cfgread()' here and machine suddenly hangs up.
I don't know why... (or 450GX chipset's bug?)

PR: i386/20379
Submitted by: Masayuki FUKUI <fukui@sonic.nm.fujitsu.co.jp>


68450 07-Nov-2000 imp

Make the ISA nic section look like the other device sections with
comments on the same line like so:
device foo # FooInc Brand NetEther cards

Also, move the wireless NIC cards to their own section.

Add commented out wl driver in wireless section.

Remove obsolete or redundant comments about some of the wireless cards
that used to apply but don't since we've removed 'at foobus'.

There should be no functional changes in this change.


68448 07-Nov-2000 imp

Minor ordering changes to make more sections strictly alphabetical.


68445 07-Nov-2000 semenu

Synced tx(4) driver descriptions + ``device tx'' line moved to the
list of drivers using miibus.

PR: kern/22556


68441 07-Nov-2000 alfred

Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.

Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge


68418 07-Nov-2000 wpaul

The vx driver no longer needs the PCI compat shims. Also should now
work on the alpha (at least the PCI part should).


68218 02-Nov-2000 msmith

Improve the PCI interrupt routing code. Now the process is as follows:

- Look for a hardwired interrupt in the routing table for this
bus/device/pin (we already did this).
- Look for another device with the same link byte which has a hardwired
interrupt.
- Look for a PCI device matching an entry with the same link byte
which has already been assigned an interrupt, and use that.
- Look for a routable interrupt listed in the "PCI only" interrupts
field and use that.
- Pick the first interrupt that's marked as routable and use that.


68063 31-Oct-2000 phk

Deprecate devsw->d_bmaj entirely.

This removes support for booting current kernels with very old bootblocks.

Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.


67899 29-Oct-2000 phk

Remove unneeded <stddef.h> #includes.


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


67814 28-Oct-2000 nik

Add a brief comment telling people to retain 'device miibus' as necessary.

PR: docs/21981
Submitted by: Matthew Emmerton <matt@gsicomp.on.ca>


67760 28-Oct-2000 msmith

FreeBSD-specific OSD (operating system dependent) modules for the Intel
ACPICA code.


67759 28-Oct-2000 phk

Revert two experimental changes which escaped from my devel machine.


67740 28-Oct-2000 jhb

The x86 atomic operations are already locked, so they do not need an
additional locked instruction to guarantee a write barrier for the acquire
variants.

Approved by: dfr
Pointy hat to: jhb


67732 27-Oct-2000 jhb

Fix a couple of whitespace nits.


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Partial reviews by: various.
Significant brucifications by: bde
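
To illustrate the limitation named at the top of this entry: a macro that
only accepts a struct tag cannot name a tagless typedef or a union, whereas
the standard offsetof() takes any struct or union type. The types and
function names below are invented for the example and do not come from the
tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A typedef with no struct tag: a tag-only macro cannot name this type. */
typedef struct {
    uint32_t id;
    uint32_t len;
} record_t;

/* A union also works, since offsetof() accepts struct and union types. */
typedef union {
    uint32_t word;
    uint8_t  bytes[4];
} word_u;

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(record_t, id) == 0, "first member is at offset 0");

/* Offset of the second 32-bit field in the tagless struct. */
size_t
record_len_offset(void)
{
    return offsetof(record_t, len);
}

/* Every member of a union starts at offset 0. */
size_t
word_bytes_offset(void)
{
    return offsetof(word_u, bytes);
}
```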


67694 27-Oct-2000 bde

Declare or #define per-cpu globals in <machine/globals.h> in all cases.
The i386 UP case was messily different.


67689 27-Oct-2000 markm

As the blocking model seems to be troublesome for many, disable
it for now with an option.

This option is already deprecated, and will be removed when the
entropy-harvesting code is fast enough to warrant it.


67587 25-Oct-2000 jhb

- Add atomic_cmpset_{acq_,rel_,}long
- Add in atomic operations for 8-bit, 16-bit, and 32-bit integers


67563 25-Oct-2000 ps

Fast interrupts have no associated process, therefore do not try
and schedule it. This fixes booting machines with broken MP tables.


67562 25-Oct-2000 n_hibma

The USB scanner driver. To be used together with SANE.


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67477 23-Oct-2000 jhb

Don't dink with interrupts in vm_page_zero_idle(). This code assumed it
was being called with interrupts disabled, when it was actually being called
with them enabled.

Pointed out by: tegge


67468 23-Oct-2000 non

Add PC-Card/ISA SCSI host adapter drivers from NetBSD/pc98
(a NetBSD port for NEC PC-98x1 machines). They are ncv for NCR 53C500,
nsp for Workbit Ninja SCSI-3, and stg for TMC 18C30 and 18C50.

I thank NetBSD/pc98 and bsd-nomads people.

Obtained from: NetBSD/pc98


67403 20-Oct-2000 jhb

Define the mtx_legal2block() macro used in the witness code that managed
to get lost during the MI mutex conversion.

Reported by: Steve Kargl <sgk@troutmask.apl.washington.edu>


67378 20-Oct-2000 ache

Return -10000 in pci_hostb_probe to allow agp driver (disabled otherwise)


67377 20-Oct-2000 ache

Add i815 Host to Hub


67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


67361 20-Oct-2000 jhb

Actually harvest interrupt threads when the last handler is removed from a
thread.


67360 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use cpu_throw() instead of cpu_switch() during cpu_exit() since we don't
need to save our previous state.


67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


67357 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.


67356 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- machine/ipl.h -> sys/ipl.h
- Use MUTEX_DECLARE() for clock_lock


67352 20-Oct-2000 jhb

- Make the mutex code almost completely machine independent. This greatly
reduces the maintenance load for the mutex code. The only MD portions
of the mutex code are in machine/mutex.h now, which include the assembly
macros for handling mutexes as well as optionally overriding the mutex
micro-operations. For example, we use optimized micro-ops on the x86
platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option. In the new code,
mtx_assert() only depends on INVARIANTS, allowing other kernel developers
to have working mutex assertions without having to include all of the
mutex debugging code. The SMP_DEBUG kernel option has been renamed to
MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack. Instead, we dynamically allocate
separate mtx_debug structures on the fly in mtx_init, except for mutexes
that are initiated very early in the boot process. These mutexes
are declared using a special MUTEX_DECLARE() macro, and use a new
flag MTX_COLD when calling mtx_init. This is still somewhat hackish,
but it is less evil than the mtx_f filler struct, and the mtx struct is
now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
operations on the mutex mtx_lock field to make it easier for other archs
to override/optimize mutex ops if needed. These new tiny ops also clean
up the code in some places by replacing long atomic operation function
calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
to obtain a sleep mutex. Calling mi_switch() would bogusly release
Giant before switching to the next process. Instead, inline most of the
code from mi_switch() in the mtx_enter_hard() function. Note that when
we finally kill Giant we can back this out and go back to calling
mi_switch().


67351 20-Oct-2000 jhb

- Expand the set of atomic operations to optionally include memory barriers
in most of the atomic operations. Now for these operations, you can
use the normal atomic operation, you can use the operation with a read
barrier, or you can use the operation with a write barrier. The function
names follow the same semantics used in the ia64 instruction set. An
atomic operation with a read barrier has the extra suffix 'acq', due to
it having "acquire" semantics. An atomic operation with a write barrier
has the extra suffix 'rel'. These suffixes are inserted between the
name of the operation to perform and the typename. For example, the
atomic_add_int() function now has 3 variants:
- atomic_add_int() - this is the same as the previous function
- atomic_add_acq_int() - this function combines the add operation with a
read memory barrier
- atomic_add_rel_int() - this function combines the add operation with a
write memory barrier
- Add 'ptr' to the list of types that we can perform atomic operations
on. This allows one to do atomic operations on uintptr_t's. This is
useful in the mutex code, for example, because the actual mutex lock is
a pointer.
- Add two new operations for doing loads and stores with memory barriers.
The new load operations use a read barrier before the load, and the
new store operations use a write barrier after the load. For example,
atomic_load_acq_int() will atomically load an integer as well as
enforcing a read barrier.
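
The acquire/release pairing can be sketched in portable C. The example
below uses C11 <stdatomic.h> as a stand-in for the kernel's atomic
operations; it is not the FreeBSD API, and the variable and function names
are hypothetical. A release store publishes earlier writes, and an acquire
load that observes the store is guaranteed to see them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state: a payload guarded by a ready flag. */
static int payload;
static atomic_int ready;

/* Producer: write the payload, then publish it with release semantics. */
void
publish(int value)
{
    assert(value >= 0);     /* -1 is reserved as "not ready" below */
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Consumer: acquire-load the flag. If it reads 1, the earlier write to
 * payload is guaranteed to be visible. Returns -1 if nothing is published.
 */
int
consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return -1;
    return payload;
}
```

In a mutex, the lock operation would use the acquire form and the unlock
operation the release form, so that accesses inside the critical section
cannot drift outside it.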


67350 20-Oct-2000 jhb

Axe the barrier_{read,write,rw}() helper functions as this method of
doing memory barriers doesn't really scale well for the ia64. Also,
memory barriers are more a property of the CPU than bus space.

Requested by: dfr


67346 20-Oct-2000 kato

Convert the type of bus_space_handle_t of pc98 from structure into
pointer to structure.

Reviewed by: nyan


67311 19-Oct-2000 msmith

Call the BIOS to route the selected interrupt. Correctly calculate the
interrupt from the PCI routing table (ffs returns 1 for the rightmost
bit, not 0).
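
The off-by-one is easy to reproduce. The sketch below assumes a bitmask in
which bit N set means IRQ N is a candidate; the function name is
hypothetical. POSIX ffs() numbers bits starting at 1 and returns 0 for an
empty mask, so the IRQ number is the result minus one.

```c
#include <assert.h>
#include <strings.h>    /* ffs() (POSIX) */

/*
 * Pick the lowest-numbered IRQ from a bitmask of candidate IRQs.
 * ffs() numbers bits from 1, so subtract 1 to get the IRQ number.
 * Returns -1 when the mask is empty.
 */
int
lowest_irq(unsigned mask)
{
    assert(mask <= 0xffffu);    /* 16 legacy IRQs in this sketch */
    return ffs((int)mask) - 1;
}
```

With a mask of 0x0800 this yields IRQ 11; using the raw ffs() value would
have selected 12.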


67310 19-Oct-2000 msmith

Add PCI BIOS function codes for IRQ routing fetch and route.


67308 19-Oct-2000 jhb

Axe the idle_event eventhandler, and add a MD cpu_idle function used
for things such as halting CPU's, idling CPU's, etc.

Discussed with: msmith


67285 18-Oct-2000 jhb

Add in a simple API for memory barriers to machine/bus.h:
- barrier_read() enforces a memory read barrier
- barrier_write() enforces a memory write barrier
- barrier_rw() enforces a memory read/write barrier


67268 18-Oct-2000 mdodd

Use appropriate resource management accessors instead of directly
referencing structure members.

Use rman_get_size() instead of end - start + 1.


67265 17-Oct-2000 jhb

- Catch up to moving headers, machine/ipl.h -> sys/ipl.h
- Fix some whitespace bogons.

Submitted by: bde (2)


67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before:                 After:
WCE  TIME               WCE  TIME
------------------      ------------------
1    141.820972         1    15.600111
0    797.265072         0    65.480465

Obtained from: Yahoo!
Reviewed by: peter


67223 16-Oct-2000 imp

Add types and prototypes.

Submitted by: msmith


67186 16-Oct-2000 imp

Remove debug writes introduced in prior commit


67185 16-Oct-2000 imp

Add the ability to use the $PIR table in the BIOS to route interrupts
on demand.

Submitted by: msmith


67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


67158 15-Oct-2000 phk

Move DELAY() from <machine/clock.h> to <sys/systm.h>


67126 14-Oct-2000 alc

Change the text for the ServerWorks north bridge chips. RCC is now
officially listed as ServerWorks by www.pcisig.com.


67095 13-Oct-2000 peter

savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().


67030 12-Oct-2000 bde

Removed unused include of <machine/lock.h>. The locking interface stopped
being (ab)used here in rev.1.97.


67011 12-Oct-2000 bde

Moved the definitions of AST_PENDING and AST_RESCHED to the correct place.


66994 12-Oct-2000 msmith

Bring the 'twe' driver back now that we think it should work.


66985 11-Oct-2000 msmith

When testing for PCI bus overlap with another enumerator, make sure we
check for the right bus number. This is still not quite right, but
fixes things for multi-bus machines again.

Submitted by: tegge


66855 09-Oct-2000 bde

Unremoved used include of <machine/ipl.h>. Removing it in rev.1.95
significantly pessimized syscalls by arranging to do null rescheduling
on return from every syscall. (AST_RESCHED was not defined, and the
mask ~AST_RESCHED gets replaced by the useless mask ~0. This bug has
been fixed before, in rev.1.92.)


66843 09-Oct-2000 msmith

Only attach "legacy" PCI busses if none have been attached via any other
method.


66739 06-Oct-2000 bde

Work around a bug by adding struct tags. gcc-2.95 apparently gets the
check in the [basic.link] section of the C++ standard wrong. gcc-2.7.2.3
apparently doesn't do the check, so the bug doesn't affect RELENG_3.

PR: 16170, 21427
Submitted by: Max Khon <fjoe@lark.websci.ru> (i386 version)
Discussed with: jdp


66716 06-Oct-2000 jhb

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. Its only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())
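
The deferred-signal scheme in this entry can be sketched roughly as below.
This is a simplified, single-threaded model: the flag values, the struct,
and the function names are invented for illustration, counters stand in for
the real psignal() calls, and the locking the kernel would need is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the real P_ALRMPEND/P_PROFPEND bits may differ. */
#define SK_ALRMPEND 0x1u
#define SK_PROFPEND 0x2u

struct proc_sketch {
    unsigned flags;         /* pending-signal bits */
    int      vtalrm_sent;   /* stand-in for psignal(p, SIGVTALRM) */
    int      prof_sent;     /* stand-in for psignal(p, SIGPROF) */
};

/* Clock interrupt context: must not block, so only record the event. */
void
hardclock_sketch(struct proc_sketch *p, int vtimer_expired, int prof_expired)
{
    assert(p != NULL);
    if (vtimer_expired)
        p->flags |= SK_ALRMPEND;
    if (prof_expired)
        p->flags |= SK_PROFPEND;
}

/* Later, on the way back to user mode: deliver whatever was recorded. */
void
ast_sketch(struct proc_sketch *p)
{
    assert(p != NULL);
    if (p->flags & SK_ALRMPEND) {
        p->flags &= ~SK_ALRMPEND;
        p->vtalrm_sent++;
    }
    if (p->flags & SK_PROFPEND) {
        p->flags &= ~SK_PROFPEND;
        p->prof_sent++;
    }
}
```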


66715 06-Oct-2000 jhb

currentldt is now a "special" global-data variable, and as such, there
is no actual currentldt integer variable directly. Thus, don't claim that
there is.


66714 06-Oct-2000 jhb

Interrupt frames don't include the saved cpl anymore since cpl is dead.


66713 06-Oct-2000 jhb

Various whitespace cleanups after the SMPng commit, which jumbled things
around a bit in the trap handling code.


66712 06-Oct-2000 jhb

Don't treat a kernel stack fault the same as a general protect fault or
a segment not present fault in the non-vm86 case.


66711 06-Oct-2000 jhb

Remove an unnecessary sti and spl0() in fork_trampoline. Interrupts
should be enabled by MTX_EXIT() now when it releases the sched_lock.


66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


66696 05-Oct-2000 jhb

Replace loadandclear() with atomic_readandclear_int().


66695 05-Oct-2000 jhb

Add atomic_readandclear_int and atomic_readandclear_long.


66692 05-Oct-2000 jhb

Make the gd_currentldt member in struct globaldata unconditional so
that this header doesn't depend on USER_LDT. This fixes the USER_LDT
breakage with SMP kernels.


66614 04-Oct-2000 jasone

Reduce userland namespace pollution.


66559 02-Oct-2000 peter

Fix a cosmetic sign problem on machines with 4G of ram.
0x00312000 - 0xe5fe7fff, 3855441920 bytes (4294859990 pages)
.. becomes
0x00314000 - 0xe5fe7fff, 3855433728 bytes (941268 pages)
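
The bogus page count is consistent with the byte count passing through a
signed 32-bit value before being divided by the page size; the log does not
say exactly where that happened, so the sketch below is an assumption that
merely reproduces the two figures. Function names are hypothetical and a
4096-byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

static_assert(sizeof(int32_t) == sizeof(uint32_t), "same width required");

/* Buggy variant: the byte count passes through a signed 32-bit value. */
uint32_t
pages_via_signed(uint32_t bytes)
{
    int32_t sbytes;

    memcpy(&sbytes, &bytes, sizeof(sbytes));    /* reinterpret the bits */
    /* Sizes above 2 GB go negative here, and so does the quotient. */
    return (uint32_t)(sbytes / SKETCH_PAGE_SIZE);
}

/* Fixed variant: keep the arithmetic unsigned throughout. */
uint32_t
pages_unsigned(uint32_t bytes)
{
    return bytes / SKETCH_PAGE_SIZE;
}
```

Feeding 3855441920 to the signed variant gives 4294859990, the value in the
first line above, while the unsigned variant turns 3855433728 into 941268.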


66529 02-Oct-2000 msmith

Move the i386 PCI attachment code out of i386/isa back into i386/pci.

Split out the configuration space access primitives, as these are needed
elsewhere as well.


66503 01-Oct-2000 peter

Fix the no-pci case of attaching isa, eisa and mca devices.
device_add_child() is meant to be called by the bus add_child method, not
to replace the bus add_child method. We could have called nexus_add_device
directly too, that would have also worked.

PR: 21657
Tested by: markm


66489 30-Sep-2000 msmith

More updates to the ACPI code:

- Move all register I/O into acpi_io.c
- Move event handling into acpi_event.c
- Reorganise headers into acpivar/acpireg/acpiio
- Move find-RSDT and find-ACPI-owned-memory into acpi_machdep
- Allocate all resources (except those detailed only by AML)
as real resources. Add infrastructure that will make adding
resource support to AML code easy.
- Remove all ACPI #ifdefs in non-ACPI code
- Removed unnecessary includes
- Minor style and commenting fixes

Reviewed by: iwasaki


66475 30-Sep-2000 bmilekic

Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accommodate the changes.

Here's a list of things that have changed (I may have left out a few); for a
relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

* Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
nobody uses anymore. It was great while it lasted, but now we're moving
onto bigger and better things (Approved by: wollman).

* Practically re-wrote the allocation macros in sys/sys/mbuf.h to accommodate
new allocations which grab the necessary lock.

* Make sure that necessary mbstat variables are manipulated with
corresponding atomic() routines.

* Changed the "wait" routines, cleaned it up, made one routine that does
the job.

* Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
are now included in the generalized "wait" routines.

* Sleep routines now use msleep().

* Free lists have locks.

* etc... probably other stuff I'm missing...

Things to look out for and work on later:

* find a better way to (dynamically) adjust EXT_COUNTERS

* move necessity to recurse on a lock from drain routines by providing
lock-free lower-level version of MFREE() (and possibly m_free()?).

* checkout include of mutex.h in sys/sys/mbuf.h - probably violating
general philosophy here.

The code has been reviewed quite a bit, but problems may arise... please,
don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?


66464 29-Sep-2000 dfr

Ansify and fix warnings.


66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but it's nice to see
the FreeBSD copyright message appear in the ia64 simulator.


66442 29-Sep-2000 peter

Fill in some more missing bits from cpu_features according to the Intel
Pentium4 cpuid docs.


66441 29-Sep-2000 peter

First shot at identifying the Pentium 4 according to our reading of the
cpu_id extensions in the Intel docs. There is more info available.
See the following URL for more details.
http://developer.intel.com/design/processor/future/manuals/CPUID_Supplement.htm

Requested by: Intel


66416 28-Sep-2000 peter

Get out the roto-rooter and clean up the abuse of nexus ivars by the
i386/isa/pcibus.c. This gets -current running again on multiple host->pci
machines after the most recent nexus commits. I had discussed this with
Mike Smith, but ended up doing it slightly differently to what we
discussed as it turned out cleaner this way. Mike was suggesting creating
a new resource (SYS_RES_PCIBUS) or something and using *_[gs]et_resource(),
but IMHO that wasn't ideal as SYS_RES_* is meant to be a global platform
property, not a quirk of a given implementation. This does use the ivar
methods but does so properly. It also now prints the physical pci bus that
a host->pci bridge (pcib) corresponds to.


66407 27-Sep-2000 asmodai

Fix spelling of Katmai [Katami].


66400 27-Sep-2000 msmith

Since the nexus is responsible for creating the I/O resources (ports, memory)
it ought to be able to deal with devices directly attached to it having
allocations of such resources. Make it so.


66383 26-Sep-2000 kato

Recognize new Pentium III Xeon (stepping A0).

PR: 21233
Submitted by: ade


66328 24-Sep-2000 jhb

Fix the assembly mutex macros to handle saving/restoring interrupt state
properly. Fix the recursive mutex macros to actually compile. At the
moment we only use MTX_EXIT anyways.


66296 23-Sep-2000 ps

Move MAXCPU from machine/smp.h to machine/param.h to fix breakage
with !SMP kernels. Also, replace NCPUS with MAXCPU since they are
redundant.


66280 23-Sep-2000 jasone

#include <sys/proc.h> in order to get curproc. This seems to be the lesser
of two evils; the greater evil is requiring sys/proc.h to be included
before including machine/mutex.h.


66277 22-Sep-2000 ps

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


66211 22-Sep-2000 jhb

Teach MTX_EXIT_RECURSE that the recursion count is a 32-bit integer,
not a 16-bit one.


66206 22-Sep-2000 msmith

Implement halt-on-idle in the !SMP case, which should significantly
reduce power consumption on most systems.


66174 21-Sep-2000 bsd

Add a couple of debug register helper functions to assist in setting
and clearing watchpoints.

Reviewed by: jwd@FreeBSD.org, -hackers@


66131 20-Sep-2000 wpaul

Add a new driver for the AMD PCnet/FAST, FAST+ and Home PCI adapters.
Previously, these cards were supported by the lnc driver (and they
still are, but the pcn driver will claim them first), which is fine
except the lnc driver runs them in 16-bit LANCE compatibility mode.
The pcn driver runs these chips in 32-bit mode and uses the RX alignment
feature to achieve zero-copy receive. (Which puts it in the same
class as the xl, fxp and tl chipsets.) This driver is also MI, so it
will work on the x86 and alpha platforms. (The lnc driver is still
needed to support non-PCI cards. At some point, I'll need to newbusify
it so that it too will be MI.)

The Am79c978 HomePNA adapter is also supported.


66069 19-Sep-2000 eivind

Better error message when booting an SMP kernel on an UP system.


65932 16-Sep-2000 phk

Make LINT compile.


65904 15-Sep-2000 jhb

- Add a new process flag P_NOLOAD that marks a process that should be
ignored during load average calculations.
- Set this flag for the idle processes and the softinterrupt process.
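
The effect of such a flag can be sketched as follows. This is only an
illustration, not the committed code: the flag value, the structure, and the
function name are all hypothetical stand-ins for what lives in sys/proc.h and
the load average code.

```c
#include <stddef.h>

/* Hypothetical flag value; the real P_NOLOAD is defined in sys/proc.h. */
#define P_NOLOAD_SKETCH 0x01

/* Hypothetical, heavily reduced process record. */
struct proc_sketch {
	int flags;	/* process flags */
	int runnable;	/* nonzero if the process is on a run queue */
};

/*
 * Count the runnable processes that contribute to the load average,
 * skipping any process marked with the "no load" flag.
 */
static int
count_load(const struct proc_sketch *procs, size_t n)
{
	int nrun = 0;

	for (size_t i = 0; i < n; i++) {
		if (procs[i].flags & P_NOLOAD_SKETCH)
			continue;	/* idle/softinterrupt style process */
		if (procs[i].runnable)
			nrun++;
	}
	return (nrun);
}
```

Without the flag, an always-runnable idle process would be counted on every
sample and inflate the reported load.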


65873 15-Sep-2000 nyan

Moved the fe driver from the compat section to the correct section.

Submitted by: sanpei


65871 15-Sep-2000 jhb

Check to see if we actually have an interrupt descriptor and an interrupt
thread for each interrupt that comes in. If we don't, log the event and
return immediately for a hardware interrupt. For a softinterrupt, panic
instead.

Submitted by: ben


65856 14-Sep-2000 jhb

Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by: bde


65822 13-Sep-2000 jhb

- Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
priority out of the x86 nexus code and into a MI ithread_priority()
function in sys/kern/kern_intr.c.
- Remove now-unneeded x86-specific headers from sys/dev/ata/ata-all.c and
sys/pci/pci_compat.c.


65815 13-Sep-2000 bde

Be more careful about cleaning up the stack after function calls early
in the boot. The cleanup must be done in one of the few ways that
db_numargs() understands, so that early backtraces in ddb don't underrun
the stack. The underruns caused reboots a few years ago when there
was an unmapped page above the stack (trapping to abort the command
doesn't work early).

Cleaned up some nearby code.


65811 13-Sep-2000 bde

Fixed hang on booting with -d. mtx_enter() was called on an uninitialized
lock. The quick fix in trap.c was not quite the version tested and had no
effect; back it out.


65793 13-Sep-2000 msmith

A new driver for PCI:SCSI RAID controllers based on the Adaptec FSA
design. This includes integrated Dell RAID controllers, the Dell
PERC 2/QC and the HP NetRAID-4M.


65782 12-Sep-2000 jhb

Clean up process accounting some more. Unfortunately, it is still not
quite right on i386 as the CPU that runs statclock() doesn't have a valid
clockframe to calculate statistics with.


65781 12-Sep-2000 bde

Quick fix for hang on booting with -d. mtx_enter() was called before
curproc was initialized. curproc == NULL was interpreted as matching
the process holding Giant... Just skip mtx_enter() and mtx_exit() in
trap() if (curproc == NULL && cold) (&& cold for safety).


65778 12-Sep-2000 bde

Don't panic for delivery of a multiplexed SWI. Most SWI handlers
don't take an arg, but swi_generic() is special in order to avoid one
whole conditional branch in the old SWI dispatch code. The new SWI
dispatch code passed it a garbage arg. Bypass swi_generic() and call
swi_dispatcher() directly, like the corresponding alpha code has always
done.

The panic was rare because it only occurred if more than one
of the {sio,cy,rc} drivers was configured and one was active, and the
cy driver doesn't even compile.


65776 12-Sep-2000 markm

Turn the /dev/random device into a (pseudo-)device, not an option.

(I didn't realise that it was this easy!)
Submitted by: jhb


65761 11-Sep-2000 billf

Move tx to the list of drivers that now require miibus.


65713 11-Sep-2000 jhb

When doing statistics for statclock on other CPUs, use the other CPUs'
idleproc pointers instead of our own for comparisons.

Submitted by: tegge


65651 09-Sep-2000 jasone

Style cleanups. No functional changes.


65650 09-Sep-2000 jasone

Add file and line arguments to WITNESS_ENTER() and WITNESS_EXIT, since
__FILE__ and __LINE__ don't get expanded usefully in inline functions.

Add const to all witness*() arguments that are filenames.


65624 08-Sep-2000 jasone

Rename mtx_enter(), mtx_try_enter(), and mtx_exit() and wrap them with cpp
macros that expand to pass filename and line number information. This is
necessary since we're using inline functions instead of macros now.

Add const to the filename pointers passed throughout the mtx and witness
code.
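
The wrapping technique described above can be sketched as below. The names
lock_sketch, _lock_enter, and lock_enter are hypothetical and are not the
actual mtx interface; the point is only that a macro expands at the call site,
so __FILE__ and __LINE__ identify the caller rather than the header that
defines the inline function.

```c
#include <stddef.h>

/* Hypothetical lock record that remembers where it was last acquired. */
struct lock_sketch {
	int held;
	const char *file;
	int line;
};

/* The underlying function takes the caller's location explicitly. */
static inline void
_lock_enter(struct lock_sketch *l, const char *file, int line)
{
	l->held = 1;
	l->file = file;
	l->line = line;
}

/*
 * The macro is expanded where it is used, so __FILE__ and __LINE__
 * refer to the calling source file and line.
 */
#define	lock_enter(l)	_lock_enter((l), __FILE__, __LINE__)
```

Had __FILE__ and __LINE__ been written inside the inline function body, every
caller would record the location of the function definition instead.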


65620 08-Sep-2000 jhb

Remove an unneeded extern declaration of cp_time.


65597 08-Sep-2000 jake

Really fix USER_LDT. (Don't use currentldt as an L-value.)


65587 07-Sep-2000 jake

Don't use currentldt as an L-value.
This should fix options USER_LDT.

Reported-by: John Hay <jhay@zibbi.mikom.csir.co.za>
Nickolay Dudorov <nnd@mail.nsk.ru>


65575 07-Sep-2000 jhb

Test for both SMP and I386_CPU being set before generating an error.


65570 07-Sep-2000 nyan

Don't assume that the addresses in the I/O address table increase (PC-98 only).

Pointed out by: Tomokazu HARADA <tkhara@osk4.3web.ne.jp>


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


65556 07-Sep-2000 jasone

Add KTR, a facility that logs kernel events in order to facilitate
debugging.

Acquired from: BSDi (BSD/OS)
Submitted by: dfr, grog, jake, jhb


65514 06-Sep-2000 phk

Introduce atomic_cmpset_int() and atomic_cmpset_long() from SMPng a
few hours earlier than the rest.

The next DEVFS commit needs these functions.

Alpha versions by: dfr
i386 versions by: jakeb

Approved by: SMPng
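
For readers unfamiliar with compare-and-set, the semantics can be sketched in
portable C11. This is a stand-in built on <stdatomic.h>, not the kernel
implementation (which is machine-dependent); the name cmpset_int_sketch is
hypothetical.

```c
#include <stdatomic.h>

/*
 * Compare-and-set sketch: if *dst equals expect, atomically store newval
 * and return nonzero; otherwise leave *dst alone and return zero.
 */
static int
cmpset_int_sketch(atomic_uint *dst, unsigned expect, unsigned newval)
{
	/* The C11 call may overwrite 'expect' on failure; it is a local copy. */
	return (atomic_compare_exchange_strong(dst, &expect, newval));
}
```

A caller typically retries in a loop: read the old value, compute the new one,
and repeat until the compare-and-set succeeds.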


65500 05-Sep-2000 msmith

Teach the NFS && NFS_ROOT case how to pick up information left by the
PXE loader, and use this to build the nfs_diskless structure.


65459 05-Sep-2000 peter

Catch a few more bogosities in certain chipsets before they mess us up.
Some have dual host->PCI bridges for the same logical pci bus (!), eg:
some of the RCC chipsets. This is a 32/64 bit 33/66MHz and dual pci
voltage motherboard so presumably there are electrical or signalling
differences but they are otherwise the same logical bus.
The new PCI probe code however was getting somewhat upset about it and
ended up creating two pci bridges to the same logical bus, which caused
devices on that logical bus to appear and be probed twice.

The ACPI data on this box correctly identifies this stuff, so bring on
ACPI! :-)


65389 03-Sep-2000 peter

Complain if we cannot find loader(8) metadata.


65312 01-Sep-2000 msmith

Add the 'asr' driver, supplied by Mark Salyzyn of Adaptec (nee DPT).
This provides support for the Adaptec SCSI RAID controller family,
as well as the DPT SmartRAID V and VI families.

The driver will be maintained by Mark and Adaptec, and any changes
should be referred to the MAINTAINER.


65304 31-Aug-2000 peter

Take a shot at fixing multiple pci busses on i386.
pcib_set_bus() cannot be used on the new child because it is
meant to be used on the *pci* device (it looks at the parent internally)
not the pcib being added. Bite the bullet and use ivars for the bus
number to avoid any doubts about whether the softc is consistent between
probe and attach. This should not break the Alpha code.


65292 31-Aug-2000 takawata

Merge the remaining pieces of the ACPI driver. To activate the acpi driver,
add the following line:

device acpi

The merge is finished, but the driver is still in an experimental phase and
needs more hacking!

Obtained from: ACPI for FreeBSD project


65273 31-Aug-2000 kato

Improved Cyrix 486DX support for NEC PC-98.
- Enable WB cache via CCR2 and CR0.
- Set the need_pre_dma_flush when the CPU_I486_ON_386 option is
defined.

Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>


65176 28-Aug-2000 dfr

* Completely rewrite the alpha busspace to hide the implementation from
the drivers.
* Remove legacy inx/outx support from chipset and replace with macros
which call busspace.
* Rework pci config accesses to route through the pcib device instead of
calling a MD function directly.

With these changes it is possible to cleanly support machines which have
more than one independently numbered PCI busses. As a bonus, the new
busspace implementation should be measurably faster than the old one.


65059 24-Aug-2000 peter

Comment out the static wiring of hints for GENERIC - the release process
now installs the hints file into /boot.


64989 23-Aug-2000 msmith

Add entries for the 'mly' driver. Re-group 'mly' and 'dpt' into a new
classification for RAID controllers that have CAM interfaces.


64880 20-Aug-2000 phk

Remove all traces of Julian's DEVFS (incl from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFSs magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS is configured, add the -d flag to /sbin/init's args to make it mount
devfs.

Add commented out DEVFS to GENERIC


64837 19-Aug-2000 dwmalone

Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was considered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implemented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR: 19866
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: alfred (glanced at by others on -net)
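
The "pool of unions" idea described in this entry can be sketched as follows.
This is a minimal illustration only: all names and the pool size are
hypothetical, and the counter updates here are plain (non-atomic) for brevity,
whereas the commit message says the real counts are updated atomically.

```c
#include <stddef.h>

/* Each slot is either a live reference count or a link in the free list. */
union ext_refcnt_sketch {
	int refcnt;
	union ext_refcnt_sketch *next_free;
};

#define	POOL_SIZE_SKETCH 4	/* hypothetical pool size */

static union ext_refcnt_sketch pool[POOL_SIZE_SKETCH];
static union ext_refcnt_sketch *free_list;

/* Thread every slot onto the free list. */
static void
pool_init(void)
{
	free_list = NULL;
	for (int i = 0; i < POOL_SIZE_SKETCH; i++) {
		pool[i].next_free = free_list;
		free_list = &pool[i];
	}
}

/* Take a slot off the free list and start it with one reference. */
static union ext_refcnt_sketch *
refcnt_alloc(void)
{
	union ext_refcnt_sketch *r = free_list;

	if (r == NULL)
		return (NULL);
	free_list = r->next_free;
	r->refcnt = 1;
	return (r);
}

/* Drop one reference; return the slot to the free list at zero. */
static void
refcnt_release(union ext_refcnt_sketch *r)
{
	if (--r->refcnt == 0) {
		r->next_free = free_list;
		free_list = r;
	}
}
```

Because a slot is never both counted and free at the same time, the two union
members can share storage, and the counter is independent of what kind of
external storage it tracks.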


64834 18-Aug-2000 msmith

Increase the default NAPIC from 1 to 2 as a bandaid until we allocate
these dynamically (ie. typically you shouldn't have to set NAPIC at all)


64781 17-Aug-2000 bsd

Don't let an illegal value for dr7 get set, which can lead to an
unexpected TRCTRAP.

Reported by: John W. De Boskey <jwd@FreeBSD.org>


64728 16-Aug-2000 tegge

Prepare for a cleanup of pmap module API pollution introduced by the
suggested fix in PR 12378.

Keep track of all existing pmaps independent of existing processes.

This allows for a process to temporarily connect to a different address
space without the risk of missing an update of the original address space if
the kernel grows.

pmap_pinit2() is no longer needed on the i386 platform but is left as a
stub until the alpha pmap code is updated.

PR: 12378


64592 13-Aug-2000 jhb

Include machine/cputypes.h so we get the cpu_class variable. This is needed
if I386_CPU is defined in the kernel config file.


64529 11-Aug-2000 peter

Clean up some low level bootstrap code:

- stop using the evil 'struct trapframe' argument for mi_startup()
(formerly main()). There are much better ways of doing it.
- do not use prepare_usermode() - setregs() in execve() will do it
all for us as long as the p_md.md_regs pointer is set. (which is
now done in machdep.c rather than init_main.c. The Alpha port did it
this way all along and is much cleaner).
- collect all the magic %cr0 etc register settings into one place and
have the AP's call that instead of using magic numbers (!!) that keep
changing over and over again.
- Make it safe to call kthread_create() earlier, including during the
device probe sequence. It doesn't need the callback mechanism that
NetBSD's version uses.
- kthreads created this way are root-less as they exist before the root
filesystem is mounted. init(1) is set up so that it acquires the root
pointers prior to running. If other kthreads want filesystem access
we can make this code more generic.
- set all threads start times once we have decided what time it is.
- init uses a trampoline rather than the evil prepare_usermode() hack.
- kern_descrip.c has a couple of tweaks to deal with forking when there
is no rootdir or cwd etc.
- adjust the early SYSINIT() sequence so that a few prerequisites are in
place. eg: make sure the run queue is initialized before doing forks.

With this, the USB code can easily create a kthread to do the device
tree discovery. (I have tested it, it works nicely).

There are still some open issues before this is truly useful.
- tsleep() does not like working before the clock is running. It
sort-of tries to spin wait, but it can do more useful things now.
- stopping a kthread in kld code at unload time is "interesting" but
we have a solution for that.

The Alpha code needs no changes for this. It already uses pretty much the
same strategies, but a little cleaner.


64494 10-Aug-2000 tegge

Don't skip IOAPIC id conflict detection when only one pci bus is present.
PR: 20312
Reviewed by: Steve Roome <steve@sse0691.bri.hp.com>


64325 07-Aug-2000 tegge

Add workaround for livelock problem when starting APs.

With more than 1 AP present, an AP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.

With early startup of APs, the BSP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.


64294 06-Aug-2000 ps

Change the behavior of isa_nmi to log an error message instead of
panicking and return a status so that we can decide whether to drop
into DDB or panic. If the status from isa_nmi is true, panic the
kernel based on machdep.panic_on_nmi, otherwise if DDB is
enabled, drop to DDB based on machdep.ddb_on_nmi.

Reviewed by: peter, phk


64290 06-Aug-2000 tegge

Be more verbose when changing APIC ID on an IO APIC.

Don't allow cpu entries in the MP table to contain APIC IDs out of range.

Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.

Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:

- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.

- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.

- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.

- IDs out of range are considered to be in conflict.

During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.

PR: 20312, 18919
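
The selection rules listed in this entry can be sketched as a small function.
This is an illustration under assumptions, not the committed code: the ID
range of 0..15, the "used" bitmap (pre-seeded with the CPU IDs and filled in
as IO APICs are processed in order), and all names are hypothetical.

```c
#include <stdbool.h>

#define	MAX_APIC_ID_SKETCH 16	/* assumed valid ID range: 0..15 */

/* An ID is usable only if it is in range and nobody else holds it. */
static bool
id_free(const bool used[MAX_APIC_ID_SKETCH], int id)
{
	return (id >= 0 && id < MAX_APIC_ID_SKETCH && !used[id]);
}

/*
 * Pick an ID for one IO APIC: prefer the ID currently programmed in the
 * hardware, then the ID from the MP table, then any unused ID.
 * Returns the chosen ID (and marks it used), or -1 if none is left.
 */
static int
choose_ioapic_id(bool used[MAX_APIC_ID_SKETCH], int current_id, int table_id)
{
	int id = -1;

	if (id_free(used, current_id)) {
		id = current_id;
	} else if (id_free(used, table_id)) {
		id = table_id;
	} else {
		for (int i = 0; i < MAX_APIC_ID_SKETCH; i++) {
			if (!used[i]) {
				id = i;
				break;
			}
		}
	}
	if (id >= 0)
		used[id] = true;
	return (id);
}
```

Whenever the chosen ID differs from the MP table entry, the copy of the MP
table would also need updating, which this sketch leaves out.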


64063 31-Jul-2000 luoqi

Handle write page faults (both write only or read-modify-write) as MI vm
write-only faults. This would allow write-only mmapped regions to function
correctly.


64031 30-Jul-2000 phk

Allow use of TSC even if APM is compiled in but disabled.


63994 29-Jul-2000 obrien

Revert previous commit. Not all RAID controllers are SCSI.


63993 29-Jul-2000 obrien

Move the RAID controllers next to the SCSI controllers.


63989 29-Jul-2000 obrien

Comment out `ncr' as `sym' handles all that `ncr' does.
(It is only commented out, rather than removed, so that people who really
want it can find it easily.)

Asked for by: Peter


63981 28-Jul-2000 peter

Fix warning - isa/isavar.h is a prerequisite for isa/pnpvar.h


63838 25-Jul-2000 billf

s%LINT%NOTES%g


63538 19-Jul-2000 imp

Default the pcic to polling. Some laptops need to have polling mode
due to a paucity of IRQs. I have some reservations about this, so I'm
not going to MFC this just yet. I'm doing this to see how many
problems it causes so we can do this in 4.2. I've been seeing hangs
on my laptop from time to time, but sometimes it was not in polling
mode, other times it was. Don't know if this is one problem or more
than one.

Requested by: Sean O Connell


63140 14-Jul-2000 ps

Change the way NMIs are handled. Before, if DDB was enabled and
an NMI occurred, you could type continue in DDB and the kernel would
not attempt to detect what type of NMI was received. Now we check
for the type of NMI first and then go to DDB if it is enabled.

This will solve the problem with having DDB enabled and getting an
NMI due to some possibly bad error and being able to continue the
operation of the kernel when you really want to panic and know
what happened.

Submitted by: jhb


62947 11-Jul-2000 tanimura

Finally merge newmidi.
(I had been busy for my own research activity until the last weekend)

Supported devices:

SB Midi Port (sbc + midi)
SB OPL3 (sbc + midi)
16550 UART (midi, needs a trick in your hint)
CS461x Midi Port (csa + midi)

OSS-compatible sequencer (seq)

Supported playing software:

playmidi (We definitely need more)

Notes:

/dev/midistat now reports installed midi drivers. /dev/sndstat reports
only pcm drivers. We need the new name(pcmstat?).

EMU8000(SB AWE) does not sound yet but does get probed so that the OPL3
synth on an AWE card works.

TODO:

MSS/PCI bridge drivers
Midi-tty interface to support general serial devices
Modules


62908 10-Jul-2000 mjacob

Removing commented out devices I added.


62870 10-Jul-2000 kris

Don't call printf with no format string.

Reviewed by: msmith


62808 08-Jul-2000 mjacob

Oops- remove the '0' appended to targbh.


62806 08-Jul-2000 mjacob

Add in the commented out SCSI device entries of

#device ses # SCSI Environmental Services (and SAF-TE)
#device targ # SCSI Target Mode Code
#device targbh0 # SCSI Target Mode Blackhole Device
#device pt # SCSI Processor Target Device

so that people know that they are there.


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62491 04-Jul-2000 mckusick

Update tags directive to reflect the new location of soft updates
and the reorganization of the eisa directory.


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62298 01-Jul-2000 bsd

Fix my own style bugs (use of spaces instead of tabs for indentation).
This is a style-only change.


62088 25-Jun-2000 markm

Duh. Fix a fatfingered patch.


62085 25-Jun-2000 markm

Fix an uninitialised variable and a function return value.

Reported by: dillon


62058 25-Jun-2000 markm

Get the build bits right for the new Architecture Independent null- and
entropy drivers.
Reviewed by: dfr(mostly)


62057 25-Jun-2000 markm

Strip out the machine-independent parts of the memory device.
/dev/(u)random, /dev/null, /dev/zero are all moving to machine-independent
drivers.
Reviewed by: dfr


62042 24-Jun-2000 fsmp

Fixed atpic_attach() for the SMP (specifically APIC_IO) case.

Approved by: msmith@freebsd.org


61996 23-Jun-2000 msmith

Make the PnP 'slopsucker' quiet in the !bootverbose case - the real NPX
probe happens much earlier, and may come to very different conclusions
about the system's NPX setup.


61995 23-Jun-2000 msmith

Add a stub driver to consume the PnP "system resource" items, and hide
them in the !bootverbose case.


61994 23-Jun-2000 msmith

Add PnP probe methods to some common AT hardware drivers. In each case,
the PnP probe is merely a stub as we make assumptions about some of this
hardware before we have probed it.

Since these devices (with the exception of the speaker) are 'standard',
suppress output in the !bootverbose case to clean up the probe messages
somewhat.


61992 23-Jun-2000 msmith

Stop trying to do anything funny with the interrupt resource range. The
AT PIC will consume IRQ 2 correctly in the !APIC_IO case.


61936 22-Jun-2000 peter

Add SOFTUPDATES to GENERIC (BOOTMFS has this filtered out)


61717 15-Jun-2000 phk

Add disk_enumerate() for finding names of disks. Vinum and libh will
need this RSN.

Remove a pointless warning in the root device locating code.

Remove the "wd" compatibility name from the "ad" driver.

WARNING: If you have not updated to use /dev/wd* in your /etc/fstab
and modern bootblocks, it would be a very good idea to do so BEFORE
you upgrade your kernel.


61692 14-Jun-2000 bde

Fixed syntax errors and style bugs in previous commit. The syntax
errors were normally harmless because they were in unreachable code
and gcc apparently doesn't check the syntax inside asm statements
that it optimizes away.


61655 14-Jun-2000 peter

s/iomem/maddr/ - these were generated from an older version of the
gethints script. :-(


61640 13-Jun-2000 peter

Borrow phk's axe and apply the next stage of config(8)'s evolution.

Use Warner Losh's "hint" driver to decode ascii strings to fill the
resource table at boot time.

config(8) no longer generates an ioconf.c table - ie: the configuration
no longer has to be compiled into the kernel. You can reconfigure your
isa devices with the likes of this at loader(8) time:
set hint.ed.0.port=0x320
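
The shape of such a hint string (driver name, unit number, resource name,
value) can be illustrated with a small parser. This is only a sketch of the
string format shown above; it is not how the hint driver or loader(8) decodes
hints, and the structure and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical decoded form of "hint.<driver>.<unit>.<resource>=<value>". */
struct hint_sketch {
	char driver[16];
	int unit;
	char resource[16];
	long value;
};

/* Returns 0 on success, -1 if the string does not match the pattern. */
static int
parse_hint(const char *s, struct hint_sketch *h)
{
	/* %li accepts decimal, octal (0...) and hexadecimal (0x...) values. */
	int n = sscanf(s, "hint.%15[^.].%d.%15[^=]=%li",
	    h->driver, &h->unit, h->resource, &h->value);

	return (n == 4 ? 0 : -1);
}
```

For the example line above, the parser would yield driver "ed", unit 0,
resource "port", and the numeric value 0x320.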

userconfig will be rewritten to use this style interface one day and will
move to /boot/userconfig.4th or something like that.

It is still possible to statically compile in a set of hints into a kernel
if you do not wish to use loader(8). See the "hints" directive in GENERIC
as an example.

All device wiring has been moved out of config(8). There is a set of
helper scripts (see i386/conf/gethints.pl, and the same for alpha and pc98)
that extract the 'at isa? port foo irq bar' from the old files and produces
a hints file. If you install this file as /boot/device.hints (and update
/boot/defaults/loader.conf - You can do a build/install in sys/boot) then
loader will load it automatically for you. You can also compile in the
hints directly with: hints "device.hints" as well.

There are a few things that I'm not too happy with yet. Under this scheme,
things like LINT would no longer be useful as "documentation" of settings.
I have renamed this file to 'NOTES' and stored the example hints strings
in it. However... this is not something that config(8) understands, so
there is a script that extracts the build-specific data from the
documentation file (NOTES) to produce a LINT that can be config'ed and
built. A stack of man4 pages will need updating. :-/

Also, since there is no longer a difference between 'device' and
'pseudo-device' I collapsed the two together, and the resulting 'device'
takes a 'number of units' for devices that still have it statically
allocated. eg: 'device fe 4' will compile the fe driver with NFE set
to 4. You can then set hints for 4 units (0 - 3). Also note that
'device fe0' will be interpreted as "zero units of 'fe'" which would be
bad, so there is a config warning for this. This is only needed for
old drivers that still have static limits on numbers of units.
All the statically limited drivers that I could find were marked.

Please exercise EXTREME CAUTION when transitioning!

Moral support by: phk, msmith, dfr, asmodai, imp, and others


61623 13-Jun-2000 kato

Recognize Coppermine Celeron processors whose CPU ID = 0x68?. They
were recognized as "Pentium III/Pentium III Xeon."


61616 13-Jun-2000 kato

Added new options CPU_PPRO2CELERON and CPU_L2_LATENCY to support
Socket 8 to 370 converters. When (1) CPU_PPRO2CELERON option is
defined, (2) Intel CPU is found and (3) CPU ID is 0x66?, L2 cache is
enabled through MSR 0x11e. The L2 cache latency value can be
specified by CPU_L2_LATENCY option. Default value of L2 cache latency
is 5.

These options are useful if you use Socket 8 to Socket 370 converter
(e.g. Power Leap's PL-Pro/II.) Most PentiumPro BIOSs don't enable L2
cache of Mendocino Celeron CPUs because they don't know Celeron CPUs.
These options are needless if you use a Coppermine (FCPGA) Celeron or
PentiumIII, because the L2 cache enable bit is hard wired and L2 cache
is always enabled.


61533 10-Jun-2000 msmith

Don't include opt_smp.h - we don't use anything defined in it.


61531 10-Jun-2000 msmith

Correct the tests for ISA PIC/APIC so that they actually work.


61474 10-Jun-2000 peter

Unused include: #include "ether.h"


61469 10-Jun-2000 peter

Add option BROKEN_KEYBOARD_RESET to an opt_*.h file


61422 08-Jun-2000 bde

Always include the full symbol table (as specified by its start and
end values in bootinfo) in kernel space if it is loaded (i.e., if its
specified end address is nonzero), not just if it is loaded and DDB
is configured. This may be used to fix kldsym(2) for booting without
/boot/loader; currently, in this case, it just fixes unused pointers
and wastes space consistently. For booting in the normal way with
/boot/loader, the table is included and pointed to in a different way
and kldsym(2) works.


61362 07-Jun-2000 iwasaki

Fix gdt pointer for the current cpu on SMP.
This will support power-off only. Fix for suspend/resume will come later.
Also, the MFC of this is scheduled for next week.

Submitted by: sumitani@bd2.hnes.nec.co.jp
Reviewed by: jlemon


61339 06-Jun-2000 dillon

INTR_TYPE_FAST / FAST_INTR interrupts (currently just serial interrupts)
have their own lock and do not need the MP lock. The SMP cleanup was
a little too conservative in MP locking fast interrupts but at least
it's trivial to fix. MFC soon.

Submitted by: bde


61220 03-Jun-2000 bde

Fixed some style bugs in the signal handling functions. This doesn't
change the object file.


61136 31-May-2000 msmith

Further fixes for multiple-IO-APIC systems from Tor Egge:

Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.

IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED

Both IOAPICs had ID 0.

Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.

The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.

Submitted by: tegge
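
The arithmetic behind the suspected vector collision is easy to check. The
helper below is purely illustrative (its name is hypothetical); it just
performs the bitwise OR that the message speculates happened when both
IOAPICs sent a vector at the same time.

```c
/*
 * 0x68 = 0110 1000  (irq 8 vector from IOAPIC #0)
 * 0x32 = 0011 0010  (irq 18 vector from IOAPIC #1)
 * OR   = 0111 1010  = 0x7a
 */
static unsigned
or_vectors(unsigned a, unsigned b)
{
	return (a | b);
}
```

Calling or_vectors(0x68, 0x32) gives 0x7a, the vector that was observed to
trigger the T_RESERVED trap.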


61132 31-May-2000 msmith

Bump the default NBUS value to 8.


61130 31-May-2000 bde

Pack the SWI bits to save some time and space.


61126 31-May-2000 bde

Add SWI_TQ_MASK to all interrupt masks except SWI_CLOCK_MASK. Use a
new macro SWI_LOW_MASK to give the mask for low priority SWIs instead
of hard-coding this mask as SWI_CLOCK_MASK.

Reviewed by: dfr


61100 30-May-2000 green

Change sl(4) configuration lines to reflect its new dynamic nature.


61081 29-May-2000 dillon

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTITIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather than swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


61075 29-May-2000 dfr

Add SWI_TQ_MASK to imask definition.


61074 29-May-2000 dfr

Brucify the pmap_enter_temporary() changes.


61036 28-May-2000 dfr

Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon


61009 28-May-2000 peter

Redo the isa compat driver shim so that each driver is self contained
and does not require that evil list of drivers in isa_compat.h.
It uses the same strategy that pci drivers use, namely a
COMPAT_ISA_DRIVER() macro that creates the glue on the fly.
Theoretically old-style isa drivers should be preloadable now.


60973 27-May-2000 jhb

- Remove unnecessary 'data32' and 'addr32' prefixes and #define's.
- Go ahead and use 'lgdt' again instead of hand-assembling the instruction.
During testing this code worked fine. If for some reason a 32-bit offset
is needed, 'lgdtl' should be used instead of reverting to manual machine
code.

Tested by: peter


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60933 25-May-2000 tegge

Reintroduce a workaround for a gas bug (misassembled lgdt instruction)
Use .code16 for the real mode part of the AP bootstrap trampoline code.


60881 24-May-2000 peter

pmap_enter() masked off the page offset bits, pmap_kenter() did not.
This (I believe) is the cause of the XFree86 startup and/or mptable(8)
panics when programs were reading from /dev/mem at non-page-aligned
offsets. The offsets were being converted into random page flags in the
page tables. :-( (including PG_PS = 4MB page size)
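
The failure mode can be sketched as follows. The constants mirror the usual
i386 page table entry layout (4 KB pages, valid bit 0x001, read/write bit
0x002, page-size bit 0x080) but are given hypothetical _SKETCH names, and the
two functions are illustrations rather than the actual pmap code.

```c
#include <stdint.h>

#define	PAGE_MASK_SKETCH 0x00000fffu	/* offset bits within a 4 KB page */
#define	PG_V_SKETCH	 0x001u		/* entry is valid */
#define	PG_RW_SKETCH	 0x002u		/* read/write */
#define	PG_PS_SKETCH	 0x080u		/* page size (4 MB page) */

/* Buggy form: offset bits of 'pa' leak into the flag field of the entry. */
static uint32_t
pte_unmasked(uint32_t pa)
{
	return (pa | PG_RW_SKETCH | PG_V_SKETCH);
}

/* Fixed form: strip the page offset before adding the intended flags. */
static uint32_t
pte_masked(uint32_t pa)
{
	return ((pa & ~PAGE_MASK_SKETCH) | PG_RW_SKETCH | PG_V_SKETCH);
}
```

For a non-page-aligned address such as 0x000f0080, the unmasked form ends up
with the page-size bit set, while the masked form yields 0x000f0003.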


60862 24-May-2000 kuriyama

Add OPTi 82C700 chipset.

Submitted by: sanpei@sanpei.org
PR: kern/18155 (part of)


60847 24-May-2000 kuriyama

Add 440MX chipset.

Submitted by: YOSHIMURA Hideaki <hideakiy@cs-tokyo01.chuosystem.co.jp>
References: [bsd-nomads:13764]


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60804 22-May-2000 obrien

Sort the sys includes.


60798 22-May-2000 dan

sysctl'ize ICMP_BANDLIM and ICMP_BANDLIM_SUPPRESS_OUTPUT.

Suggested by: des/nbm


60755 21-May-2000 peter

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
its shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a separate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a separate set of
structures that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon


60676 18-May-2000 grog

Correct previous commit: solve the "stopped clock" syndrome in remote
kernel debugger.


60672 18-May-2000 msmith

Implement real read/write barriers for the i386. Despite the comment in
previous versions of this file, some barrier functionality is required.


60668 17-May-2000 msmith

If we are running in APIC_IO mode, pretend that we didn't see the BIOS
reporting an AT PIC. We do this because otherwise the PIC will claim
IRQ 2 in an unshareable mode, preventing other devices from legitimately
using it.

For symmetry, in !APIC_IO mode, ignore the APIC if it's reported.

This is a hack; a better solution would have the PIC's driver release
the IRQ if it was not going to be active.


60497 13-May-2000 hoek

Change to comments only: spell FreeBSD.org correctly


60419 12-May-2000 jhb

Turn on USB support for most USB devices. udbp is not turned on since
NETGRAPH is not present in GENERIC at the moment. Also, change some
settings to support USB installs:

- Add KBD_INSTALL_CDEV as an option to make /dev/kbd[01] actually work.
- Turn on keyboard probing in sc0. The syscons driver will now use a
flag documented in ukbd(4) but not in sc(4) that tells syscons to
actively search for a keyboard device if none is found. This allows
USB keyboards to just be plugged in and instantly start working.
- Require the atkbd0 driver to actually probe to see if a keyboard is
there. This allows USB keyboards to be seen by sc0 if an AT keyboard
isn't plugged into the computer. This also means that you will no
longer be able to plug an AT keyboard into a machine after it has
booted a GENERIC kernel and use it. AT keyboards aren't designed for
this anyway. USB keyboards are designed for this, and they work.


60346 11-May-2000 peter

Move <machine/ipl.h> outside #ifdef SMP because it supplies AST_RESCHED.
Without this, it shows up as an undefined symbol in /kernel. (!)
(This looks very freaky when doing a nm /kernel!)


60303 10-May-2000 obrien

1. `movl' is for use with 32-bit operands. Do NOT use it with 16-bit
operands. `movw' could be used, but instead let the assembler decide
the right instruction to use.
2. AT&T asm syntax requires a leading '*' in front of the operand for
indirect calls and jumps.


60300 10-May-2000 obrien

When using _asm{} in GCC, one must specify the operand's size if one
specifies the instruction's operation size. GCC will default to 32-bit
operands regardless of the prototype (ie, formal parameters' type)
of an inline function.


60298 10-May-2000 obrien

Do not specify the size to move. Allow the assembler to figure it out.


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


60008 04-May-2000 wollman

Add a little do-nothing ``slopsucker'' device which gives a home
to PNP0c04 (legacy ISA coprocessor support). Tourist info.


59978 04-May-2000 msmith

Don't assume that the PCI BIOS is going to clear the unused bits in %ecx
when it returns.


59926 03-May-2000 dwhite

I mentioned yesterday that I could use some work, and Kelly says, "Commit my
PRs!" So here I go.

Add definitions for some of the AMD CPU feature bits. Also add a comment on
where to find the rest of them. This is a purely cosmetic change.

PR: i386/14438
Submitted by: Kelly Yancey <kbyanc@egroups.net>


59911 03-May-2000 imp

Move sn and cs drivers from the compat section to the real section.
Enable xe driver now that I've had reports that it works.

PR: 18323
Submitted by: MIHIRA Yoshiro-san


59877 01-May-2000 n_hibma

The USB double bulk pipe driver (Host to host cables). Currently there
are two supported chips, the NetChip 1080 (only prototypes available)
and the EzLink cable. Any other cable should be supported however as they
are all very much alike (there is a difference between them wrt
performance).

It uses Netgraph.

This driver was mostly written by Doug Ambrisko and Julian Elischer and
I would like to thank Whistle for yet another contribution. And my
apologies to them for me sitting on the driver for so long (2 months).

Also, many thanks to Reid Augustin from NetChip for providing me with a
prototype of their 1080 chip.

Be aware of the fact that this driver is very immature and has only been
tested very lightly. If someone feels like learning about Netgraph however
this is an excellent driver to start playing with.


59839 01-May-2000 peter

Move the MSG* and SEM* options to opt_sysvipc.h
Remove evil allocation macros from machdep.c (why was that there???) and
use malloc() instead.
Move parameters out of param.h and into the code itself.
Move a bunch of internal definitions from public sys/*.h headers (without
#ifdef _KERNEL even) into the code itself.

I had hoped to make some of this more dynamic, but the cost of doing
wakeups on all sleeping processes on old arrays was too frightening.
The other possibility is to initialize on the first use, and allow
dynamic sysctl changes to parameters right until that point. That would
allow /etc/rc.sysctl to change SEM* and MSG* defaults as we presently
do with SHM*, but without the nightmare of changing a running system.


59739 29-Apr-2000 peter

Mark two functions as private.


59615 25-Apr-2000 grog

Fix a long-standing bug which caused massive character loss in remote
serial gdb: interrupts were causing either overruns or stealing
characters. Put splhigh() around the routines which transfer packets
across the line. Since this happens when the system is halted in
debug, this doesn't cause any particular problem. Now it is possible
to run the link at 115,200 bps.

PR: (not assigned yet, must be in limbo somewhere)

Add partial support for detecting non-existent gdb devices.

Add $FreeBSD$ tag.


59604 24-Apr-2000 obrien

* Use sys/sys/random.h rather than a i386 specific one.
* There was nothing that should be machine dependent about
i386/isa/random_machdep.c, so it is now sys/kern/kern_random.c.


59539 23-Apr-2000 nyan

Disable PCI BIOS on PC-98.


59495 22-Apr-2000 nyan

- PC-98 uses IRQ2 too.
- Fixed the range of DMA channels on PC-98.

Submitted by: "T.Yamaoka" <taka@windows.squares.net>


59440 20-Apr-2000 luoqi

IO apics are not necessarily page aligned, they are only required to be aligned
on 1K boundary. Correct a typo that would cause problem to a second IO apic.

Pointed out by: Steve Passe <smp.csn.net>
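
A minimal sketch of the alignment point above, assuming 4 KB pages: a register block that is only 1K-aligned has to be reached through the page that contains it, plus the offset within that page. The function names are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (PAGE_SIZE - 1u)

/* Page-aligned physical base of the page containing the IO APIC. */
static uint32_t
apic_page_base(uint32_t pa)
{
	return (pa & ~PAGE_MASK);
}

/* Offset of the IO APIC registers within that page (may be 1K, 2K, 3K). */
static uint32_t
apic_page_offset(uint32_t pa)
{
	return (pa & PAGE_MASK);
}

/* Virtual address of the registers, given where the page was mapped. */
static uint32_t
apic_va(uint32_t mapped_va, uint32_t pa)
{
	return (mapped_va + apic_page_offset(pa));
}
```

For a second IO APIC at physical address 0xfec00400, the page to map is 0xfec00000 and the registers sit 0x400 bytes into it. Code that assumes page alignment would drop that offset.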


59368 18-Apr-2000 phk

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


59294 16-Apr-2000 msmith

Some more i386-only BIOS-friendliness:

- Add support for using the PCI BIOS functions for configuration space
accesses, and make this the default.

- Make PNPBIOS the default (obsoletes the PNPBIOS config option).

- Add two new boot-time tunables to disable each of the above.


59260 15-Apr-2000 asmodai

Fix typo, extentions -> extensions

Submitted by: George Cox <gjvc@sophos.com>


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


59058 06-Apr-2000 imp

Awi driver, ported from NetBSD from Atsushi Onoe-san.

From the README:
Any IEEE 802.11 cards use AMD Am79C930 and Harris (Intersil) Chipset
with PCnetMobile firmware by AMD.
BayStack 650 1Mbps Frequency Hopping PCCARD adapter
BayStack 660 2Mbps Direct Sequence PCCARD adapter
Icom SL-200 2Mbps Direct Sequence PCCARD adapter
Melco WLI-PCM 2Mbps Direct Sequence PCCARD adapter
NEL SSMagic 2Mbps Direct Sequence PCCARD adapter
Netwave AirSurfer Plus
1Mbps Frequency Hopping PCCARD adapter
Netwave AirSurfer Pro
2Mbps Direct Sequence PCCARD adapter

Known Problems:
WEP is not supported.
Does not create IBSS itself.
Cannot configure the following on FreeBSD:
selection of infrastructure/adhoc mode
ESSID
...

Submitted by: Atsushi Onoe <onoe@sm.sony.co.jp>


59008 04-Apr-2000 hm

Remove obsolete reference to PCVT_FREEBSD.


58962 03-Apr-2000 msmith

Remove the !(I386 & SMP) tests; we don't run SMP on an i386 system, and
they break the LINT build.


58941 02-Apr-2000 dillon

Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
well.

Reviewed by: msmith


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58820 30-Mar-2000 peter

Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.


58786 29-Mar-2000 kato

PC-98 BIOS copies the DX register into its work area. The value of it
shows `CPUID' and it is useful to identify CPU. So, it is copied from
BIOS work area to the cpu_id variable (PC-98 only).

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


58764 29-Mar-2000 dillon

The SMP cleanup commit broke need_resched, this fixes that and also
removed unnecessary MPLOCKED and 'lock' prefixes from the interrupt
nesting level, since (A) the MP lock is held at the time, and (B) since
the nesting level is restored prior to return any interrupted code
will see a consistent value.


58762 29-Mar-2000 kato

Added indirect pio into the bus space stuff for the NEC PC-98. bus.h
includes one of bus_at386.h and bus_pc98.h. Because only bus_pc98.h
supports indirect pio and bus_at386.h is identical to old bus.h, there
is no functional change in PC-AT's kernels. That is, it cannot cause
performance loss.

Submitted by: nyan
Reviewed by: imp
bde and luoqi provided useful comments for earlier version.


58755 28-Mar-2000 dillon

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


58706 27-Mar-2000 dillon

Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).

The geteblk() fix solves a performance problem that DG was able to reproduce.

This patch does not completely fix the KVA fragmentation problems, but
it goes a long way.

Mostly Reviewed by: bde and others
Approved by: jkh


58440 21-Mar-2000 dan

Include a space between hash ('#') and 'Berkeley packet filter' like
all the other comments have.


58377 20-Mar-2000 phk

Isolate the Timecounter internals in their own two files.

Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
kern/kern_clock.c -> kern/kern_tc.c
sys/time.h -> sys/timetc.h
(thanks peter!)


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch was machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.
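
The two points above (a flag defined as zero cannot be tested, and a command field should carry exactly one bit) can be illustrated as follows. This is a sketch under assumptions: the OLD_* and CMD_* names and their values are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative old-style flags: "write" is a pseudo flag with value zero. */
#define OLD_B_READ  0x00100000u
#define OLD_B_WRITE 0x00000000u

/* Illustrative new-style command bits, one bit per command. */
#define CMD_READ   0x01u
#define CMD_WRITE  0x02u
#define CMD_DELETE 0x04u

/* The obvious-looking test for a write can never succeed with a zero flag. */
static int
old_is_write(uint32_t flags)
{
	return ((flags & OLD_B_WRITE) != 0);
}

/* True when exactly one bit is set: nonzero and a power of two. */
static int
exactly_one_bit(uint32_t cmd)
{
	return (cmd != 0 && (cmd & (cmd - 1u)) == 0);
}
```

A check like exactly_one_bit() rejects both an empty command and a combination such as CMD_READ | CMD_WRITE, which is the kind of invariant the commit says is enforced on b_iocmd.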


58342 20-Mar-2000 cracauer

Exchange numerical values for FPE_INTDIV and FPE_INTOVF, so that they
are compatible with the older ones implemented in FreeBSD 3.x.

PR: 15488


58288 19-Mar-2000 peter

Document and supply COMPAT_OLDPCI and COMPAT_OLDISA so 'make release'
still works.


58287 19-Mar-2000 peter

Connect the ISA and PCI compatibility shims to an option. In this case
it's options COMPAT_OLDISA and COMPAT_OLDPCI. This is meant to be a
fairly strong incentive to update the older drivers to newbus, but doesn't
(quite) leave anybody hanging with no hardware support. I was talking with
a few folks and I was encouraged to simply break or disable the shims but
that was a bit too drastic for my liking.


58133 16-Mar-2000 n_hibma

Please welcome the URio driver. Written by
Iwasa Kazmi <kzmi@ca2.so-net.ne.jp>


58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


57996 13-Mar-2000 bde

Disabled the optimization of not doing an invltlb_1pg() when changing
pte's from zero. The TLB is supposed to be invalidated when pte's are
changed _to_ zero, but this doesn't occur in all cases for global pages
(PG_G stops invltlb() from working, and invltlb_1pg() is not used
enough).

PR: 14141, 16568
Submitted by: dillon


57973 13-Mar-2000 phk

Stop isadma from abusing the B_READ, B_RAW and B_WRITE flags.

Define ISADMA_{READ,WRITE,RAW} macros with the same numeric
values as the B_{READ,WRITE,RAW} and use them instead throughout.


57890 10-Mar-2000 cracauer

Change the default FPU control word so that exceptions for new
processes are now masked until set by fpsetmask(3).

Submitted by: bde
Approved by: jkh, bde


57863 09-Mar-2000 jlemon

Add Compaq `ida' driver to GENERIC, update its LINT entry.

Approved by: jordan


57704 02-Mar-2000 dufault

I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority. This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by: bde


57701 02-Mar-2000 dufault

Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by: jkh, bde


57571 28-Feb-2000 bsd

Reset the hardware debug registers when exec'ing a new image.

Reviewed by: bde,jlemon
Approved by: jkh


57531 27-Feb-2000 green

Do some cleanups of the IPv6 stuff. This is a non-functional change.

Approved by: jkh


57524 26-Feb-2000 jkh

Enable IPv6 support by default.


57402 23-Feb-2000 dfr

Add a workaround to allow us to detect the second pci bus on an HP
Netserver LS/2.

Approved by: jkh


57376 21-Feb-2000 bsd

Fix an __asm operand constraint which broke the -O3 and -O0 builds.

Submitted by: Seigo Tanimura <tanimura@freebsd.org>
Approved by: jkh


57362 20-Feb-2000 bsd

Don't forget to reset the hardware debug registers when a process that
was using them exits.

Don't allow a user process to cause the kernel to take a TRCTRAP on a
user space address.

Reviewed by: jlemon, sef
Approved by: jkh


57359 20-Feb-2000 n_hibma

Update the documentation to reflect Bill Paul's latest changes.


57251 16-Feb-2000 yokota

Make it clear that 'options XSERVER' is for pcvt and not for syscons.

Submitted by: Doug Barton <Doug@gorean.org>
Approved by: jkh


57181 13-Feb-2000 dfr

Fix an uninitialised variable which affected probing on some machines.

Approved by: jkh
Reviewed by: gallatin


57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


57168 12-Feb-2000 obrien

Document the support in the kernel for hardware debug registers on the
ix86 platform which allows for hardware watchpoints, etc...

Submitted by: Brian Dean <brdean@unx.sas.com>


57092 09-Feb-2000 gallatin

Allow peer pci buses which are directly connected to the RCC host pci
chipset to be probed & attached on newer Dell PowerEdge servers, such as
the 2400 and 4400.

Reviewed by: dfr, msmith, jlemon
Tested by: hnokubi@yyy.or.jp (in a previous incantation)
Approved by: jkh


57021 07-Feb-2000 n_hibma

Add PCI Id's for i810 chipsets.

PR: 16517
Submitted by: SAKIYAMA Nobuo <sakichan@lares.dti.ne.jp>
Approved by: jhk


56983 04-Feb-2000 jkh

Clean up POSIX options, synchronize generics.


56935 01-Feb-2000 n_hibma

da0 -> da


56867 29-Jan-2000 peter

Remove 'conflicts' token - it has been effectively doing absolutely
nothing for quite some time. The only thing that cared was userconfig,
but it was for one invisible device so we never saw its effects.


56845 29-Jan-2000 peter

Remove a bunch of unused (NO-OP) #if NFOO > 0 type includes and some
#include "foo.h" headers.


56814 29-Jan-2000 peter

Remove a workaround for a gas bug. It couldn't assemble a certain
lgdt instruction, but the binutils based one is fine and has been
for ages.


56797 29-Jan-2000 kato

Simplify messages of Pentium II, Pentium II Xeon, Celeron, Pentium III
and Pentium III Xeon CPUs. If a CPU is one of Pentium II, Pentium II
Xeon and Celeron, the message is always "Pentium II/Pentium II
Xeon/Celeron". If a CPU is one of Pentium III and Pentium III Xeon,
the message is always "Pentium III/Pentium III Xeon".


56724 28-Jan-2000 imp

Mitigate the stream.c attacks

o Drop all broadcast and multicast source addresses in tcp_input.
o Enable ICMP_BANDLIM in GENERIC.
o Change default to 200/s from 100/s. This will still stop the attack, but
is conservative enough to do this close to code freeze.

This is not the optimal patch for the problem, but is likely the least
intrusive patch that can be made for this.

Obtained from: Don Lewis and Matt Dillon.
Reviewed by: freebsd-security


56657 27-Jan-2000 mckusick

Add soft updates to the set of things being tagged. Syntax cleanup.


56623 26-Jan-2000 msmith

Correctly initialise the available IRQ numbers in the APIC_IO case.
IRQ 2 was being unilaterally disallowed, which is only appropriate if
the interrupt hardware is the traditional chained PIC arrangement.

Reviewed by: tegge (in principle)


56615 25-Jan-2000 dfr

Use device_printf() instead of device_print_prettyname().


56581 25-Jan-2000 bde

Fixed the profiling version of ALTENTRY(). Again. The previous version
didn't set up the frame pointer before calling mcount, and then jumped
to the wrong place in ENTRY() to defeat the point of the jump.


56514 24-Jan-2000 peter

Remove a bunch of no-op "port ?" and "irq ?" declarations.


56503 24-Jan-2000 bde

Removed bogus quotes and unmangled related contractions.
"ktrace(1) syscall trace" -> "ktrace(1)".


56476 23-Jan-2000 peter

Some formatting cleanups and remove comments about numbers of units that
were intended to head off confusion about the trailing '0'.


56456 23-Jan-2000 peter

Drop 'at ppbus?' and the trailing '0' from the ppbus children.


56441 23-Jan-2000 peter

Update GENERIC/LINT to leave out the useless digit at the end of pci
or other unwired devices.


56425 23-Jan-2000 imp

Add the two wireless pccard nics.


56379 21-Jan-2000 wilko

updated comments


56309 20-Jan-2000 jasone

Move ENTRY and ALTENTRY definitions to asm.h where they belong.

Unbreak profiling. Again.

Submitted by: bde


56243 18-Jan-2000 billf

Cast rman_get_virtual() to a vm_offset_t.

Submitted by: msmith


56237 18-Jan-2000 alfred

unbreak (rv -> r), afaik what Mike intended, boots fine on my machine


56225 18-Jan-2000 jkh

Enable POSIX P1003_1B extensions by default; there's no reason I can see
not to class them with the SYSV extensions as "optional but damn useful".

Also desired by: wollman


56213 18-Jan-2000 msmith

Don't try to map memory resources into the kernel until they're actually
activated. Some of the things that get listed as "resources" aren't
necessarily suited for this.

(This shouldn't be a problem for any driver that correctly passes
RF_ACTIVE)


56024 15-Jan-2000 tanimura

A processor with the CPUID of 0x?8? is Pentium III.
(aka Coppermine)

Noticed by: Satoshi Sawada <k-sawata@gnoc2.comminet.or.jp>
Reviewed by: Takuma Yamada <fuzzy2@st.rim.or.jp>


55992 14-Jan-2000 wpaul

Add driver support for the Aironet 4500/4800 series wireless 802.11
NICs. (Finally!) The PCMCIA, ISA and PCI varieties are all supported,
though only the ISA and PCI ones will work on the alpha for now.
PCCARD, ISA and PCI attachments are all provided. Also provided an
ancontrol(8) utility for configuring the NIC, man pages, and updated
pccard.conf.sample. ISA cards are supported in both ISA PnP and hard-wired
mode, although you must configure the kernel explicitly to support the
hardwired mode since you have to know the I/O address and port ahead
of time.

Special thanks to Doug Ambrisko for doing the initial newbus hackery
and getting it to work in infrastructure mode.


55958 14-Jan-2000 peter

Add back the 'at ppbus?' for the lpt etc drivers. Now it's used.


55944 14-Jan-2000 wpaul

Add device driver support for USB ethernet adapters based on the CATC
USB-EL1202A chipset. Between this and the other two drivers, we should
have support for pretty much every USB ethernet adapter on the market.
The only other USB chip that I know of is the SMC USB97C196, and right
now I don't know of any adapters that use it (including the ones made
by SMC :/ ).

Note that the CATC chip supports a nifty feature: read and write combining.
This allows multiple ethernet packets to be transferred in a single USB
bulk in/out transaction. However I'm again having trouble with large
bulk in transfers like I did with the ADMtek chip, which leads me to
believe that our USB stack needs some work before we can really make
use of this feature. When/if things improve, I intend to revisit the
aue and cue drivers. For now, I've lost enough sanity points.


55891 13-Jan-2000 mdodd

Allow SMP systems with an MCA bus to work properly.

Reviewed by: peter


55884 13-Jan-2000 mdodd

Remove the 'at isa? ...' bits for ex0.

Remove the confusing text about pccard and unit numbers for ep0.


55832 12-Jan-2000 obrien

Sort.


55823 11-Jan-2000 yokota

Add a new mechanism, cndbctl(), to tell the console driver that
ddb is entered. Don't refer to `in_Debugger' to see if we
are in the debugger. (The variable used to be static in Debugger()
and wasn't updated if ddb is entered via traps and panic anyway.)

- Don't refer to `in_Debugger'.
- Add `db_active' to i386/i386/db_interface.c (as in
alpha/alpha/db_interface.c).
- Remove cnpollc() stub from ddb/db_input.c.
- Add the dbctl function to syscons, pcvt, and sio. (The function for
pcvt and sio is noop at the moment.)

Jointly developed by: bde and me

(The final version was tweaked by me and not reviewed by bde. Thus,
if there is any error in this commit, that is entirely mine, not
his.)

Some changes were obtained from: NetBSD


55701 10-Jan-2000 imp

Uncomment pcic device and put pccard in GENERIC. PCCARD will be removed
in a little while as soon as I find all the places it is used in the
tree.


55672 09-Jan-2000 bde

Fixed style bugs related to the access functions for the bsfl and bsrl
i386 instructions.


55606 08-Jan-2000 peter

s/controller/device/ as per config(8) changes


55604 08-Jan-2000 bde

Compile genassym.c with ordinary ${CFLAGS}. The (small) needs for
${GEN_CFLAGS} and -U_KERNEL became negative when all the
genassym.c's were converted to be cross-built.

Makefile.*:
- Cleanups associated with the old genassym.
- Fixed deprecated spelling of ${.IMPSRC} as "$<".


55590 08-Jan-2000 peter

Clean up the cfgmech/pci_mechanism debris. The reason for the existence
of this is no longer an issue as we have a replacement driver for the
one that needed it.

Reviewed by: dfr


55545 07-Jan-2000 marcel

Use genassym(1). The definitions of NKPDE and NKPT have been removed
because they are already defined in pmap.h, resulting in duplicate
definitions.

Reviewed by: bde


55540 07-Jan-2000 luoqi

Allow SMP && NCPU == 1 to work. From now on, there's no restriction on the
value of NCPU relative to the number of cpus physically present, the actual
number of cpus utilized will be the smaller of the two.


55429 05-Jan-2000 wpaul

Add device driver support for USB ethernet adapters based on the
Kawasaki LSI KL5KUSB101B chip, including the LinkSys USB10T, the
Entrega NET-USB-E45, the Peracom USB Ethernet Adapter, the 3Com
3c19250 and the ADS Technologies USB-10BT. This device is 10mbs
half-duplex only, so there's no miibus or ifmedia support. This device
also requires firmware to be loaded into it, however KLSI allows
redistribution of the firmware images (I specifically asked about
this; they said it was ok).

Special thanks to Annelise Anderson for getting me in touch with
KLSI (eventually) and thanks to KLSI for providing the necessary
programming info.

Highlights:
- Add driver files to /sys/dev/usb
- update usbdevs and regenerate attendant files
- update usb_quirks.c
- Update HARDWARE.TXT and RELNOTES.TXT for i386 and alpha
- Update LINT, GENERIC and others for i386, alpha and pc98
- Add man page
- Add module
- Update sysinstall and userconfig.c


55420 04-Jan-2000 tegge

ISA device drivers use the ISA source interrupt number in locations where
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.

Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.

Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>


55411 04-Jan-2000 mjacob

add wx0 driver


55312 02-Jan-2000 phk

Move the "sti" instruction to right before the "hlt" to close a tiny
race condition.

Obtained from: bde and/or obrien


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs who made this change quite some time ago. More commits to come.


55162 28-Dec-1999 wpaul

This commit adds device driver support for the ADMtek AN986 Pegasus
USB ethernet chip. Adapters that use this chip include the LinkSys
USB100TX. There are a few others, but I'm not certain of their
availability in the U.S. I used an ADMtek eval board for development.
Note that while the ADMtek chip is a 100Mbps device, you can't really
get 100Mbps speeds over USB. Regardless, this driver uses miibus to
allow speed and duplex mode selection as well as autonegotiation.
Building and kldloading the driver as a module is also supported.

Note that in order to make this driver work, I had to make what some
may consider an ugly hack to sys/dev/usb/usbdi.c. The usbd_transfer()
function will use tsleep() for synchronous transfers that don't complete
right away. This is a problem since there are times when we need to
do sync transfers from an interrupt context (i.e. when reading registers
from the MAC via the control endpoint), where tsleep() is a no-no.
My hack allows the driver to have the code poll for transfer completion
subject to the xfer->timeout timeout rather than calling tsleep().
This hack is controlled by a quirk entry and is only enabled for the
ADMtek device.

Now, I'm sure there are a few of you out there ready to jump on me
and suggest some other approach that doesn't involve a busy wait. The
only solution that might work is to handle the interrupts in a kernel
thread, where you may have something resembling a process context that
makes it okay to tsleep(). This is lovely, except we don't have any
mechanism like that now, and I'm not about to implement such a thing
myself since it's beyond the scope of driver development. (Translation:
I'll be damned if I know how to do it.) If FreeBSD ever acquires such
a mechanism, I'll be glad to revisit the driver to take advantage of
it. In the meantime, I settled for what I perceived to be the solution
that involved the least amount of code changes. In general, the hit
is pretty light.

Also note that my only USB test box has a UHCI controller: I don't
have a machine with an OHCI controller available.

Highlights:

- Updated usb_quirks.* to add UQ_NO_TSLEEP quirk for ADMtek part.
- Updated usbdevs and regenerated generated files
- Updated HARDWARE.TXT and RELNOTES.TXT files
- Updated sysinstall/device.c and userconfig.c
- Updated kernel configs -- device aue0 is commented out by default
- Updated /sys/conf/files
- Added new kld module directory


55117 26-Dec-1999 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.


55111 26-Dec-1999 bde

Replaced the INTRMASK and INTRUNMASK macros by "|" and "&~" operations.
Some interface botches went away, leaving the macros unused outside of
the implementation of interrupt masking, and it was silly for the
implementation to use the macros in only one place each.
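
As a rough illustration of the replacement described above, masking and
unmasking reduce to a bitwise OR and an AND with the complement. The helper
names and the 32-bit mask type are assumptions made for this sketch; the
commit open-codes the operators rather than adding functions like these.

```c
#include <stdint.h>

/* Hypothetical helpers showing the two operations the macros wrapped. */

/* Add the given bits to an interrupt mask: mask | bits. */
static inline uint32_t sketch_intr_mask(uint32_t mask, uint32_t bits)
{
	return mask | bits;
}

/* Remove the given bits from an interrupt mask: mask & ~bits. */
static inline uint32_t sketch_intr_unmask(uint32_t mask, uint32_t bits)
{
	return mask & ~bits;
}
```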


55110 26-Dec-1999 bde

Fixed breakage of read-only opening of /dev/*mem at securelevel > 0 in
previous pair of commits.

Spell the "securelevel > 0" check consistently.

Use the proc arg instead of curproc in mmopen() and mmclose().


55098 25-Dec-1999 bde

Fixed races accessing the RTC. The races apparently caused
apm_default_resume() to sometimes set a very wrong time.
(1) Accesses to the RTC index and data registers were not atomic enough.
Interrupts were not masked. This was only good enough until an
interrupt handler (rtcintr()) started accessing the RTC in FreeBSD-2.0.
(2) Access to the block of time registers in inittodr() was not atomic
enough. inittodr() has 244us to read the time registers. Interrupts
were not masked. This was only good enough until something (apm)
started calling inittodr() after boot time in FreeBSD-2.0.
The fix for (2) also makes the timecounter update more atomic, although
this is currently unimportant due to the low resolution of the RTC.

Problem reported by: mckay


55018 23-Dec-1999 wpaul

Fix minor typo in comments about WaveLAN/IEEE driver: 802.1 -> 802.11


54992 22-Dec-1999 obrien

Turn on the `sym' driver by default. It lives well beside the `ncr' driver
now. On one machine with <825a> and <875> controllers, `sym' correctly
attached. On another one with only a <ncr 53c810 fast10 scsi>, the `ncr'
driver correctly attached.


54967 21-Dec-1999 eivind

Use the correct return value for MCA NMIs.

Reviewed by: mdodd


54952 21-Dec-1999 eivind

Change incorrect NULLs to 0s


54890 20-Dec-1999 peter

Remove references to register_intr() etc in comments.


54889 20-Dec-1999 peter

Zap the old isa_device specific register_intr() and unregister_intr()
emulations. Thankfully, nothing is left in the tree that uses them.


54838 19-Dec-1999 billf

Borrow phk's axe and chop off the old soundcard-CDROM devices. We get
about 40k of savings from this, and these abominations are still in LINT
if anyone needs to use them.

Reviewed by: jkh


54830 19-Dec-1999 markm

Comment and order to reduce diffs. No functional change.


54777 18-Dec-1999 imp

spell isa right on sn0 line


54773 18-Dec-1999 imp

Driver for the smc91xx series of ethernet chips. Ported from PAO to
3.3R and then to -current. The pccard support has been left in the
driver, but is presently non-functional because we are using the
isa_compat layer for the moment.

Obtained From: PAO
Sponsored by: Timing Solutions


54425 11-Dec-1999 peter

Reclaim UPAGES_HOLE (8k) that was chopped out of process address space.
The UPAGES have not been there since Jan '96, but the hole was preserved
for BSD/OS binary compatibility. This has been fixed other ways (%ebx
now has a pointer to PS_STRINGS), and the stack is nowhere near where
it used to be so this hack isn't required anymore.


54391 10-Dec-1999 phk

Remove the if_ze and if_zp drivers.

These drivers were cloned from the ed and ep drivers back in 1994
when PCMCIA cards were a very new thing and we had no other support
for such devices. They treated the PCIC (the chip which controls the
PCCARD slot) as part of their device and generally hacked their way
to success. They have significantly bit-rotted relative to their
ancestor drivers (ed & ep) and they were a dead-end on the evolution
path to proper PCCARD support in FreeBSD.

They have been terminally broken since August 18 where mdodd forgot
them and nobody seems to have missed them enough to fix them since.

I found no outstanding PRs against these drivers.


54293 08-Dec-1999 sos

Finally use the new ata driver.


54208 06-Dec-1999 peter

Fold the pnp code into the base isa system to pave the way for PNPBIOS.

Reviewed by: dfr (a few weeks ago)


54192 06-Dec-1999 luoqi

Need header <machine/smp.h> for prototype declaration of smp_rendezvous()
in my previous commit.


54188 06-Dec-1999 luoqi

User ldt sharing.


54150 05-Dec-1999 dfr

Don't use a bogus bus number for Ross host-pci bridges.

PR: kern/15278
Submitted by: Ahmed Benani <ahmed_benani@urbanet.ch>


54141 05-Dec-1999 luoqi

Reinstate the aic driver.

PR: conf/15187


54134 04-Dec-1999 wpaul

Add the if_dc driver and remove all of the al, ax, dm, pn and mx drivers
which it replaces. The new driver supports all of the chips supported
by the ones it replaces, as well as many DEC/Intel 21143 10/100 cards.

This also completes my quest to convert things to miibus and add
Alpha support.


54128 04-Dec-1999 kato

The address 0x472 is used for the SCSI HDD geometry information on
PC-98. Therefore, the PC-98 kernel should not modify it.


54121 04-Dec-1999 marcel

oszsigcode -> szosigcode

Pointed out by: bde


54120 04-Dec-1999 marcel

Fix type of sf_addr.

Pointed out by: bde


54073 03-Dec-1999 mdodd

Remove the 'ivars' argument to device_add_child() and
device_add_child_ordered(). 'ivars' may now be set using the
device_set_ivars() function.

This makes it easier for us to change how arbitrary data structures are
associated with a device_t. Eventually we won't be modifying device_t
to add additional pointers for ivars, softc data etc.

Despite my best efforts I've probably forgotten something so let me know
if this breaks anything. I've been running with this change for months
and it's been quite involved actually isolating all the changes from
the rest of the local changes in my tree.

Reviewed by: peter, dfr


54046 03-Dec-1999 msmith

Remove the 'gzip' image activator. We're not using a.out anymore, so save
ourselves just over 8k.


54017 02-Dec-1999 jlemon

Remove code to select APM version with flags to the apm0 device. This
code has been disabled for the last 4 months.

Prodded into action by: n_hibma


54013 02-Dec-1999 msmith

Add the AMI MegaRAID and Mylex DAC960 drivers. Installation to arrays
on these controllers is now no different to the process for any other
supported disk controller.


53888 29-Nov-1999 dillon

Make BOOTP work again.

Submitted by: Doug Ambrisko <ambrisko@whistle.com>


53805 28-Nov-1999 obrien

Sort PCI SCSI controllers.


53804 28-Nov-1999 obrien

/sys adjustments to add the `sym' controller driver.

This is commented out in GENERIC as you cannot mix `sym' with `ncr' right now.
Note that LINT is no more broken by this commit.


53789 27-Nov-1999 obrien

Add a commented out 'ATA' driver config block to help assist -CURRENT
people to migrate to this driver since it will be the default IDE/ATA/ATAPI
driver in 4.0-R.


53745 27-Nov-1999 bde

Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by: dufault


53722 26-Nov-1999 phk

Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation.

Add MD_ROOT and MD_ROOT_SIZE options to the md driver.

Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.

Add md driver to GENERIC, PCCARD and LINT.

This is a cleanup which removes the need for some of the worse hacks in
MFS: We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one. md is a true device, so it is less trouble.

This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well. If people have no other use for MFS, it can be removed from
the kernel.


53706 26-Nov-1999 julian

Fix out-of-date comment


53648 24-Nov-1999 archie

Change the prototype of the strto* routines to make the second
parameter a char ** instead of a const char **. This makes these
kernel routines consistent with the corresponding libc userland
routines.

Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.

Reviewed by (in concept): phk, bde
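
A minimal userland sketch of what the char ** prototype means for callers,
using the standard strtol() from libc. The wrapper name is hypothetical. The
input string is const, but the end pointer handed to strtol() is a plain
char *, matching the prototype the kernel routines now share.

```c
#include <stdlib.h>

/*
 * Parse a leading decimal number from a constant string and report where
 * parsing stopped.  The end pointer passed to strtol() is a plain char *
 * because the second parameter is declared as char **.
 */
static long sketch_parse_leading(const char *s, const char **rest)
{
	char *end;
	long value = strtol(s, &end, 10);

	*rest = end;	/* char * converts implicitly to const char * */
	return value;
}
```

With a const char ** prototype, the local end pointer would have to be
declared const char * instead, which is the inconsistency with libc that the
commit removes.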


53624 23-Nov-1999 green

Fix a confusion between osigcontext and ucontext_t in the previous commit.
Since an osigcontext is smaller, if you check for a valid (much larger sized)
ucontext_t and it fails, we bogusly would reject the osigcontext as per
rev 1.378. Instead, check for osigcontext range validity first, and
ucontext_t later. This unbreaks Netscape.

Pointed to the right commit by: peter


53580 22-Nov-1999 shin

move INET6 option from GENERIC to LINT.

Thanks to Brian Fundakowski Feldman and Dag-Erling Smorgrav
for giving me the comment and the patch.

Submitted by: Dag-Erling Smorgrav


53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assign IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


53504 21-Nov-1999 pho

Moved useracc() to the top of sigreturn so as to avoid a panic
caused by invalid arguments to the routine.

Reviewed by: marcel, phk


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53433 19-Nov-1999 phk

Use LIST_FOREACH to traverse the allproc list.

Submitted by: Jake Burkholder jake@checker.org


53425 19-Nov-1999 dillon

Optimize two cases in the MP locking code. First, it is not necessary
to use a locked cmpxchg when unlocking a lock that we already hold, since
nobody else can touch the lock while we hold it. Second, it is not
necessary to use a locked cmpxchg when locking a lock that we already
hold, for the same reason. These changes will allow MP locks to be used
recursively without impacting performance.

Modify two procedures that are called only by assembly and are already
NOPROF entries to pass a critical argument in %edx instead of on the
stack, removing a significant amount of code from the critical path
as a consequence.

Reviewed by: Alfred Perlstein <bright@wintelcom.net>, Peter Wemm <peter@netplex.com.au>


53363 18-Nov-1999 peter

If we have found pci devices via pci_cfgopen(), but don't find a
host->pci bridge specifically, then add a pcib0 device on the motherboard
for the pci bus to hang off.

Requested by: Anders Andersson <anders@sanyusan.se>
Obtained from: dfr


53189 15-Nov-1999 luoqi

Segment registers can be read(write) to(from) memory locations as well as
general registers.


53139 14-Nov-1999 obrien

Fix clobbers so that GENERIC may compile with GCC 2.95.2.

Historically, the documentation of extended asm was lacking, namely you
should NOT specify the same register as an input, and a clobber.
If the register is clobbered, it should be specified as an output as well,
e.g., by linking input and output through the "number" notation.
(Beware of lvalues, some local variables needed...)

URL:http://egcs.cygnus.com/faq.html

In versions up to egcs-1.1.1, the compiler did not even warn about it,
but it was liable to output bad code. Newer egcs are pickier and simply
refuse to swallow such code.

Note, since *addr changes, it needs to be an output operand.
We might be excessive in saying that all memory has changed.

Obtained from: OpenBSD
w/extra thanks to Marc Espie <Marc.Espie@liafa.jussieu.fr>


53108 12-Nov-1999 marcel

Reserve space for FPU state in struct sigcontext. Fix some style bugs
and comments while there.

Submitted by: bde


53106 12-Nov-1999 marcel

Change the type of sf_addr in struct {o}sigframe from char* to
register_t.

Fix some style bugs and bitrotted comments.

Submitted by: bde


53045 09-Nov-1999 alc

Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It
should be "VM_FAULT_NORMAL".


52968 07-Nov-1999 phk

Patch got this one wrong, we want to check securelevel in open()


52967 07-Nov-1999 phk

Remove the iskmemdev() function. Make it the responsibility of the mem.c
drivers to enforce the securelevel checks.


52805 02-Nov-1999 jhb

Remove the prototypes for two functions that were removed when the
CD9660_ROOT option was axed.


52778 01-Nov-1999 msmith

This is a complete rewrite of vfs_conf.c, which changes the way the root
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.

A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.

These changes drastically simplify the machine-dependent parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.


52730 01-Nov-1999 peter

Update examples using 'disk' and 'tape' - they used to have magic meaning
to config(8) for static device tables that have not existed for quite
some time. They have been aliases for 'device' for a while, and "tape"
went away entirely as it wasn't used anywhere (except in an example
in LINT.. "fixed").


52720 31-Oct-1999 alc

The useracc() calls in osigreturn() and sigreturn() should specify
VM_PROT_READ rather than VM_PROT_WRITE. (This mistake predates
the B_READ/B_WRITE -> VM_PROT_READ/VM_PROT_WRITE change.)

Submitted by: bde


52669 30-Oct-1999 iwasaki

i8254_restore is called from apm_default_resume() to reload
the countdown register.
This should not be necessary, but there are broken laptops that
do not restore the countdown register on resume.
When that happens, it messes up the hardclock interval and system clock,
which leads to the infamous "calcru: negative time" problem.

Submitted by: kjc, iwasaki
Reviewed by: Steve O'Hara-Smith <steveo@eircom.net> and committers.
Obtained from: PAO3


52647 30-Oct-1999 alc

The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by: dillon


52644 30-Oct-1999 phk

Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting from the confusion caused by this.

Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


52625 29-Oct-1999 phk

Remove #ifdef notyet code for doing I/O in a way we never will do it.


52550 27-Oct-1999 mdodd

Modify the entries regarding the 'ep' driver to take into account
my recent changes to that driver.


52480 25-Oct-1999 alc

Add text for the AMD-751 host-to-PCI and PCI-to-PCI (AGP) bridges.


52470 25-Oct-1999 imp

Massive rewrite of pccard to convert it to newbus.
o Gut the compatibility interface, you now must attach with newbus.
o Unit numbers from pccardd are now ignored. This may change the units
assigned to a card. It now uses the first available unit.
o kill old skeleton code that is now obsolete.
o Use newbus attachment code.
o cleanup interfile dependencies some.
o kill list of devices per slot. we use the device tree for what we need.
o Remove now obsolete code.
o The ep driver (and maybe ed) may need some config file tweaks to
allow it to attach. See config files that were committed for examples
on how to do this.

Drivers to be committed shortly.

This is an interim fix until the new pccard. ed, ep and sio will be
supported by me with this release, although others are welcome to try
to support other devices before new pccard is working.

I plan on doing minimal further work on this code base. Be careful
when upgrading, since this code is known to work on my laptop and
those of a couple others as well, but your mileage may vary.

BUGS TO BE FIXED:

o system memory isn't allocated yet, it will be soon.
o No devices actually have a pccard newbus attach in the tree.

BUGS THAT MIGHT BE FIXED:

o card removal, including suspend, usually hangs the system.

Many thanks to Peter Wemm and Doug Rabson for helping me to fill in
the missing bits of New Bus understanding at FreeBSD Con '99.


52469 24-Oct-1999 alc

Add text for the Athlon's MMX and 3DNow! (DSP) instruction extensions
to print_AMD_features.


52452 24-Oct-1999 dillon

Adjust the buffer cache to better handle small-memory machines. A
slightly older version of this code was tested by BDE and I.

Also fixes a lockup situation when kva gets too fragmented.

Remove the maxvmiobufspace variable and sysctl, they are no longer
used. Also cleanup (remove) #if 0 sections from prior commits.

This code is more of a hack, but presumably the whole buffer cache
implementation is going to be rewritten in the next year so it's no
big deal.


52436 22-Oct-1999 n_hibma

From: src/sys/i386/conf/PCCARD

revision 1.21
date: 1999/10/15 17:29:20; author: imp; state: Exp; lines: +3 -3
Reorganize the attachment point for pcic (it was unattached and
floating before). Attach pccard devices to pcic, one per slot
(although this may change to one per pcic). pcic is now attached to
isa (to act as a bridge) and pccard is attached to pcic, cbb and
pc98ic (the last two are card bus bridge and the pc98ic version of
pcic, neither of which are in the tree yet). Move pccard compat code
into pccard/pccard_compat.c.

THIS REQUIRES A CONFIG FILE CHANGE. You must change your pcic/card
entries to be:
# PCCARD (PCMCIA) support
controller pcic0 at isa?
controller pcic1 at isa?
controller card0

The old system was upside down and this corrects that problem. It
will make it easier to add support for YENTA pccard/card bus bridges.

Much more cleanup needs to happen before newbus devices can have
pccard attachments. My previous commit's comments were premature.

Forgotten by: imp


52397 19-Oct-1999 peter

Remove pccard attachment stub, this caused pccard unit 0 to be allocated
and unusable by the pccard system since pccard doesn't attach to the
nexus any more. This was stopping my 3c589D from working as pccard unit
0 is used directly for resource allocation and this fails when unit 0
isn't actually attached to anything.


52282 16-Oct-1999 wpaul

Convert the mx driver to miibus.

In order to make this work, I created a pseudo-PHY driver to deal with
Macronix chips that use the built-in NWAY support and symbol mode port.
This is actually all of them, with the exception of the original MX98713
which presents its NWAY support via the MII serial interface.

The mxphy driver actually manipulates the controller registers directly
rather than using the miibus_readreg()/miibus_writereg() bus interface
since there are no MII registers to read. The mx driver itself pretends
that the NWAY interface is a PHY located at MII address 31 for the sole
purpose of allowing the mxphy_probe() routine to know when it needs to
attach to a host controller.


52271 15-Oct-1999 tegge

Eliminate remaining part of incorrect PCI bus numbering sanity check on systems with more than one PCI bus.


52250 15-Oct-1999 obrien

Actually our style is "options\x20\x09".

As BDE says: "options\x09\x09foo" looks quite different from
"options\x20\x09foo" after adding a one or two character prefix.

Notice by: BDE


52243 14-Oct-1999 dfr

* Implement bus_set/get/delete_resource for pci.
* Change the hack used on the alpha for mapping devices into DENSE or
BWX memory spaces to a simpler one. It's still a hack and should be
a separate api to explicitly map the resource.
* Add $FreeBSD$ as necessary.


52241 14-Oct-1999 dfr

* Add some verbose logging to the PnP parser and fix a couple of bugs.
* Move pnp_eisaformat() to pnp.c, declared in <isa/pnpvar.h>.
* Turn the pnpbios code into an enumerator for the isa bus. This allows
all devices known to the bios to be probed automatically.

Currently the pnpbios code is dependent on the PNPBIOS option. As the code
is tested more and when more drivers are converted this will be made the
default. I have PnP changes in the wings for fdc, atkbd, psm, pcaudio, and
joy. Sio already works with pnpbios.


52237 14-Oct-1999 kato

Recognize Pentium II w/ CPUID = 0x6XX and Pentium III Xeon w/ CPUID =
0x7XX.

Pointed out by: Brian Somers <brian@Awfulhak.org>


52235 14-Oct-1999 obrien

Like it or not, we use ^I's not 0x20 to align things in this file.


52199 13-Oct-1999 marcel

Fix a security bug. eflags was copied verbatim from userland.

Submitted by: bde
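
One common way to avoid trusting a user-supplied eflags value is to accept it
only if it differs from the current value in bits that user code may
legitimately change. The sketch below illustrates that check. The bit values
are the architectural x86 EFLAGS positions, but the SK_ names, the exact set
in SK_USERCHANGE, and the function are assumptions for illustration; the
kernel's actual mask and check may differ.

```c
#include <stdint.h>

/* Architectural x86 EFLAGS bit positions (SK_ names are hypothetical). */
#define SK_PSL_C  0x00000001u	/* carry */
#define SK_PSL_PF 0x00000004u	/* parity */
#define SK_PSL_AF 0x00000010u	/* auxiliary carry */
#define SK_PSL_Z  0x00000040u	/* zero */
#define SK_PSL_N  0x00000080u	/* sign */
#define SK_PSL_D  0x00000400u	/* direction */
#define SK_PSL_V  0x00000800u	/* overflow */

/* Assumed set of bits user code may change; deliberately conservative. */
#define SK_USERCHANGE \
	(SK_PSL_C | SK_PSL_PF | SK_PSL_AF | SK_PSL_Z | SK_PSL_N | \
	 SK_PSL_D | SK_PSL_V)

/*
 * Return nonzero if user_ef differs from cur_ef only in user-changeable
 * bits, i.e. privileged bits such as IF and IOPL are left untouched.
 */
static int sketch_eflags_secure(uint32_t user_ef, uint32_t cur_ef)
{
	return ((user_ef ^ cur_ef) & ~SK_USERCHANGE) == 0;
}
```

A caller would reject the request, rather than silently mask it, when the
check fails, so that attempts to raise IOPL or clear IF from user mode are
visible as errors.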


52177 12-Oct-1999 green

Enable MTRR support for K7 (Athlon) processors, which happens to have the
same interface as Intel's P6 family has. Incidentally, I had disabled
it in the first place since I knew the K7s were coming out soon but
did not want to assume they'd have the same MTRR interface as Intel's
chips.

Submitted by: Ville-Pertti Keinonen <will@iki.fi>


52174 12-Oct-1999 dfr

* Add struct resource_list* argument to resource_list_alloc and
resource_list_release. This removes the dependency on the
layout of ivars.

* Move set_resource, get_resource and delete_resource from
isa_if.m to bus_if.m.

* Simplify driver code by providing wrappers to those methods:

bus_set_resource(dev, type, rid, start, count);
bus_get_resource(dev, type, rid, startp, countp);
bus_get_resource_start(dev, type, rid);
bus_get_resource_count(dev, type, rid);
bus_delete_resource(dev, type, rid);

* Delete isa_get_rsrc and use bus_get_resource_start instead.

* Fix a stupid typo in isa_alloc_resource reported by Takahashi
Yoshihiro <nyan@FreeBSD.org>.

* Print a diagnostic message if we can't assign resources to a PnP
device.

* Change device_print_prettyname() so that it doesn't print
"(no driver assigned)-1" for anonymous devices.


52150 12-Oct-1999 marcel

Now that userland, including modules don't use the osig* syscalls
and the kernel itself doesn't use any SYS_osig* constants, change
the syscalls to be of type COMPAT.


52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


52123 11-Oct-1999 peter

Trim some unused #includes

Submitted by: phk


52121 11-Oct-1999 peter

Zap unneeded #includes

Submitted by: phk


51984 07-Oct-1999 marcel

Simplification of the signal trampoline and other cleanups.

o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.

This commit contains a patch by bde.

Reviewed by: luoqi, jdp


51942 04-Oct-1999 marcel

Re-introduction of sigcontext.

struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.

For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.

NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.

This commit also fixes a lot of style bugs without hopefully
adding new ones.

NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.

Parts submitted by: bde
sigaltstack bug found by: bde


51938 04-Oct-1999 peter

Use the rev 1.1.2.1 code from RELENG_3 for atomic operations rather
than the non-atomic C macros.


51937 04-Oct-1999 peter

Typo: s/__GNUC_MINOR_/__GNUC_MINOR__/
(__GNUC_MINOR__ on egcs in -current is "91" and is going to be "95" soon)
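
The typo matters because an undefined identifier in an #if expression
silently evaluates to 0, so a misspelled __GNUC_MINOR_ produces no
diagnostic and the minor-version comparison quietly sees 0. A minimal sketch
of a version test using the correctly spelled predefined macros follows; the
SKETCH_GNUC_PREREQ name and the helper function are hypothetical.

```c
/*
 * True when compiling with GCC (or a compiler that defines the same
 * macros) at version maj.min or later.  Both macro names end in two
 * underscores; a misspelling would be treated as 0 inside #if.
 */
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
#define SKETCH_GNUC_PREREQ(maj, min) \
	((__GNUC__ > (maj)) || \
	 (__GNUC__ == (maj) && __GNUC_MINOR__ >= (min)))
#else
#define SKETCH_GNUC_PREREQ(maj, min) 0
#endif

/* Example use: was this built with something at least as new as egcs 2.91? */
static int sketch_built_with_2_91_or_later(void)
{
	return SKETCH_GNUC_PREREQ(2, 91);
}
```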


51931 04-Oct-1999 marcel

Fix style bug: order includes

Submitted by: bde


51917 03-Oct-1999 eivind

Allow compilation with older versions of GCC, in order to make it possible
to bootstrap and work with -current from older versions of FreeBSD.


51908 03-Oct-1999 marcel

Reinstate the 4th argument to old signal handlers. Don't set it
when the handler uses siginfo_t.


51907 03-Oct-1999 marcel

Fix style bugs caused by using the wrong file to copy from. That one
gets fixed later on.

Reinstate the mysterious 4th argument to signal handlers and add some
comments on that.


51834 01-Oct-1999 marcel

Implement the use of si_addr in siginfo_t.

Suggested by: jdp


51833 01-Oct-1999 marcel

Don't check %cs *after* it has been set in sigreturn. If the check
fails, applications could end up running in kernel mode (oops).

Submitted by: bde


51792 29-Sep-1999 marcel

sigset_t change (part 3 of 5)
-----------------------------

By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two versions of sendsig and sigreturn.

A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.

The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).

The struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that were caused by
the new sigframe.

NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.


51661 25-Sep-1999 mjacob

Fix from Tor so that if we enter the debugger in the tristate going to
SMP (other CPUs stopped but SMP mode not really started).

Obtained from: Tor.Egge@fast.no


51659 25-Sep-1999 mjacob

Fix from Tor so that if we enter the debugger in the tristate going to
SMP (other CPUs stopped but SMP mode not really started).

Obtained from: Tor.Egge@fast.no


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51561 22-Sep-1999 luoqi

Display CPU (BSP) clock speed on SMP systems.


51530 22-Sep-1999 wpaul

Spruce up the ADMtek driver: convert to newbus, miibus and add support
for the AN985 "Centaur" chip, which is apparently the next generation
of the "Comet." The AN985 is also a tulip clone and is similar to the
AL981 except that it uses a 99C66 EEPROM and a serial MII interface
(instead of direct access to the PHY registers).

Also updated various documentation to mention the AN985 and created
a loadable module.

I don't think there are any cards that use this chip on the market yet:
the datasheet I got from ADMtek has boxes with big X's in them where the
diagrams should be, and the sample boards I got have chips without any
artwork on them.


51498 21-Sep-1999 phk

Print out flags value


51474 20-Sep-1999 dillon

Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundary in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.

The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().


51432 19-Sep-1999 wpaul

Convert the VIA Rhine driver to miibus.


51211 12-Sep-1999 green

Correction: mem.c devices are "D_MEM" (and D_MEM is added.)

Taken issue with by: phk


51207 12-Sep-1999 green

Mainly stylistic fixes:
1. return( -> return (
2. inappropriate ENODEV -> ENOTTY
3. some unreachable cases removed


51206 12-Sep-1999 green

Make the d_flags of mem devices D_DISK to signify that they are disk-like
random-seekable devices. This lets dd(1) know it can seek on them. It
also affects spec_vnopen() (IIRC), but only makes the path of execution smaller,
and does not change its behavior. This is when securelevel >= 2.


51193 12-Sep-1999 msmith

Some PnP BIOSsen return garbage in the high byte of the number-of-devices
field (or don't set the high byte at all). Clear it to avoid reporting
a silly number of devices.

Reported by: phk
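
The fix amounts to keeping only the low byte of the reported count. A small
sketch under that reading of the message; the function name and the 16-bit
field type are assumptions, not the actual PnP BIOS code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: discard the possibly-garbage high byte of the
 * number-of-devices field and keep the low byte as the device count.
 */
static unsigned sketch_pnp_device_count(uint16_t raw)
{
	return raw & 0x00ffu;
}
```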


51183 11-Sep-1999 peter

Make pmap_mapdev() deal with non-page-aligned requests.
Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.
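
Handling a non-page-aligned request generally means mapping from the start of
the page containing the address, rounding the length up to whole pages, and
handing back a pointer offset into the first page. The arithmetic can be
sketched as below. The 4096-byte page size, the struct, and the function name
are assumptions for illustration and are not taken from pmap_mapdev() itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical description of the page-aligned region to map. */
struct sketch_map_span {
	uint64_t base;		/* page-aligned start of the mapping */
	uint64_t offset;	/* offset of the requested address in page 0 */
	uint64_t length;	/* mapping length, rounded up to whole pages */
};

/* Compute the page-aligned span covering [pa, pa + size). */
static struct sketch_map_span sketch_span_for(uint64_t pa, uint64_t size)
{
	const uint64_t mask = SKETCH_PAGE_SIZE - 1;
	struct sketch_map_span s;

	assert(size > 0);
	s.offset = pa & mask;
	s.base = pa - s.offset;
	s.length = (s.offset + size + mask) & ~mask;
	return s;
}
```

The caller would map s.length bytes starting at s.base and add s.offset to
the resulting virtual address; the unmap side needs the same rounding to
release exactly what was mapped.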


51165 11-Sep-1999 gibbs

Add the AMD driver.


51130 10-Sep-1999 phk

The system clock doesn't update, because the C6's TSC stops counting up
when the HALT instruction is run.

PR: 13683
Submitted by: IMAI Takeshi <take-i@ceres.dti.ne.jp>
Reviewed by: phk


51127 10-Sep-1999 peter

Add the CR4 values for P3 SIMD enabling support. FXSR tells the cpu that
the OS does FXSAVE/FXRSTOR instructions (fast FPU save/restore) during
context switching and also enables SIMD since this enables saving the
extra CPU context that isn't saved with normal FPU regs. The other
enables the SIMD instructions to use exception 16 (FPU) error reporting.
Note, this doesn't turn on SIMD, just defines the bits.


51126 10-Sep-1999 peter

Add text for the PN (Processor serial number) and XMM (extended SIMD/MMX2/
support), as well as a bunch of comments for what the various bits mean
(those that I remember anyway).


51121 10-Sep-1999 msmith

Look for the right ACPI signature.

Submitted by: dfr


51114 10-Sep-1999 msmith

Invoke smp_rendezvous_action() using the a.out compatible asnames.h
technique (bleagh).


51109 09-Sep-1999 peter

Separate the miibus pci ethernet drivers from the non-miibus drivers so
it's a little clearer which is which from just looking at GENERIC.


51065 07-Sep-1999 luoqi

Save %gs in sigcontext when delivering a signal and restore them upon
return (in signal trampoline code). I plan to do the same on -stable,
so that we have a consistent interface to userland applications.

Reviewed by: bde


50992 06-Sep-1999 imp

Add pccard child to nexus. A better version would take care of this
with an identify method, but that has not been implemented.

Forgotten by: imp


50986 06-Sep-1999 wpaul

This commit adds driver support for PCI fast ethernet NICs based on
the Davicom DM9100 and DM9102 chipsets, including the Jaton Corporation
XPressNet. Datasheet is available from www.davicom8.com.

The DM910x chips are still more tulip clones. The API is reproduced
pretty faithfully, unfortunately the performance is pretty bad. The
transmitter seems to have a lot of problems DMAing multi-fragment
packets. The only way to make it work reliably is to coalesce transmitted
packets into a single contiguous buffer. The Linux driver (written by
Davicom) actually does something similar to this. I can't recommend this
NIC as anything more than a "connectivity solution."

This driver uses newbus and miibus and is supported on both i386
and alpha platforms.


50974 05-Sep-1999 wpaul

This commit adds driver support for the Silicon Integrated Systems
SiS 900 and SiS 7016 PCI fast ethernet chipsets. Full manuals for the
SiS chips can be found at www.sis.com.tw.

This is a fairly simple chipset. The receiver uses a 128-bit multicast
hash table and single perfect entry for the station address. Transmit and
receive DMA and FIFO thresholds are easily tuneable. Documentation is
pretty decent and performance is not bad, even on my crufty 486. This
driver uses newbus and miibus and is supported on both the i386 and
alpha architectures.


50972 05-Sep-1999 peter

Set up FPU state on the AP.

Tested by: phk


50823 03-Sep-1999 mdodd

This adds the i386 specific support for systems with a MicroChannel
Architecture bus.

Reviewed by: msmith


50816 02-Sep-1999 luoqi

Some reorganization of sysarch() interface:
1. Move definitions of struct i386_*_args to the header file sysarch.h,
since they are part of the sysarch API. struct i386_get_ldt_args and
i386_set_ldt_args were identical, therefore make them into one
struct i386_ldt_args. Libc should use these definitions as well.
2. Return a more sensible EOPNOTSUPP for unknown operations.

Reviewed by: marcel
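
The second point above (returning EOPNOTSUPP for unknown operations) can be
sketched in userland C. This is only an illustration: the operation numbers,
the struct layout, and the function name are hypothetical stand-ins, not the
definitions from sysarch.h.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation codes; the real values live in sysarch.h. */
enum sketch_op { SKETCH_GET_LDT = 0, SKETCH_SET_LDT = 1 };

/* Hypothetical merged argument struct shared by the get and set paths. */
struct sketch_ldt_args {
	unsigned int start;
	void *descs;
	unsigned int num;
};

/* Dispatch on op; anything unrecognised yields EOPNOTSUPP. */
static int
sketch_sysarch(int op, struct sketch_ldt_args *args)
{
	switch (op) {
	case SKETCH_GET_LDT:
	case SKETCH_SET_LDT:
		(void)args;	/* a real handler would act on args here */
		return (0);
	default:
		return (EOPNOTSUPP);
	}
}
```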


50788 02-Sep-1999 peter

Update for new pnp includes


50769 01-Sep-1999 dfr

This represents essentially a complete rewrite of the ISA PnP code. The
new system is integrated with the ISA bus code more cleanly and allows
the future addition of more enumerators such as PnPBIOS and ACPI.

This commit also enables the new pcm driver since it is somewhat tied to
the new PnP code.


50736 01-Sep-1999 jkh

Try and commit the tun comment fix again; I have no idea why there
was a clash the last time, leading me to think that it had already
been fixed.


50732 01-Sep-1999 peter

Eliminate some magic numbers.


50719 01-Sep-1999 brian

ppp(1) -> ppp(8)


50677 31-Aug-1999 msmith

Make the error return from mem_range_attr_get actually do something useful
(return an error to the caller)


50674 30-Aug-1999 msmith

Check that there is memory range support before attempting to perform such
an operation, as a kernel client may not have previously checked the CPU
type (it may not be able to).

Also correct the function declaration style for the mem_range functions to
match the rest of this file (oops).

Submitted by: gibbs


50511 28-Aug-1999 phk

We don't need to pass the diskname argument all over the diskslice/label
code, we can find the name from any convenient dev_t


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50463 27-Aug-1999 jlemon

Reference the correct gdt[] entry on SMP. Remove the `generation' flag,
and always reload the selectors for every bios call.


50379 25-Aug-1999 peter

Use .p2align to ensure consistent a.out/elf alignment. I'd have used
SUPERALIGN_TEXT, but this is inline assembler and after cpp has run.
Inspired by bde's comments on linux_locore.s.


50337 25-Aug-1999 msmith

Rename 'bios_jmp' to 'bios16_jmp' to make it clear what it's related to.


50336 25-Aug-1999 peter

Use the far jump for the base of the page arithmetic rather than the
calling function, otherwise Bad Things Happen(tm) when bios16_call is
not in the same page as bios_jmp.

Reviewed by: msmith


50313 24-Aug-1999 msmith

Work around a bad design in some PnP BIOS code whereby the BIOS can reach
off the top of our constructed stack segment while it's trying to copy a
maximally-sized PnP argument frame around.


50303 24-Aug-1999 alc

Cosmetic: Correct the Id string.

Submitted by: Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>


50271 24-Aug-1999 bde

Fixed a misplaced cast to uintptr_t. Cosmetic.

Use device_get_nameunit() instead of rolling our own.


50268 23-Aug-1999 bde

`bootdev' is an ordinary u_long, so don't cast it to a pointer to print it.
gcc warns about the cast on i386's with 64-bit longs.

Print `bootdev' in all cases when we bail out because it is unreasonable.


50257 23-Aug-1999 phk

Now that we can bind cdevsw to the individual dev_t, divorce the PERFMON
stuff from mem.c. If PERFMON is there, it will "steal" a minor from
mem.c, but mem.c doesn't need to know about this.

Fixed type of cmd argument in perfmon_ioctl().


50254 23-Aug-1999 phk

Convert DEVFS hooks in (most) drivers to make_dev().

Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this exercise is that far fewer files depend on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().


50252 23-Aug-1999 peter

The nexus_attach() code works a lot better if it's actually connected to
the device methods... Also, don't fail to add eisa/isa because a previous
device failed to attach.


50251 23-Aug-1999 alc

Modify the macros IMASK_UNLOCK, CPL_UNLOCK, and REL_FAST_INTR_LOCK
to perform the s_unlock inline.


50197 22-Aug-1999 peter

The previous fix didn't do anything if you didn't have pnp. The ICU
macros are only called in the !APIC_IO case, include icu.h there.


50196 22-Aug-1999 green

Finish unbreaking autoconf.c includes (for non-SMP.)


50185 22-Aug-1999 peter

Oops, that wasn't so clever after all. struct isa_device is still a
prerequisite for this old pnp.h.


50184 22-Aug-1999 peter

Zap a heap of unused cruft now. We don't need the ISA/EISA/PCI hooks
here any more as they are self identifying. Only PNP remains but that
will be replaced any day now.
Also reword a comment that had been XXX'ed to death to make it clear[er]
why we don't enable interrupts before probing.
PCIBIOS interrupt routing controls may make this possible to fix one day.


50183 22-Aug-1999 peter

Take advantage of the apm/npx code and let them identify themselves rather
than having explicit hooks here.
Treat the eisa/isa attach a little differently: defer the decision, and
attach eisa/isa directly to the motherboard only if the PCI probe (if it
exists) fails to turn up a PCI->EISA/ISA bridge.
This restores the original device geometry where ISA and/or EISA attach
to their bridge rather than bypassing and going to the root.


50182 22-Aug-1999 peter

Make the identify routine add itself with priority 100 to make sure it
goes after the npx/apm devices and any other motherboard devices that
may get added down the track.


50181 22-Aug-1999 peter

Add an identify method to allow npx to arrange itself to be attached to
the nexus without explicit code in the nexus to do so.


50128 21-Aug-1999 wpaul

This commit adds device driver support for the Sundance Technologies ST201
PCI fast ethernet controller. Currently, the only card I know that uses
this chip is the D-Link DFE-550TX. (Don't ask me where to buy these: the
only cards I have are samples sent to me by D-Link.)

This driver is the first to make use of the miibus code. Once I'm sure
it all works together nicely, I'll start converting the other drivers.

The Sundance chip is a clone of the 3Com 3c90x Etherlink XL design
only with its own register layout. Support is provided for ifmedia,
hardware multicast filtering, bridging and promiscuous mode.


50094 20-Aug-1999 msmith

Loosen up the constructed argument segment generation slightly; rather than
trying to size it intelligently just make it 64k and leave it up to the caller
to ensure that the arguments all fit within that range.

This should resolve the issue that some people were seeing with the PnP BIOS
scan crashing on a large PnP node.


50081 20-Aug-1999 kato

There appear to be two kinds of IBM BlueLightning CPU. On one, the 5/2
test leaves the undefined flags unchanged, as on Cyrix CPUs. On the other,
the 5/2 test changes the undefined flags, as on Intel CPUs. The latter kind
could not be detected and was recognized as a 486DX CPU. To solve this,
finishidentcpu() calls identblue() when cpu_vendor is a null string
(that is, the CPUID instruction is not supported) and cpu == CPU_486.
Tests have been done on IBM BlueLightning CPUs, i486SX and i486DX.


50054 19-Aug-1999 peter

Undo my previous commit and do it differently. Break the ffs() etc macros
into two parts - one to do the bsfl and the other to convert the result
(base 0) to ffs()-like (base 1) in inline C. This enables the optimizer
to be a lot smarter in certain cases, like where it knows that the argument
is non-zero and we want ffs(known non zero arg) - 1. This appears to
produce identical code to the old inline when the argument is unknown.
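
The two-part split described above can be sketched in portable-ish C. This
is an illustration only: __builtin_ctz (a GCC/Clang builtin) stands in for
the bsfl inline assembler, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Part 1: base-0 index of the lowest set bit. Undefined for mask == 0,
 * just like the underlying bit-scan instruction.
 */
static inline int
sketch_lowest_bit(unsigned int mask)
{
	assert(mask != 0);
	return (__builtin_ctz(mask));
}

/* Part 2: convert to the ffs() convention (base 1, 0 for no bits set). */
static inline int
sketch_ffs(unsigned int mask)
{
	return (mask == 0 ? 0 : sketch_lowest_bit(mask) + 1);
}
```

A caller that already knows the argument is non-zero and wants
ffs(arg) - 1 can call the first part directly, which is the case the
commit message says the optimizer now handles better.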


50038 19-Aug-1999 peter

Try using the builtin ffs() for egcs, it (by random inspection)
generates slightly better code and avoids the incl then subl when
using ffs(foo) - 1.


50037 19-Aug-1999 peter

Update for MI switch code, and trim a heap of unused (I believe) entries.


50036 19-Aug-1999 peter

Use the MI process selection. We use a quick routine to decide whether
to get the mplock and enter the kernel to run a process in the SMP case.


49999 18-Aug-1999 alc

Create callable (non-inline) versions of the atomic_OP_TYPE functions
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.

This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).

PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au


49996 18-Aug-1999 msmith

Remove the SMBIOS detection and definitions; this should be handled in a
loadable module (under development).


49953 17-Aug-1999 msmith

Search for and interrogate the PnP BIOS if found. This code just prints
the PnP device IDs in verbose mode; it does not (yet) save any resource
data or contribute to the PnP process nor resource management.


49952 17-Aug-1999 msmith

Mindbogglingly, many BIOS vendors expect to be able to load %ds with
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.

This change works around GPFs in some APM and PnP BIOS implementations.

Obtained from: Linux
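
For context, real-mode segment 0x40 refers to linear address 0x400 (the
BIOS data area), since a real-mode address is segment * 16 + offset. The
arithmetic is sketched below; the helper name is hypothetical, and how the
kernel sets up a matching protected-mode descriptor is not shown here.

```c
#include <stdint.h>

/* Real-mode address translation: linear = (segment << 4) + offset. */
static uint32_t
sketch_real_mode_linear(uint16_t seg, uint16_t off)
{
	return (((uint32_t)seg << 4) + off);
}
```

Under this arithmetic, a BIOS that loads %ds with 0x40 in protected mode
expects that selector to reach the same bytes that segment 0x40 reaches in
real mode, i.e. memory starting at linear 0x400.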


49859 16-Aug-1999 gibbs

Fix a bug in busdma_mem_free() where we were improperly checking
the map associated with the region to free.


49829 15-Aug-1999 phk

Give if_tun the "almost clone" makeover.


49827 15-Aug-1999 phk

Give BPF the "almost-clone" update. If you need more of them, make
more entries in /dev and be happy you don't need to recompile your
kernel.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49635 11-Aug-1999 alc

_pmap_allocpte:
If the pte page isn't PQ_NONE, panic rather than silently
covering up the problem.


49601 10-Aug-1999 peter

Hopefully fix the previous commit, it caused *all* bridges to be detected
as PCI->HOST bridges on my (440BX) box.

My change is to remove the test at the beginning entirely, letting the
switch on the device ID happen first. If the device ID is unknown, then
(in the default case) check for the generic PCIS_BRIDGE_HOST tag. This
should allow weird cases (eg: wpaul's IMS VL bridge) to work by using the
id override. This strategy is more in line with the other PCI match
methods we use elsewhere.

I only have a limited testbed, but having my USB etc devices detected as
PCI->HOST bridges doesn't look good.


49591 10-Aug-1999 alc

pmap_remove_pages:
Add KASSERT to detect out of range access to the pv_table and
report the errant pte before it's overwritten.


49580 09-Aug-1999 wpaul

Fix nexus_pcib_is_host_bridge() so that it detects my 486's PCI bus
correctly. It has the following code:

if (class != PCIC_BRIDGE || subclass != PCIS_BRIDGE_HOST)
return NULL;

My 486 has an Integrated Micro Solutions PCI bridge which identifies
itself as subclass PCIS_BRIDGE_OTHER, not PCIS_BRIDGE_HOST. Consequently,
it gets ignored. In my opinion, the correct test should be:

if ((class != PCIC_BRIDGE) && (subclass != PCIS_BRIDGE_HOST))
return NULL;

That way the test still succeeds because the chip's class is PCIC_BRIDGE.
Clearly it's not reasonable to expect all host to PCI bridges to always
have a subclass of PCIS_BRIDGE_HOST since I've got one that doesn't.
This way the sanity test should remain relatively sane while still allowing
some oddball yet correct hardware to work. If somebody has a better way
to do it, go ahead and tweak the test, but be aware that
class == PCIC_BRIDGE and subclass == PCIS_BRIDGE_OTHER is a valid case.

While I was here, I also added an explicit ID string for the IMS chipset.
I also dealt with a minor style nit: it's bad karma not to have a default
case for your switch statements, but the one in this routine doesn't have
one. The default string of "Host to PCI bridge" is now assigned in a
default case of the switch statement instead of initializing "s" with the
string before the switch and then not having any default case.
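
The difference between the two tests quoted above can be checked with a
small truth-table sketch. The function names are hypothetical, and the
numeric class codes are included only for illustration.

```c
#include <stdbool.h>

/* PCI class/subclass codes, included here for illustration only. */
#define PCIC_BRIDGE		0x06
#define PCIS_BRIDGE_HOST	0x00
#define PCIS_BRIDGE_ISA		0x01
#define PCIS_BRIDGE_OTHER	0x80

/* Original test: reject unless both class and subclass match. */
static bool
sketch_rejected_strict(int pci_class, int pci_subclass)
{
	return (pci_class != PCIC_BRIDGE || pci_subclass != PCIS_BRIDGE_HOST);
}

/* Proposed test: reject only when neither class nor subclass matches. */
static bool
sketch_rejected_loose(int pci_class, int pci_subclass)
{
	return ((pci_class != PCIC_BRIDGE) &&
	    (pci_subclass != PCIS_BRIDGE_HOST));
}
```

With the looser test, a bridge with subclass PCIS_BRIDGE_OTHER is accepted,
but so is every other device of class PCIC_BRIDGE (for example an ISA
bridge), which matches the symptom reported in r49601.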


49558 09-Aug-1999 phk

Merge the cons.c and cons.h to the best of my ability. alpha may or
may not compile, I can't test it.


49536 08-Aug-1999 phk

Make the pty driver as close to a cloning device as we can get for now,
we create the pty on the fly when it is first opened.

If you run out of ptys now, just MAKEDEV some more.

This also demonstrates the use of dev_t->si_tty_tty and dev_t->si_drv1
in a device driver.


49476 07-Aug-1999 jkh

Enable bpf by default. There was no significant dissent to my proposal
of 2 weeks ago that this be done, and anyone who wishes to make bpf more
selective according to securelevel or compile-time options is more
than free to do so.


49474 06-Aug-1999 phk

Forgot the "bsd" slice, now setrootbyname() understands "wd0s1a".


49421 04-Aug-1999 msmith

Fix typo which would have caused MTRR support on non-SMP systems to
behave in an utterly random fashion.

Submitted by: gibbs


49404 04-Aug-1999 peter

Don't probe if pci_cfgopen() fails to find pci hardware, like we used to.
This might have caused interesting things on non-PCI hardware if PCI was
compiled in.


49337 31-Jul-1999 alc

pmap_object_init_pt:
Verify that object != NULL.


49326 31-Jul-1999 alc

Change the type of vpgqueues::lcnt from "int *" to "int". The indirection
served no purpose.


49304 31-Jul-1999 alc

Add parentheses for clarity.

Submitted by: dillon


49223 29-Jul-1999 msmith

Formatting-only cleanup accidentally omitted from the patch merge in the
previous major update. Bring new code into style alignment with the
existing code. No functional changes.


49207 29-Jul-1999 peter

GBIOSSTACK_SEL is undefined, but OTOH, BSSSEL apparently isn't used either.


49204 29-Jul-1999 msmith

Remove some duplicate definitions, as suggested by Alan Cox.


49203 29-Jul-1999 msmith

Fix for vmspace sharing as per Alan Cox. Thanks!


49197 29-Jul-1999 msmith

Major update to the kernel's BIOS-calling ability.

- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code

Update APM to use this facility.

Submitted by: jlemon


49196 29-Jul-1999 green

Remove XXX from the headers (broke the build, I'm betting.)


49195 29-Jul-1999 mdodd

Alter the behavior of sys/kern/subr_bus.c:device_print_child()

- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"

Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)

Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())

The BUS_PRINT_CHILD method now returns int instead of void.

Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.

- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.

Reviewed by: dfr, peter
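
The "return the number of characters output" convention can be sketched in
userland C with printf(), which already returns that count. All names below
are hypothetical stand-ins; the real functions operate on device_t handles
rather than strings.

```c
#include <stdio.h>

/* Print the "foo0: <FooDevice 1.1>" part; return characters written. */
static int
sketch_print_child_header(const char *nameunit, const char *desc)
{
	return (printf("%s: <%s>", nameunit, desc));
}

/* Print the " on bus0" part and the newline; return characters written. */
static int
sketch_print_child_footer(const char *busname)
{
	return (printf(" on %s\n", busname));
}

/* Generic print method: header, then footer, summing the output length. */
static int
sketch_generic_print_child(const char *nameunit, const char *desc,
    const char *busname)
{
	int retval = 0;

	retval += sketch_print_child_header(nameunit, desc);
	retval += sketch_print_child_footer(busname);
	return (retval);
}
```

A bus with extra details to report would call the header helper, print its
own fields, then call the footer helper, adding each count to the total.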


49186 28-Jul-1999 msmith

We're called too early to have any idea whether APM is going to be
active or not. The only sane thing we can do here is assume that if
APM is supported it might be active at some point, and bail.

In reality, even this isn't good enough; regardless of whether we support
APM or not, the system may well futz with the CPU's clock speed and throw
the TSC off. We need to stop using it for timekeeping except under
controlled circumstances. Curse the lack of a dependable high-resolution
timer.


49178 28-Jul-1999 msmith

Remove some droppings left over from the removal of the APM hooks.


49157 28-Jul-1999 dfr

Add support for SYS_RES_DENSE and SYS_RES_BWX resource types. These are
equivalent to SYS_RES_MEMORY for x86 but for alpha, the rman_get_virtual()
address of the resource is initialised to point into either dense-mapped
or bwx-mapped space respectively, allowing direct memory pointers to be
used to device memory.

Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>


49098 26-Jul-1999 cracauer

Various formatting fixes on my FPE trapcode commit.

Submitted by: BDE


49081 25-Jul-1999 cracauer

On FPU exceptions, pass a useful error code (one of the FPE_...
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).

A rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.

This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.


49076 25-Jul-1999 wpaul

This commit adds device driver support for Adaptec Duralink PCI fast
ethernet controllers based on the AIC-6915 "Starfire" controller chip.
There are single port, dual port and quad port cards, plus one 100baseFX
card. All are 64-bit PCI devices, except one single port model.

The Starfire would be a very nice chip were it not for the fact that
receive buffers have to be longword aligned. This requires buffer
copying in order to achieve proper payload alignment on the alpha.
Payload alignment is enforced on both the alpha and x86 platforms.
The Starfire has several different DMA descriptor formats and transfer
mechanisms. This driver uses frame descriptors for transmission which
can address up to 14 packet fragments, and a single fragment descriptor
for receive. It also uses the producer/consumer model and completion
queues for both transmit and receive. The transmit ring has 128
descriptors and the receive ring has 256.

This driver supports both FreeBSD/i386 and FreeBSD/alpha, and uses newbus
so that it can be compiled as a loadable kernel module. Support for BPF
and hardware multicast filtering is included.


49068 24-Jul-1999 dg

Increased max kmem to 200MB. This should fix some out-of-kmem panics on
large systems.


49043 23-Jul-1999 alc

atomic.h:
Change "void *" to "volatile TYPE *", improving type safety
and eliminating some warnings (e.g., mp_machdep.c rev 1.106).

cpufunc.h:
Eliminate setbits. As defined, it's not precisely correct;
and it's redundant. (Use atomic_set_int instead.)

ipl_funcs.c:
Use atomic_set_int instead of setbits.

systm.h:
Include atomic.h.

Reviewed by: bde
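
As a rough userland analogue of replacing setbits with atomic_set_int, the
sketch below uses C11 <stdatomic.h>. It is not the kernel's atomic.h; the
variable and function names are hypothetical.

```c
#include <stdatomic.h>

/* Userland stand-in for a shared pending-bits word. */
static atomic_uint sketch_pending;

/* Atomically OR bits into the word, in the spirit of atomic_set_int(). */
static void
sketch_set_pending(unsigned int bits)
{
	atomic_fetch_or(&sketch_pending, bits);
}

/* Read the current value of the word. */
static unsigned int
sketch_read_pending(void)
{
	return (atomic_load(&sketch_pending));
}
```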


48974 22-Jul-1999 alc

Reduce the number of "magic constants" used for page coloring
by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same
thing at different places in the kernel. Drop PQ_PRIME3.


48963 21-Jul-1999 alc

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


48925 20-Jul-1999 msmith

Update of the i686 MTRR/memory range support.

- Support for setting memory range attributes on SMP systems using the
new SMP rendezvous function
- Don't print the confusing default memory type message.
- Allow legal overlapping range types.
- Turn interrupts back on after setting MTRRs in UP mode (whoops)
- Don't waste time calling invltlb() after wbinvd(); it's not
SMP-compatible (interrupts are off) and unnecessary because
wbinvd already flushes the TLB.

This code is now essentially feature-complete.


48924 20-Jul-1999 msmith

Implement an all-CPU shootdown-style rendezvous facility. This allows
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.

The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.

Reviewed by: peter (partially)


48918 19-Jul-1999 peter

Fix a page size vs. KB mixup. The extra buffers allocated at a reduced
rate are meant to kick in at 64MB, not 256MB.

Reviewed by: Matthew Dillon <dillon@backplane.com>
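
One plausible reading of the mixup, assuming the 4KB i386 page size: the
same threshold number means 64MB when counted in KB but 256MB when counted
in pages. The value 65536 and the helper names are assumptions made for
this illustration, not taken from the changed code.

```c
/* i386 page size in KB. */
#define SKETCH_PAGE_KB	4UL

/* Interpret a threshold as a count of kilobytes; result in MB. */
static unsigned long
sketch_mb_if_kb(unsigned long value)
{
	return (value / 1024UL);
}

/* Interpret the same threshold as a count of pages; result in MB. */
static unsigned long
sketch_mb_if_pages(unsigned long value)
{
	return (value * SKETCH_PAGE_KB / 1024UL);
}
```

Under these assumptions 65536 KB is 64MB while 65536 pages is 256MB, so a
64MB threshold expressed in pages would be 16384.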


48889 18-Jul-1999 bde

Updated acquire_timer2()'s state machine to work when the i8254 is
being used for timecounting. Fixed a race or two in it. Undisabled
it.

PR: 10455


48888 18-Jul-1999 bde

Don't let the machdep.tsc_freq sysctl proceed if the TSC is present
but broken, since tsc_timecounter is not initialised in that case,
and updating an uninitialised timecounter is fatal.

Fixed style bugs in the machdep.i8254_freq and machdep.tsc_freq
sysctls.

Reviewed by: phk


48868 17-Jul-1999 phk

Centralize dumpdev handling.


48832 16-Jul-1999 msmith

Add support for multiple PCI busses directly connected to the nexus.
This is only partially complete, but allows 450NX-based systems with
more than one PCI bus to be used again.

Submitted by: dfr


48798 13-Jul-1999 obrien

Move the xe0 driver back where it was. It was misleading where it was as it
does not take over the PCIC, it does require PCCARD support, and it doesn't
replace any existing driver.


48797 13-Jul-1999 alc

Commit the correct patch, i.e., the one that actually corresponds
to the rev 1.2 log entry.


48796 13-Jul-1999 alc

Changed the implementation of the primitives to guarantee atomicity
with respect to interrupts on UP systems. (The upgrade from gcc 2.7.x
to egcs 1.1.2 produced at least one non-atomic code sequence in
swap_pager_getpages.)

In addition, the primitives are now SMP-safe, but only on SMPs. (For
portability between SMPs and UPs, modules are compiled with the SMP-safe
versions.)

Submitted by: dillon and myself
Reviewed by: bde


48729 10-Jul-1999 bde

Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending
bit separate from ipending, since this is simpler and/or necessary for
SMP and may even be better for UP.

Reviewed by: alc, luoqi, tegge


48727 10-Jul-1999 bde

Fixed a longstanding scheduling bug. ASTs and softclock interrupts were
not masked during handling of shared PCI interrupts. This resulted in
ASTs sometimes being discarded and softclock interrupts sometimes being
handled prematurely (sometimes = quite often on systems with shared PCI
interrupts, never on other systems).

Debugged by: gibbs and other people at plutotech.com
PR: 6944, maybe 12381


48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


48677 08-Jul-1999 mckusick

These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bringing the machine to a grinding halt.

* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.

* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propagate to
deeper inlines and should produce better code.

* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.

* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.

* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c

* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.

* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.

* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.

* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.

* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).

* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).

* removal of extra arguments passed to getnewbuf() that are not
necessary.

* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c

* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.

* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed at some point soon.

Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48645 06-Jul-1999 des

Rename bpfilter to bpf.


48636 06-Jul-1999 peter

Quieten gcc paranoia.


48632 06-Jul-1999 peter

Typo: s/0ff0/0xff0/


48621 06-Jul-1999 cracauer

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
beyond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except applications that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:si->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-safe.

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'siginfo_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE
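
A minimal sketch of installing a new-style handler with the POSIX
sigaction() interface follows. It uses only the standard si_code member;
the FreeBSD-specific si_scp field mentioned above is deliberately not
touched, and the function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sketch_last_signo;
static volatile sig_atomic_t sketch_last_code;

/* New-style handler: the second argument carries si_code. */
static void
sketch_handler_si(int sig, siginfo_t *si, void *third)
{
	(void)third;
	sketch_last_signo = sig;
	sketch_last_code = si->si_code;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
static int
sketch_install(int signo)
{
	struct sigaction sa;

	assert(signo > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sketch_handler_si;
	sa.sa_flags = SA_SIGINFO;	/* request the three-argument form */
	sigemptyset(&sa.sa_mask);
	return (sigaction(signo, &sa, NULL));
}
```

After sketch_install(SIGUSR1), a raise(SIGUSR1) in a single-threaded
program runs the handler before raise() returns, so sketch_last_signo can
be inspected immediately afterwards.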


48618 06-Jul-1999 green

Add Centaur/IDT WinChip support.

Why in the world do people put breaks at the end of a switch's default case?


48615 06-Jul-1999 green

I made some cleanups, rearranged things a bit, and made AMD Features default
printing on CPUs that have it.
If there are no objections, I'll MFC all recent changes (harmless, really)
to 3.2 and PAO.


48579 05-Jul-1999 msmith

Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).


48572 05-Jul-1999 green

Add an extra space to " AMD Features=" to make it line up well.


48571 05-Jul-1999 green

K6-III CPUs are now case:d in the appropriate switch; also, in
print_AMD_info(), L2 internal cache is shown, as are AMD's special CPUID
infos:

CPU: AMD-K6(tm) 3D processor (350.81-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x58c Stepping=12
Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
AMD Features=0x808029bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SYSCALL,PGE,MMX,3DNow!>

PR: kern/12512
Submitted by: Louis A. Mamakos <louie@TransSys.COM>


48546 04-Jul-1999 jlemon

Some cleanup and rearrangement. hw.physmem is now an absolute quantity;
we will never use more memory than this value (if specified), but will always
check memory for validity up to this amount.

Get rid of the speculative_mprobe option; the memory amount can now be
specified by hw.physmem.


48544 04-Jul-1999 mckusick

The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truly empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occurred when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48531 03-Jul-1999 peter

printf int/dev_t (pointer) warning


48527 03-Jul-1999 imp

Improve compatibility with other systems by changing the default
behavior slightly.

If machine/bus.h is included, but neither bus_memio.h nor bus_pio.h
are included, then behave as if both were included.
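
The default described above could be sketched with the preprocessor roughly
as follows. The EXAMPLE_* macro names are hypothetical and are not the
identifiers machine/bus.h actually tests; only the "neither means both"
logic is taken from this commit message.

```c
/*
 * Sketch only: EXAMPLE_BUS_PIO and EXAMPLE_BUS_MEMIO are hypothetical
 * stand-ins for whatever bus_pio.h and bus_memio.h define.
 */
#if !defined(EXAMPLE_BUS_PIO) && !defined(EXAMPLE_BUS_MEMIO)
/* Neither access method was requested: behave as if both were. */
#define EXAMPLE_BUS_PIO   1
#define EXAMPLE_BUS_MEMIO 1
#endif

/* Record which access methods ended up enabled. */
#ifdef EXAMPLE_BUS_PIO
#define EXAMPLE_HAS_PIO 1
#else
#define EXAMPLE_HAS_PIO 0
#endif

#ifdef EXAMPLE_BUS_MEMIO
#define EXAMPLE_HAS_MEMIO 1
#else
#define EXAMPLE_HAS_MEMIO 0
#endif
```

A driver that defines only one of the two macros before this point keeps
exactly that one, which matches the claim that existing drivers are
unaffected.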

This won't change existing drivers, all of which include one or more
of bus_{p,mem}io.h, but will allow drivers from other systems to come
over with fewer changes. I freely admit that this might not be
optimal for some drivers, but those drivers can be optimized for
FreeBSD after the initial bringup happens.

Without the change, there is a bug that prevents drivers from
compiling and produces strange warnings/errors.

I've been running this here for a while now w/o ill effects.

Reviewed by: gibbs
Not objected to by: bde, arch@ list.


48521 03-Jul-1999 peter

Fix warnings in last commit (dev_t is not an int, and not even int
compatible in arg lists on the Alpha)


48512 03-Jul-1999 phk

Be more informative and try to ask the user in some instances if we can't
figure out the root device.


48505 03-Jul-1999 alc

An SMP-specific change: Add the lock prefix to RMW operations
on ipending.
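
As an illustration of why the lock prefix matters, here is a minimal sketch
using C11 atomics in place of the kernel's lock-prefixed assembly. The
variable and function names are hypothetical; the real change is in the
i386 interrupt code and is not shown in this log.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's pending-interrupt mask. */
static atomic_uint example_ipending;

/*
 * Atomically set the bit for one interrupt. This is a read-modify-write,
 * so on SMP it has to be a single indivisible operation or two CPUs can
 * lose each other's updates.
 */
static void
example_post_irq(unsigned irq)
{
    atomic_fetch_or(&example_ipending, 1u << irq);
}

/* Atomically fetch the whole mask and clear it. */
static unsigned
example_take_pending(void)
{
    return atomic_exchange(&example_ipending, 0u);
}
```

On x86 a compiler will typically implement these calls with lock-prefixed
or implicitly locked instructions, which is the effect the commit adds by
hand.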


48476 02-Jul-1999 msmith

Lightly overhaul the memory sizing code again.

- The kernel environment variable 'hw.physmem' can be used to set the
amount of physical memory space, based at 0, that FreeBSD will use.
Any memory detected over this limit is ignored. Documentation for
this is available under 'help set tunables' in the loader.

- In the case where system memory size can't be accurately determined,
hw.physmem is used as a best-guess memory size, but speculative
probing will be used to determine actual memory size if any of the
guesses or hints are 16M or more.

- If RB_VERBOSE, we list the memory regions as we test them.

- The compile-time option MAXMEM supplies a default value for
'hw.physmem'.
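
The first point above (memory detected over the limit is ignored) amounts
to a clamp. A minimal sketch, assuming a limit of 0 means that hw.physmem
was not set; the function name and that convention are illustrative only:

```c
#include <stdint.h>

/*
 * Return the amount of memory to use, in bytes.
 * Assumption: physmem_limit == 0 means "no hw.physmem given".
 */
static uint64_t
example_usable_memory(uint64_t detected, uint64_t physmem_limit)
{
    if (physmem_limit != 0 && detected > physmem_limit)
        return physmem_limit;   /* ignore memory above the limit */
    return detected;
}
```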


48449 02-Jul-1999 mjacob

Correct some ugly formatting. Remember to initialize the alignment tag.
Honor and pass a callers request to contigalloc if they had a non-zero
alignment constraint.


48445 02-Jul-1999 peter

Zap totally the npx0 memory size override. It only worked if statically
specified in the kernel config file - but setting options MAXMEM works
exactly the same. Userconfig overrides of this have not worked for
ages.

Also, change the getenv for the loader override to hw.physmem based on a
prior suggestion from Mike Smith. I think he still wants to change this
some, but this shouldn't get in his way. This is a forced setting of
the memory size, not a "cap". We probably should have a plain 'maxmem'
variable as well which does do a cap, without losing the bios memory
configuration data.


48405 01-Jul-1999 peter

Look up the kernel environment for MAXMEM as a final override for the
memory size. If somebody wants to change the name, fine - I used this
since it's consistent with the config variable it replaces.
This is intended to replace the npx0 msize hack (which no longer works).


48404 01-Jul-1999 peter

Move kern_envp and preload initialization a little earlier so that we
can do a getenv_int() inside the memory sizing routines to override the
memory limit.


48391 01-Jul-1999 peter

Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.


48345 29-Jun-1999 peter

Put on my asbestos suit and attempt to tidy up and add some simple docs
or notes to make it much more obvious what things are for people who
have not committed LINT to memory yet.


48327 28-Jun-1999 luoqi

Save common_tssd before it's loaded and the busy bit set.

Submitted by: bde


48308 28-Jun-1999 peter

Use the same -UKERNEL strategy as the alpha to avoid the inlines etc.


48288 27-Jun-1999 alc

An SMP-specific change: Remove an unnecessary lock acquire and release
from every system call. (Storing a 32-bit constant is inherently
atomic.)

Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>


48266 27-Jun-1999 peter

Shut up gcc.


48203 24-Jun-1999 jlemon

Fix warning message; that was 4GB, not 2GB. I apparently can't do
arithmetic today.


48202 24-Jun-1999 jlemon

Explicitly ignore any memory > 2GB, we don't support it yet.


48200 24-Jun-1999 jlemon

Only include AMD wt_alloc routines if I586_CPU is defined. Fixes
CPU_WT_ALLOC for cyrix chips.

Submitted by: "Brian Smith" <dbsoft@technologist.com>


48160 24-Jun-1999 green

This commit gives support for the Rise mP6 CPU. It has two changes:
1. Rise is recognized in identcpu.c.
2. The TSC is not written to. A workaround for the CPU bug is being
applied to clock.c (the bug being that the mP6 has TSC enabled
in its CPUID-capabilities, but it only supports reading it. If we
try to write to it (MSR 16), a GPF occurs.) The new behavior is that
FreeBSD will _not_ zero the TSC. Instead, we do a bit of 64-bit
arithmetic.
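
One plausible reading of "a bit of 64-bit arithmetic" is to remember the
counter value at the start of a measurement and subtract it later, instead
of writing zero to the TSC. The commit message does not show the code, so
the function below is an assumption about the approach, not the clock.c
implementation:

```c
#include <stdint.h>

/*
 * Elapsed TSC ticks between two readings, without ever writing the TSC.
 * Unsigned 64-bit subtraction gives the right answer even if the counter
 * wrapped once between the two readings.
 */
static uint64_t
example_tsc_elapsed(uint64_t tsc_at_start, uint64_t tsc_now)
{
    return tsc_now - tsc_at_start;
}
```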

Reviewed by: msmith
Obtained from: unfurl & msmith


48145 23-Jun-1999 msmith

Changes in the way that the APs are started appear to have removed the
problem with having more CPUs than NCPU.

PR: kern/4255
Submitted by: peter


48144 23-Jun-1999 luoqi

Do not setup 4M pdir until all APs are up.


48119 22-Jun-1999 msmith

Remove an unnecessary panic when sparse PCI bus numbering is encountered.
This is found eg. on some Compaq Proliant systems.

Submitted by: peter


48043 20-Jun-1999 jkh

Clean up some of the documentation at the top.


48008 18-Jun-1999 green

Harmless change to prevent possible problems in the future. I made
sure that i686_mem was only used when
1. CPUID had MTRR set (this was there before)
2. the CPU was GenuineIntel (not there)
3. the CPU is a 686 (also not there)

This should prevent any problems with CPUs that set MTRR but aren't
compatible with Intel's interface (none that I know of yet.)
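
The three conditions can be read as a single predicate. A minimal sketch,
where the structure, its fields, and the class value 6 are hypothetical
stand-ins for the kernel's CPU identification variables:

```c
#include <string.h>

/* Hypothetical summary of what CPU identification found. */
struct example_cpu {
    const char *vendor;   /* CPUID vendor string */
    int cpu_class;        /* assumption: 6 means 686-class */
    int has_mtrr;         /* nonzero if CPUID reports MTRR */
};

/* Nonzero only when all three conditions from the commit hold. */
static int
example_use_i686_mem(const struct example_cpu *cpu)
{
    return cpu->has_mtrr &&
        strcmp(cpu->vendor, "GenuineIntel") == 0 &&
        cpu->cpu_class == 6;
}
```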


48005 18-Jun-1999 bde

Changed the global `idt' from an array to a pointer so that npx.c
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.

Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>


47942 16-Jun-1999 tegge

Clean up bitrot in interrupt tracing code.


47926 15-Jun-1999 des

Kill option FAILSAFE.

PR: i386/12187
Approved by: bde


47892 13-Jun-1999 alc

Use pmap_kenter instead of pmap_enter to map the message buffer.


47862 10-Jun-1999 jlemon

Change variable used for calculating ending address of physical memory
from 'int' to 'vm_offset_t'.
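
To illustrate the kind of problem a signed int causes here: an ending
address at or above 2GB cannot be represented in a signed 32-bit int. The
sketch below assumes an unsigned 32-bit type as a stand-in for vm_offset_t
on i386; the names are hypothetical.

```c
#include <stdint.h>

/* Assumption: unsigned 32-bit stand-in for vm_offset_t on i386. */
typedef uint32_t example_vm_offset_t;

/* Ending address of a region, computed in the unsigned address type. */
static example_vm_offset_t
example_region_end(example_vm_offset_t base, example_vm_offset_t len)
{
    return base + len;
}

/* Nonzero if the address would also fit in a signed 32-bit int. */
static int
example_fits_in_int32(example_vm_offset_t addr)
{
    return addr <= (example_vm_offset_t)INT32_MAX;
}
```

A region that starts just below 2GB and extends past it has a valid
unsigned ending address, but that address fails the signed-range check.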

Spotted by: Richard Cownie <tich@ma.ikos.com>


47842 08-Jun-1999 dt

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
kernel virtual address space for UPAGES.


47763 05-Jun-1999 luoqi

Fix an accounting problem when prefaulting 4M pages.

PR: kern/11948


47716 03-Jun-1999 peter

remove references to isa_device, it's no longer associated with interrupts.


47688 01-Jun-1999 jlemon

Unbreak memory sizing for SMP.


47680 01-Jun-1999 phk

Introduce the makebdev() function, it does the same as the makedev()
function for now, but that will change.


47679 01-Jun-1999 jlemon

Null commit; note that there is a new memory sizing routine that uses
the BIOS calls to determine the memory configuration. This should fix
problems with >64M for good.

Reviewed by: Mike Smith


47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on -current


47642 31-May-1999 dfr

Remove fd driver from its old home and change files which include rtc.h
to account for its new location.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print a message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47625 30-May-1999 phk

This commit should be an extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47616 30-May-1999 dfr

Allow up to 8 ports, 4 memory regions and two irqs and drqs.


47611 30-May-1999 dfr

Activate/deactivate resources by calling the method, not through the
resource manager automatic handling of RF_ACTIVE.


47592 29-May-1999 phk

Stop the TSC from being used as timecounter on K5/step0 machines.


47588 28-May-1999 bde

Fixed glitches (jumps) of about 1/HZ seconds for the i8254 timecounter.
The old version only worked right when the time was read strictly
more often than every 1/HZ seconds, but we only guarantee reading
it every (1/HZ + epsilon) seconds. Part of rev.1.126-1.127 attempted
to fix this but didn't succeed. Detect counter rollover using the
heuristic from the old version of microtime() with additional
complications for supporting calls from fast interrupt handlers.
This works provided i8254 interrupts are not delayed by more than
1/(2*HZ) seconds.

This needs more comments, and cleanups for the SMP case, and more
testing of the SMP case before it is merged into RELENG_3.

Tested by: jhay


47575 28-May-1999 alc

pmap_object_init_pt:
The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.


47444 24-May-1999 jb

- Make setroot() conditional on FFS etc, to avoid a compiler warning
on systems with no FFS.
- Remove all references to mfs from cpu_rootconf(). mfs_init is
called prior to cpu_rootconf(), so it can set mountrootfsname to mfs
and (more importantly) set rootdev using the (bogus in Bruce's opinion)
special major number of 255.


47398 22-May-1999 dfr

* Factor out the common code between the isa bus drivers for i386 and alpha.
* Re-work the resource allocation code to use helper functions in subr_bus.c.
* Add simple isa interface for manipulating the resource ranges which can be
allocated and remove the code from isa_write_ivar() which was previously
used for this purpose.


47390 22-May-1999 peter

Recover from removing the last (unshared) interrupt handler.

PR: 11806
Submitted by: Assar Westerlund <assar@sics.se>


47350 21-May-1999 wpaul

This commit adds driver support for PCI fast ethernet cards based on the
ADMtek AL981 "Comet" chipset. The AL981 is yet another DEC tulip clone,
except with simpler receive filter options. The AL981 has a built-in
transceiver, power management support, wake on LAN and flow control.
This chip performs extremely well; it's on par with the ASIX chipset
in terms of speed, which is pretty good (it can do 11.5MB/sec with TCP
easily).

I would have committed this driver sooner, except I ran into one problem
with the AL981 that required a workaround. When the chip is transmitting
at full speed, it will sometimes wedge if you queue a series of packets
that wrap from the end of the transmit descriptor list back to the
beginning. I can't explain why this happens, and none of the other tulip
clones behave this way. The workaround for this is to just watch for the end
of the transmit ring and make sure that al_start() breaks out of its
packet queuing loop and waits until the current batch of transmissions
completes before wrapping back to the start of the ring. Fortunately, this
does not significantly impact transmit performance.

This is one of those things that takes weeks of analysis just to come
up with two or three lines of code changes.


47343 20-May-1999 n_hibma

usbdi.h:
Implement priorities.
GENERIC, LINT, files:
Remove remarks about ordering of device names.
GENERIC, LINT:
Sort the devices alphabetically in LINT and GENERIC.


47307 18-May-1999 peter

Move pcibus (host -> pci bus) probe/attach routines from nexus
to pcibus.c. pci_cfgopen() becomes static and there are no more
bus #ifdef's in nexus.c.


47292 18-May-1999 alc

pmap_qremove:
Eliminate unnecessary TLB shootdowns.


47226 15-May-1999 peter

Don't hardcode IRQ 13 for NPX. It's as good as hardwired in the hardware
though, on systems (386 mostly) that still have a separate fpu, but it
might be possible to find systems where the FPU coprocessor is wired to
a different IRQ pin.


47178 14-May-1999 dfr

* Define a new static method DEVICE_IDENTIFY which is called to add device
instances to a parent bus.
* Define a new method BUS_ADD_CHILD which can be called from DEVICE_IDENTIFY
to add new instances.
* Add a generic implementation of DEVICE_PROBE which calls DEVICE_IDENTIFY
for each driver attached to the parent's devclass.
* Move the hint-based isa probe from the isa driver to a new isahint driver
which can be shared between i386 and alpha.


47155 14-May-1999 obrien

Add the `xe' Xircom PC Card driver.


47101 13-May-1999 bde

Renamed the private copies of strlen and strcpy to gdb_strlen and
gdb_strcpy, respectively. This saves fixing the wrong return type
of the private strlen and makes the addresses of strlen and strcpy
unambiguous.


47081 12-May-1999 luoqi

Unbreak VESA on SMP.


47080 12-May-1999 luoqi

VM86_FRAMESIZE is now the size of vm86 frame, not the number of 4-byte words.

Requested by: Bruce


47048 12-May-1999 phk

Fix dumpon. It passes a udev_t from userland to kernel, that needs a
udev2dev() before we use it.

It really should pass a name like swapon does.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to an actual device.
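
For readers unfamiliar with the "major|minor" bitmap, here is a minimal
sketch of packing and unpacking such a value. The bit layout (8-bit major
in bits 8..15, minor in the remaining bits) and all example_* names are
assumptions made for illustration; the commit message does not spell out
the encoding.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's udev_t bitmap type. */
typedef uint32_t example_udev_t;

/* Pack a major and minor number into one value (assumed layout). */
static example_udev_t
example_umakedev(unsigned maj, unsigned min)
{
    return (maj << 8) | min;
}

/* Extract the major number: bits 8..15. */
static unsigned
example_umajor(example_udev_t u)
{
    return (u >> 8) & 0xffu;
}

/* Extract the minor number: every bit outside the major field. */
static unsigned
example_uminor(example_udev_t u)
{
    return u & 0xffff00ffu;
}
```

The point of the commit is that a value like this only names a potential
device, whereas the kernel's dev_t becomes a reference to one that exists.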

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few housekeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I believe this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


47024 11-May-1999 luoqi

Yet another place I missed when increasing trapframe size, which causes problems
with SIGFPE handling.

Reviewed by: Bruce Evans <bde@zeta.org.au>


47022 11-May-1999 luoqi

Do not hardcode size of struct vm86frame.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


46917 10-May-1999 dfr

Add missing suspend/resume methods.


46915 10-May-1999 peter

Move the mfs_getimage() prototype to mfs_extern.h instead of duplicating it
everywhere.


46881 10-May-1999 bde

[Forgot to commit this in the batch a few days ago.]

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46849 09-May-1999 peter

Clean out some unused leftovers from before the split from the old isa.c.


46847 09-May-1999 peter

For what it's worth, idelayed is declared as a volatile in the headers,
and even though it's not used in this file make it a volatile here too.


46846 09-May-1999 peter

loadandclear() uses an atomic instruction (even on SMP, where it's an
implicitly LOCK'ed instruction), so there shouldn't be any harm in making
it volatile pointer compatible for one of the users of it. It seems to
generate the same code regardless.


46823 09-May-1999 peter

s/main/mi_startup/ for the kernel entry point so that egcs doesn't get
upset about it (and generate things like __main() calls that are reserved
for main()). Renaming was phk's suggestion, but I'd already thought about
it too. (phk liked my suggested name tada() but I decided against it :-)

Reviewed by: phk


46808 09-May-1999 phk

Oops. If ROOTDEVNAME isn't defined, have -r call -a.


46806 09-May-1999 phk

Major lobotomy of config(8). The

config kernel mumble mumble

line has been obsoleted and removed and with it went all knowledge of
devices on the part of config.

You can still configure a root device (which is used if you give
the "-r" flag) but now with an option:

options ROOTDEVNAME=\"da0s2e\"

The string is parsed by the same code as at the "boot -a" prompt.

At the same time, make the "boot -a" prompt both more able and more
informative.

ALPHA/PC98 people: You will have to adapt a few simple changes
(defining rootdev and dumpdev somewhere else) before config works
for you again, sorry, but it's all in the name of progress.


46783 09-May-1999 phk

add some amount of sanity to the way the gdb stuff finds its device.

I'm not too happy about the result either, but at least it has less
chance of backfiring.

This particular feature could be called "a mess" without offending
anybody.


46773 09-May-1999 phk

Duh, bdevsw() takes dev_t arg.


46743 08-May-1999 dfr

Move the declaration of the interrupt type from the driver structure
to the BUS_SETUP_INTR call.


46737 08-May-1999 peter

Add some notes about the globalness of certain things like interrupts
and ISA DMA channels (ie: on most PCI systems, they are not.. they are
on the ISA side of the PCI-ISA bridge and could be duplicated if there
were multiple PCI-ISA bridges, say in a laptop docking station), while
the APIC resources would be global on SMP systems.
Also, revert a previous change, change some printfs back to panics.


46734 08-May-1999 peter

GC some #if 0 junk


46728 08-May-1999 peter

Don't print 'interrupting at irq nn' on the x86 family, it's not all
that big a deal just yet and isn't worth a whole line on the boot screen.
This could change later in the face of multi-ISA-bus (eg: laptop docking
stations with two independent ISA busses) and SMP/APIC systems. The Alpha
already has multiple interrupt destinations to deal with.


46718 08-May-1999 peter

Look up the sensitive flag better, allowing interoperation between old and
new isa drivers with sensitive flags. If the resource_find() code
is meant to "find" the wildcard sensitive flag for a driver even though
a unit is supplied, this can be simplified.


46717 08-May-1999 peter

Fix unused variable "flags". (only used if #ifdef I586_CPU)


46703 08-May-1999 peter

Make sure the mem_range_AP_init() prototype is seen where it's needed, and
#ifdef SMP around it for fun.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)
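
A minimal sketch of the idea, assuming a table indexed by major number.
The example_* names, the table size, and the major-number extraction are
hypothetical and only illustrate replacing open-coded cdevsw[major(foo)]
lookups with one accessor:

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_NUMCDEVSW 256

/* Hypothetical, heavily reduced device switch entry. */
struct example_cdevsw {
    const char *d_name;
    int d_maj;
};

/* One slot per major number; NULL means no driver is registered. */
static struct example_cdevsw *example_cdevsw_table[EXAMPLE_NUMCDEVSW];

/* Assumed layout: major number in bits 8..15 of the device value. */
static unsigned
example_major(uint32_t dev)
{
    return (dev >> 8) & 0xffu;
}

/* Single accessor used instead of indexing the table everywhere. */
static struct example_cdevsw *
example_devsw(uint32_t dev)
{
    return example_cdevsw_table[example_major(dev)];
}
```

Callers can then treat a NULL return as "no such device", and the table
itself can later change shape without touching every call site.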

DEVFS will eventually benefit from this change too.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46624 07-May-1999 mckusick

Generalize to allow any serial port to be used as the GDB port.
Mark the GDB port in the config file with flags 0x80. Currently
only the sio driver checks these flags and sets up a GDB port,
but adding similar code to other serial drivers would be easy.
For backward compatibility, if an sio port is marked as the console
and no port is marked as the gdb port, the GDB port will be mapped
to the console port. This hack should go away at some point.


46600 06-May-1999 peter

Ensure prototype for pnp_configure() is visible.


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
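
For context, this is the "dangling else" situation: without braces an else
binds to the nearest if, which may not be what the indentation suggests,
and compilers warn about it. A small sketch of the braced form (the
function is a made-up example, not code from this commit):

```c
/*
 * Returns 1 if a and b are both nonzero, 2 if a is zero, 0 otherwise.
 * The braces make it explicit that the else belongs to "if (a)".
 */
static int
example_classify(int a, int b)
{
    int result = 0;

    if (a) {
        if (b)
            result = 1;
    } else {
        result = 2;
    }
    return result;
}
```

Without the outer braces the else would attach to "if (b)" instead, and a
call with a nonzero and b zero would return 2 rather than 0.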


46555 06-May-1999 peter

I'm not sure why the #ifdef SMP became #if 1 (this overrode the npx probe
and always succeeded as is required on SMP). Anyway, reverting this
still compiles and appears ok.


46548 06-May-1999 bde

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46539 06-May-1999 luoqi

Initialize dblfault_tss.tss_fs to the per-cpu private data segment selector.


46537 06-May-1999 luoqi

Do not set curproc until proc0 is fully initialized (in proc0_init()).


46454 04-May-1999 dfr

Use unit, not device_id as an argument to an old-style ISA interrupt
handler. This fixes pnp interrupts and would have fixed pccard interrupts
but a workaround has been applied there.

This fixes the sound driver problems which people have reported with new-bus.


46382 04-May-1999 msmith

Disable the ppc chipset-specific probes by default.


46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


46357 03-May-1999 peter

Don't deref a NULL mem_range_softc.mr_op pointer on non-MTRR systems when
starting the AP.


46346 02-May-1999 n_hibma

Add driver for the Iomega Zip 100 drive.


46245 02-May-1999 msmith

Whoops, not all SMP systems have memory range attribute support. Don't
try to set it up on an AP unless we do.

Submitted by: dave adkins <adkin003@tc.umn.edu>


46215 30-Apr-1999 msmith

Add a hook that can be called to initialise a slave processor's memory
range attributes after they have been extracted from the master.

Hook up the i686 MP code to do this for each AP.

Be more careful about printing the default memory type for the i686.

Suggestions from: luoqi


46129 28-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fast_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
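
The substitution in step 3 implies that the new one-argument form is
equivalent to calling the old form with the process's own credentials and
accounting flag. A minimal sketch of that relationship follows; the
structures are cut-down hypothetical versions, and the body of
example_suser_xxx (uid 0 check, flag value, EPERM return) is an assumption
about the behaviour, not the kernel source.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the kernel structures. */
struct example_ucred {
    unsigned cr_uid;
};

struct example_proc {
    struct example_ucred *p_ucred;
    unsigned short p_acflag;
};

#define EXAMPLE_ASU 0x02    /* assumed "used superuser privilege" flag */

/* Old-style interface: credentials and accounting flag passed separately. */
static int
example_suser_xxx(const struct example_ucred *cred, unsigned short *acflag)
{
    if (cred->cr_uid == 0) {
        if (acflag != NULL)
            *acflag |= EXAMPLE_ASU;
        return 0;
    }
    return EPERM;
}

/* New-style interface: take the process and pull both fields from it. */
static int
example_suser(struct example_proc *p)
{
    return example_suser_xxx(p->p_ucred, &p->p_acflag);
}
```

Call sites that already passed p->p_ucred and &p->p_acflag for the same
process collapse to the shorter call, which is what the regular expression
matches.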

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


46089 26-Apr-1999 peter

Register the netisr's via SYSINIT rather than linker sets.


46072 25-Apr-1999 alc

pmap_dispose_proc and pmap_copy_page:
Conditionally compile 386-specific code.

pmap_enter:
Eliminate unnecessary TLB shootdowns.

pmap_zero_page and pmap_zero_page_area:
Use invltlb_1pg instead of duplicating the code.


46054 25-Apr-1999 phk

Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by: bde, jhay


46037 24-Apr-1999 peter

De-quote where possible and minor tweaks. depends on a current config(8).


46015 24-Apr-1999 kato

Changed the type of id_port from short into int to avoid wrong
conversion from short to unsigned long which is an argument of
bus_alloc_resource. Since the value -1 is used to indicate no port
resource, id_port needs to be signed (suggested by Doug Rabson and
Peter Wemm.)


45999 24-Apr-1999 peter

Drop the tty/net/bio/cam interrupt class labels, they are meaningless here
now.


45985 24-Apr-1999 peter

Don't clear the hints on release, just the resource containers.


45980 24-Apr-1999 kato

1MB is not 1024 * 1024 * 1024 but 1024 * 1024.


45962 23-Apr-1999 peter

Make the register_intr() glue actually have a chance of working...


45960 23-Apr-1999 dt

Make pmap_collect() an official pmap interface.


45959 23-Apr-1999 dt

Moved cpu_set_fork_handler's prototype from <machine/cpu.h> to <sys/proc.h>.

Suggested by: bde


45900 21-Apr-1999 peter

oops, SMP was missing includes for a typedef.


45897 21-Apr-1999 peter

Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager. There is further room for optimizing
the interfaces still. All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.


45833 19-Apr-1999 alc

_pmap_unwire_pte_hold and pmap_remove_page:
Use pmap_TLB_invalidate instead of invltlb_1pg to eliminate
unnecessary IPIs.

pmap_remove, pmap_protect and pmap_remove_pages:
Use pmap_TLB_invalidate_all instead of invltlb to eliminate
unnecessary IPIs.

pmap_copy:
Use cpu_invltlb instead of invltlb when updating APTDpde.

pmap_changebit:
Rather than deleting the unused "set bit" option (which may be
useful later), make pmap_changebit an inline that is used
by the new pmap_clearbit procedure.

Collectively, the first three changes reduce the number of TLB shootdown
IPIs by 1/3 for a kernel compile.


45821 19-Apr-1999 peter

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


45817 19-Apr-1999 peter

Drop the 'at nexus?' from the busses, it's not used.
Reactivate eisa0 and pnp0 in GENERIC, they work.. (eisa has been converted
but pnp still (for the most part) works the old way).


45813 19-Apr-1999 brian

Spelling police


45808 19-Apr-1999 peter

Always create attach points for the various child busses that can be
attached to the nexus. With one exception, this (for example) allows
you to do weird things like kldload the eisa bus on the fly and then
drivers, and have it auto probe the eisa bus when the drivers come online.

The one exception being pci, it only adds the pcib after the presence of
the pci bus is detected and that's #if'ed code.

A side effect of this is that isa and eisa will be attached to the nexus
directly rather than the PCI->ISA or PCI->EISA bridges. I'm not sure if
this is good or bad at this point, but it seems to be closer to the way
things are for the i386 family... This is likely to be followed up.

This also fixes compilation without a PCI bus configured and will allow
eisa to work without PCI too.


45791 18-Apr-1999 peter

Implement an EISA new-bus framework. The old driver probe mechanism
had a quirk that made a shim rather hard to implement properly and it was
just easier to convert the drivers in one go. The changes to the
buslogic driver go beyond just this - the whole driver was new-bus'ed
including pci and isa. I have only tested the EISA part of this so far.

Submitted by: Doug Rabson <dfr@nlsystems.com>


45779 18-Apr-1999 kato

Added PC98 code.

Submitted by: Takahashi Yoshihiro <nyan@wyvern.cc.kogakuin.ac.jp>


45723 16-Apr-1999 peter

As a temporary anti-foot-shooting measure, don't let the user attach
the atkbd device to isa, as was in the old (and 3.x) GENERIC config.


45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatibility
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


45718 16-Apr-1999 jkh

Add SYSVSEM so that newer versions of Xaccel don't require a kernel
compile just to work. We have the room now, so what the heck.

Requested by: Thomas Roell <roell@xig.com>


45715 16-Apr-1999 n_hibma

Remove the entries for umodem and ucom. These drivers only probe
and attach, nothing else. This is confusing to people.


45703 15-Apr-1999 bde

Made booting with -a work for all configurations. Previously it
only worked for configurations with "swap on generic".

usr.sbin/config/config.y:
- ignore all "swap [on] device ..." specifications except for
warning about them. They haven't done anything related to swap
for almost 4 years, and were previously silently ignored,
except for "swap on generic" which stopped swap${KERNEL}.c
from being generated. Code to support swapping is now deader
than before.

usr.sbin/config/mkswapconf.c:
- don't generate a dummy setconf() function in swap${KERNEL}.c.

sys/i386/conf/files.i386:
- swapgeneric.c is now standard. It should be merged into autoconf.c
so that it doesn't conflict with swap${KERNEL}.c for kernels named
"generic".

sys/i386/i386/autoconf.c:
- don't call setroot() for mfs roots. Since setroot() doesn't do anything
harmful, this was just a waste of time, except possibly for booting with
-a it may have helped prevent an undesirable call to setconf() by
finding a bogus rootdev.
- honor -a for ffs roots. -a now overrides all other ways of specifying
the root device. Previously, -r had precedence over -a, and the -a
handling was usually a no-op.
- don't honor -a for non-ffs roots, since it would currently just get in
the way of a clean panic.

sys/i386/i386/swapgeneric.c:
- don't declare things that are now always declared in swap${KERNEL}.c.
Don't decide things that are now decided in autoconf.c. Code to
support the "generic" case is now dead instead of useless.


45676 14-Apr-1999 bde

Generate intrnames[] dynamically. This should be new-bus friendly.

Old version reviewed by: se


45668 13-Apr-1999 peter

Add a commented-out example on using the makeoptions command to get a
kernel.debug.


45666 13-Apr-1999 peter

Shoot the LKM support in the old wd/wdc/atapi driver set in the head and
perform a cleanup/unifdef sweep over it to tidy things up. The atapi
code is permanently attached to the wd driver and is always probed.

I will add an extra option bit in the flags to disable an atapi probe on
either the master or slave if needed, if people want this.

Remember, this driver is destined to die some time. It's possible that
it will lose all atapi support down the track and only be used for
dumb non-ATA disks and all ata/atapi devices will be handled by the new
ata system.

ATAPI, ATAPI_STATIC and CMD640 are no longer options, all are implicit.

Previously discussed with: sos


45643 13-Apr-1999 tegge

Backout early start of APs since it caused some machines to hang.


45605 11-Apr-1999 n_hibma

Make debugging more selective.
Remove debugging options from GENERIC


45597 11-Apr-1999 peter

Move initialization of SWI's in the tty|net|bio masks from isa.c into
the static initializers in ipl.s.


45566 11-Apr-1999 tegge

Add prototype for wait_ap().


45562 10-Apr-1999 tegge

Let BSP wait until all APs are initialized.


45555 10-Apr-1999 tegge

Test CF after a btrl operation instead of testing ZF (which is undefined).
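
As a rough illustration of why CF is the flag to test: btrl clears the selected bit and reports the bit's previous value in the carry flag, while ZF is left undefined. A portable C model of that behaviour might look like the sketch below. The function name is hypothetical, and the `& 31` masking is a simplification that mirrors only the register-operand form.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Portable model of "btrl": clear bit `bit` in *word and return the
 * bit's previous value, i.e. what the instruction leaves in CF.
 * The name is illustrative only; it is not taken from the kernel source.
 */
static bool
bit_test_and_reset(uint32_t *word, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << (bit & 31);
    bool was_set = (*word & mask) != 0;    /* the value CF would hold */

    *word &= ~mask;
    return was_set;
}
```

A caller branches on the returned value, which corresponds to `jc`/`jnc` after btrl rather than `je`/`jne`.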


45524 10-Apr-1999 alc

pmap_remove_pte:
Use "loadandclear" to update the pte.

pmap_changebit and pmap_ts_referenced:
Switch to pmap_TLB_invalidate from invltlb.


45436 07-Apr-1999 peter

Disable the mtrr copy calls, it doesn't work with the i686_mem.c stuff.
This should make it compile/link again.


45406 07-Apr-1999 msmith

Add defines for the P6 model-specific registers.


45405 07-Apr-1999 msmith

mem.c
Split out ioctl handler a little more cleanly, add memory
range attribute handling for both kernel and user-space
consumers.

pmap.c
Remove obsolete P6 MTRR-related code.

i686_mem.c
Map generic memory-range attribute interface to the P6 MTRR
model.


45370 06-Apr-1999 alc

Two changes to pmap_remove_all:

1. Switch to pmap_TLB_invalidate from invltlb, eliminating a full TLB
flush where a single-page flush suffices. (Also, this eliminates some
unnecessary IPIs.)

2. Use "loadandclear" to update the pte, eliminating a race condition
on SMPs.

Change #2 should be committed to -STABLE.
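
The race in change #2 can be sketched in portable C. If the pte is read and then zeroed in two separate steps, another processor's hardware may update the entry in between and that update is lost. An atomic exchange fetches the old value and stores zero as one operation. The sketch below uses C11 atomics as a stand-in; the type and function names are hypothetical and are not the kernel's "loadandclear".

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit page table entry. */
typedef _Atomic uint32_t pte_model_t;

/*
 * Atomically fetch the old entry and store zero in its place, so no
 * update made by another CPU between "load" and "clear" can be lost.
 */
static uint32_t
load_and_clear(pte_model_t *pte)
{
    return atomic_exchange(pte, 0);
}
```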


45347 05-Apr-1999 julian

Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


45270 03-Apr-1999 jdp

Restore support for executing BSD/OS binaries on the i386 by passing
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.

Submitted by: Thomas Stephens <tas@stephens.org>
Reviewed by: jdp


45252 02-Apr-1999 alc

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>
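
A minimal sketch of the pm_active idea, assuming at most 32 processors and using hypothetical helper names (only the field name pm_active comes from the log above): each bit records whether the corresponding CPU has the pmap activated, so a remote TLB invalidation is only needed when some other CPU's bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; only the field of interest is shown. */
struct pmap_model {
    uint32_t pm_active;    /* bit n set => CPU n has this pmap active */
};

static void
pmap_mark_active(struct pmap_model *pm, unsigned cpu)
{
    pm->pm_active |= UINT32_C(1) << cpu;
}

static void
pmap_mark_inactive(struct pmap_model *pm, unsigned cpu)
{
    pm->pm_active &= ~(UINT32_C(1) << cpu);
}

/* True if any CPU other than `self` currently has the pmap active. */
static bool
pmap_needs_remote_invalidate(const struct pmap_model *pm, unsigned self)
{
    return (pm->pm_active & ~(UINT32_C(1) << self)) != 0;
}
```

On a uniprocessor only bit 0 is ever used, which is why the mask degenerates to a simple Boolean there.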


45140 30-Mar-1999 phk

Purging lint from the Bruce filter.


45122 29-Mar-1999 ken

Delete all references to the "aic" driver. It isn't in the tree, and
may not show up for a while, and I'm tired of people asking about it.

Perhaps this will eliminate some of the confusion.


45100 28-Mar-1999 dt

Ifdef declaration of a conditionally defined function "timezero".


44924 21-Mar-1999 phk

Link the bb structures together as we find them.


44917 20-Mar-1999 alc

Eliminate a pointless TLB flush from the SMP idle loop.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>


44844 18-Mar-1999 jlemon

Change btrl/btsl to cmpl/movl, since each cpu now has their own copy
of private_tss, and there's no need to use a bit array. Also fixes
the problem of using `je' after btrl, since cmpl sets ZF.

Noticed by: Luoqi, on -current


44807 16-Mar-1999 msmith

Look for the right ACPI table signature.

PR: i386/10587
Submitted by: Takanori Watanabe <takawata@shidahara1.planet.sci.kobe-u.ac.jp>


44801 16-Mar-1999 sos

Revert the atapi CDROM driver's name to wcd.
This is to avoid confusion with the new system.
Also provide real entries in MAKEDEV for the new system.


44704 13-Mar-1999 alc

pmap_qenter/pmap_qremove:
Use the pmap_kenter/pmap_kremove inline functions
instead of duplicating them.

pmap_remove_all:
Eliminate an unused (but initialized) variable.

pmap_ts_reference:
Change the implementation. The new implementation is much smaller
and simpler, but functionally identical. (Reviewed by
"John S. Dyson" <dyson@iquest.net>.)


44670 11-Mar-1999 dg

Increased kernel virtual address space to 1GB. NOTE: You MUST have fixed
bootblocks in order to boot the kernel after this! Also note that this
change breaks BSDI BSD/OS compatibility.
Also increased default NKPT to 17 so that FreeBSD can boot on machines
with >=2GB of RAM. Booting on machines with exactly 4GB requires other
patches, not included.


44645 10-Mar-1999 roberto

Fix two tests against hex. values for CPUID.

PR: i386/10050
Submitted by: Kevin Day <toasty@dragondata.com>


44611 09-Mar-1999 phk

Make TIMER_FREQ a normal, undocumented option. Raise confusion to
a higher level with example in LINT.

Clarify comment about PPS_SYNC. Ignore for now that it doesn't
work in FLL mode, it will in a few days.


44510 06-Mar-1999 wollman

Expose a slightly-lower-level interface to timeouts which allows callers
to manage their own memory. Tested on my machine (make buildworld).
I've made analogous changes on the alpha, but don't have a machine
to test.

Not-objected-to by: dg, gibbs


44487 05-Mar-1999 bde

The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresent
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.


44470 05-Mar-1999 alc

Fix an SMP-only TLB invalidation bug. Specifically, disable
a TLB invalidation optimization that won't work given the
limitations of our current SMP support.

This patch should be applied to -stable ASAP.

Thanks to John Capo <jc@irbs.com>,
Steve Kargl <sgk@troutmask.apl.washington.edu>, and
Chuck Robey <chuckr@mat.net>
for testing.


44429 02-Mar-1999 dg

Correct casts in vtophys and avtophys to be vm_offset_t.


44344 28-Feb-1999 mckusick

Update to know about current kernel directory layout.
Add ability to build links as well as tags.


44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


44289 26-Feb-1999 tegge

Don't call assign_apic_irq with a value for irq that is out of range.


44256 25-Feb-1999 bde

Don't forget to update `switchticks' in corner cases (except for
the alpha fork_trampoline(), forget it because I believe it is
only necessary for the unsupported SMP case).


44215 22-Feb-1999 bde

Added a per-cpu variable `switchticks' for use in scheduling.


44188 21-Feb-1999 n_hibma

Rename hid device to uhid (HID: Human Interface Device)


44175 20-Feb-1999 n_hibma

Removed uhub from list. Mandatory with usb device and this was already
forced in conf/files. Unnecessary entry.


44170 20-Feb-1999 obrien

Really make the "Rename nlpt to lpt." purported to have been made in
rev 1.149.


44168 20-Feb-1999 roberto

Bit 24 of the Feature Flag is FXSR (for Fast FP Save and Restore).

Reminded by: Francis Dupont <Francis.Dupont@inria.fr>


44157 19-Feb-1999 luoqi

Introduce machine-dependent macro pgtok() to convert page count to number
of kilobytes. Its definition for each architecture could be optimized to
avoid potential numerical overflows.
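
The overflow concern can be illustrated with a small sketch, assuming a 4096-byte page and 32-bit arithmetic. The names below are hypothetical and stand in for the pgtok() macro. Converting pages to bytes first can wrap around, whereas dividing the constants first never forms the byte count.

```c
#include <stdint.h>

#define PAGE_SIZE_MODEL 4096u    /* assumed page size, in bytes */

/* Overflow-prone: the byte count is formed first, in 32 bits. */
static uint32_t
pages_to_kb_naive(uint32_t pages)
{
    return pages * PAGE_SIZE_MODEL / 1024u;
}

/* Safer: fold the constants first so no byte count is ever formed. */
static uint32_t
pages_to_kb(uint32_t pages)
{
    return pages * (PAGE_SIZE_MODEL / 1024u);
}
```

With 2^20 pages (4GB) the naive form wraps before the division, while the folded form still yields the correct number of kilobytes.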


44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>


44131 19-Feb-1999 jdp

On the i386, load the ELF dynamic linker where an mmap(0, ...) would
put it, just like on the Alpha. It was wrong to load it at the
fixed address 0x08000000. That should only be done if the dynamic
linker is an executable (not a shared object) with a specific load
address encoded in the object file itself.

This fixes the recent breakage in the Linux emulator.


44108 18-Feb-1999 wollman

Add a little bit more identifying information to the myriad PCI network
drivers.


44078 16-Feb-1999 dfr

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


44014 14-Feb-1999 jkh

MF3: add SYSVMSG


43989 14-Feb-1999 nsouch

Rename nlpt to lpt.

Remove from ppi.c the old deprecated module stuff.
Print info when if_plip can't use interrupts.


43970 13-Feb-1999 bde

Don't pass PSL_NT to vm86 signal handlers. Some vm86/real mode
programs, including msdos, set PSL_NT in probes for old cpu types,
although PSL_NT doesn't do anything useful in vm86 or real mode.
PSL_NT is even less useful in the signal handlers. It just causes
T_TSSFLT faults on return from syscalls made by the handlers.
These faults are fixed up lazily so that Xsyscall() doesn't have
to be slowed down to prevent them. The fault handler recently
started complaining about these faults occurring "with interrupts
disabled". It should not have, but the complaints pointed to this
bug.

PR: 9211


43887 11-Feb-1999 msmith

Zero p->retval[1] when starting a process. This value ends up in %edx
when the process starts, and having it nonzero causes statically-linked
Linux binaries to fail.

PR: i386/10015
Submitted by: Marcel Moolenaar <marcel@scc.nl>


43869 11-Feb-1999 jkoshy

Fix typos


43824 10-Feb-1999 des

Use ppbus instead of the lpt driver. Throw in a (commented-out) vpo entry
for good measure.


43758 08-Feb-1999 dillon

Adjust idle zero-page fill hysteresis based on tests. Use 2/3 and 4/5
zero-fill levels.

Adjust comment for ozfod in vmmeter.h - this counter represents
non-optimal ( on the fly ) zero fills, not prefills.


43752 08-Feb-1999 dillon

Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE. There is little operational difference other than the kernel
being a few kilobytes smaller and the code being more readable.

* vm_page_select_free() has been *greatly* simplified.
* The PQ_ZERO page queue and supporting structures have been removed
* vm_page_zero_idle() revamped (see below)

PG_ZERO setting and clearing has been migrated from vm_page_alloc()
to vm_page_free[_zero]() and will eventually be guaranteed to remain
tracked throughout a page's life ( if it isn't already ).

When a page is freed, PG_ZERO pages are appended to the appropriate
tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
When locating a new free page, PG_ZERO selection operates from within
vm_page_list_find() ( get page from end of queue instead of beginning
of queue ) and then only occurs in the nominal critical path case. If
the nominal case misses, both normal and zero-page allocation devolves
into the same _vm_page_list_find() select code without any specific
zero-page optimizations.
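
The queue discipline described above can be sketched with a small doubly-linked list. This is an illustrative model only, with hypothetical names; it is not the vm_page code. Zeroed pages go on the tail, other pages on the head, and an allocation that prefers a zeroed page looks at the tail first.

```c
#include <stdbool.h>
#include <stddef.h>

struct page_model {
    struct page_model *prev, *next;
    bool zeroed;               /* models PG_ZERO */
};

struct free_queue {
    struct page_model *head, *tail;
};

/* Free a page: zeroed pages are appended, others are prepended. */
static void
free_page(struct free_queue *q, struct page_model *p)
{
    p->prev = p->next = NULL;
    if (q->head == NULL) {
        q->head = q->tail = p;
    } else if (p->zeroed) {
        p->prev = q->tail;
        q->tail->next = p;
        q->tail = p;
    } else {
        p->next = q->head;
        q->head->prev = p;
        q->head = p;
    }
}

/*
 * Allocate a page: take from the tail when a zeroed page is preferred,
 * from the head otherwise. The caller must still check p->zeroed, since
 * the tail page is not zeroed when no zeroed pages are queued.
 */
static struct page_model *
alloc_page(struct free_queue *q, bool want_zero)
{
    struct page_model *p = want_zero ? q->tail : q->head;

    if (p == NULL)
        return NULL;
    if (p->prev != NULL)
        p->prev->next = p->next;
    else
        q->head = p->next;
    if (p->next != NULL)
        p->next->prev = p->prev;
    else
        q->tail = p->prev;
    p->prev = p->next = NULL;
    return p;
}
```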

Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been
added and zero-page tracking adjusted to conform with the other changes.
Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
pages. We may wish to increase both parameters as time permits. The
hysteresis is designed to avoid silly zeroing in borderline allocation/free
situations.
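
One possible reading of the hysteresis, sketched with hypothetical names: zeroing starts once the zeroed-page count drops below 1/3 of the free pages, continues until it reaches 1/2, and then stays off until the count falls below the low mark again. The sketch below is an assumption about the shape of the logic, not the vm_page_zero_idle() code itself.

```c
#include <stdbool.h>

struct zero_idle_state {       /* hypothetical state holder */
    bool zeroing;              /* currently in the "keep zeroing" phase? */
};

/*
 * Decide whether the idle loop should zero another page.
 * lo = 1/3 and hi = 1/2 of the free page count, as in the log above.
 */
static bool
should_zero_more(struct zero_idle_state *st, unsigned free_pages,
    unsigned zeroed_pages)
{
    unsigned lo = free_pages / 3;
    unsigned hi = free_pages / 2;

    if (zeroed_pages >= hi)
        st->zeroing = false;   /* enough zeroed pages: stop */
    else if (zeroed_pages < lo)
        st->zeroing = true;    /* running low: start zeroing */
    /* between lo and hi: keep doing whatever we were doing */
    return st->zeroing;
}
```

The band between the two thresholds is what prevents the idle loop from flipping between zeroing and not zeroing on every allocation or free.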


43750 07-Feb-1999 jdp

Change the load address of the ELF dynamic linker from "2L*MAXDSIZ"
to an architecture-specific value defined in <machine/elf.h>. This
solves problems on large-memory systems that have a high value for
MAXDSIZ.

The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace).
On the i386 it is hard-wired to 0x08000000, which is the standard
SVR4 location for the dynamic linker.

On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond
the start of the program's data segment. This is the same place
a userland mmap(0, ...) call would put it, so it ends up just below
all the shared libraries. The rationale behind the calculation is
that it allows room for the data segment to grow to its maximum
possible size.

These changes have been tested on the i386 for several months
without problems. They have been tested on the Alpha as well,
though not for nearly as long. I would like to merge the changes
into 3.1 within a week if no problems have surfaced as a result of
them.


43622 04-Feb-1999 adam

replace previous stupid comment with one more appropriate
where it will be easily found


43617 04-Feb-1999 adam

remind that apm is required in order for timekeeping to work


43612 04-Feb-1999 kato

Recognize Pentium II Xeon, Celeron and Pentium III cpus. Because CPU
names are printed on their packages and shown by BIOS, kernel does not
need to show details.

PR: 8751, 9320 and 9463


43564 03-Feb-1999 dg

Fixed the type of target_page to vm_offset_t (unsigned). This fixes a
panic during boot on machines with >=2GB of RAM. Also changed some
incorrect printf conversion specifiers from %d to %u (signed to unsigned).
This fixes bugs when printing the amount of memory on machines with >=2GB
of RAM.
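
The printf half of this fix can be illustrated with a small sketch, assuming a 32-bit int and the usual two's-complement conversion done by gcc and clang. The function names are hypothetical. A byte count of 2GB or more does not fit in a signed 32-bit value, so formatting it with %d shows a negative number, while %u shows the intended value.

```c
#include <stdint.h>
#include <stdio.h>

/* Wrong for sizes >= 2GB: the value is reinterpreted as signed. */
static int
format_mem_signed(char *buf, size_t len, uint32_t bytes)
{
    return snprintf(buf, len, "%d", (int)bytes);
}

/* Correct: the value is treated as unsigned throughout. */
static int
format_mem_unsigned(char *buf, size_t len, uint32_t bytes)
{
    return snprintf(buf, len, "%u", (unsigned)bytes);
}
```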


43530 02-Feb-1999 bde

Check for signals while reading /dev/urandom. Reading 10MB from
/dev/urandom takes about 38 seconds on a P5/133. It is useful
to be able to kill such reads almost immediately. Processes
doing such reads are now scheduled so their denial of service
is no worse than that of processes looping in user mode.


43524 02-Feb-1999 bde

Added a hopefully-machine-independent macro for determining if a
reschedule is pending.


43447 31-Jan-1999 kato

Use offset to _pc98_system_parameter instead of immediate value which
assumes KERNBASE=0x100000.


43434 30-Jan-1999 kato

Moved pc98_system_parameter from .text to .data to make ELF kernel
work.


43403 29-Jan-1999 dillon

More const fixes for -Wall, -Wcast-qual


43387 29-Jan-1999 dillon

More -Wall / -Wcast-qual cleanup. Also, EXEC_SET can't use
C_DECLARE_MODULE due to the linker_file_sysinit() function
making modifications to the data.


43340 28-Jan-1999 newton

Sun Bug ID 1251858 (on http://sunsolve1.sun.com) discusses the way that
Sun implemented iBCS2 compatibility on Solaris >= 2.6: The emulator
runs in user-mode, patching the LDT so that client programs making
syscalls through the old iBCS2 call gate get handled by the emulator
process. Unemulated syscalls therefore need their own call-gate that
bypasses the emulator. Sun chose LDT entry 4 to implement this, which
is what we've been using as LUDATA_SEL, so we need to change LUDATA_SEL
if we want to run Solaris executables.

Discussed with: Mike Smith


43314 28-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43309 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

This commit includes significant work to properly handle const arguments
for the DDB symbol routines.


43286 27-Jan-1999 eivind

Add ISA PnP support, now that we have the space for it.


43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42880 20-Jan-1999 jkh

Make more messages conditional on bootverbose


42817 19-Jan-1999 peter

Break configure() into a couple of stages to allow insertion of
hooks (eg: by drivers or (pre)loadable modules) into a convenient spot.


42732 16-Jan-1999 kato

There are two models of AMD K6-2 Model 8 (c.f. AMD's document), so the
CPU stepping must be checked. Also, fixed print_AMD_info.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


42705 15-Jan-1999 msmith

Fetch an override for NMBCLUSTERS from the kernel environment. Never allow
the value to be reduced below that defined when the kernel was built.
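
The clamping rule can be sketched as follows. This is a user-space model with hypothetical names: the compile-time value of 1024 is an assumption, and the string argument stands in for whatever the kernel environment lookup returns.

```c
#include <limits.h>
#include <stdlib.h>

#define NMBCLUSTERS_BUILT 1024    /* assumed compile-time default */

/*
 * Return the cluster count to use: the override if it parses cleanly
 * and is not below the built-in value, otherwise the built-in value.
 */
static int
nmbclusters_from_env(const char *value)
{
    char *end;
    long v;

    if (value == NULL)
        return NMBCLUSTERS_BUILT;
    v = strtol(value, &end, 0);
    if (end == value || *end != '\0')
        return NMBCLUSTERS_BUILT;     /* not a number */
    if (v < NMBCLUSTERS_BUILT || v > INT_MAX)
        return NMBCLUSTERS_BUILT;     /* never reduce; reject overflow */
    return (int)v;
}
```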


42543 12-Jan-1999 eivind

Silence warnings.


42542 12-Jan-1999 eivind

Silence warnings by removing unused convenience function and
globalizing debugging functions.


42504 11-Jan-1999 yokota

The first stage of console driver reorganization: activate new
keyboard and video card drivers.

Because of the changes, you are required to update your kernel
configuration file now!

The files in sys/dev/syscons are still i386-specific (but less so than
before), and won't compile for alpha and PC98 yet.

syscons still directly accesses the video card registers here and
there; this will be rectified in the later stages.


42448 09-Jan-1999 dt

Oops --<, replace 1.216 with a version that actually checks pv_entries (and
was tested for a month or two in production).

Noticed by: Stephen McKay

Stephen also suggested removing the complication altogether. I didn't do it, as
it would be a backout of a large part of 1.190 (from 1998/03/16)...


42444 09-Jan-1999 wpaul

Add driver support (and man page) for PCI fast ethernet cards based
on the ASIX AX88140A chip. Update /sys/conf/files, RELNOTES.TXT,
/sys/i386/i386/userconfig.c, sysinstall/devices.c, GENERIC and LINT
accordingly.

For now, the only board that I know of that uses this chip is the
Alfa Inc. GFC2204. (Its predecessor, the GFC2202, was a DEC tulip card.)
Thanks again to Ulf for obtaining the board for me. If anyone runs
across another, please feel free to update the man page and/or the
release notes. (The same applies for the other drivers.)

FreeBSD should now have support for all of the DEC tulip workalike
chipsets currently on the market (Macronix, Lite-On, Winbond, ASIX).
And unless I'm mistaken, it should also have support for all PCI fast
ethernet chipsets in general (except maybe the SMC FEAST chip, which
nobody seems to ever use, including SMC). Now if only we could convince
3Com, Intel or whoever to cough up some documentation for gigabit
ethernet hardware.

Also updated RELNOTES.TXT to mention that the SVEC PN102TX is supported
by the Macronix driver (assuming you actually have an SVEC PN102TX with
a Macronix chip on it; I tried to order a PN102TX once and got a box
labeled 'Hawking Technology PN102TX' that had a VIA Rhine board inside
it).


42440 09-Jan-1999 bde

Removed a stray label that broke compiling in the (elf && profiling) case.

PR: 9369
Submitted by: Assar Westerlund <assar@sics.se>


42437 09-Jan-1999 bde

Fixed switching between consoles (sc0, vt0 or sioN) in userconfig.

Broken in: rev.1.315


42428 09-Jan-1999 bde

Don't put operands in clobber lists, since this is dubious for old
versions of gcc and broken for current versions of egcs.

Cleaned up the asm statement for do_cpuid() a little.

Submitted by: "John S. Dyson" <dyson@iquest.net> but rewritten by me


42427 09-Jan-1999 bde

Don't put operands in clobber lists, since this is dubious for old
versions of gcc and broken for current versions of egcs.

Submitted by: "John S. Dyson" <dyson@iquest.net> but rewritten by me


42411 08-Jan-1999 bde

Fixed some style bugs. Clarified a comment.


42410 08-Jan-1999 bde

Unspammed includes in <machine/cpufunc.h> in the !SMP case. Partially
unspammed them in the SMP case.


42406 08-Jan-1999 bde

Moved declarations related to copying and zeroing to the right place.


42396 08-Jan-1999 luoqi

Allocate kernel page table object (kptobj) before any kmem_alloc calls.
On a system with a large amount of ram (e.g. 2G), allocation of per-page
data structures (512K physical pages) could easily bust the initial kernel
page table (36M), and growth of kernel page table requires kptobj.


42381 07-Jan-1999 dt

Make pmap_ts_referenced check more than 1 pv_entry. (One should be careful
when moving elements to the tail of a list in a loop...)


42373 07-Jan-1999 yokota

Remove a hard-coded table of kernel console I/O functions exported
from sc, vt and sio drivers. Use instead a linker_set to collect them.

Staticize ??cngetc(), ??cnputc(), etc functions in sc and vt drivers.
We must still have siocngetc() and siocnputc() as globals because they
are directly referred to by i386-gdbstub.c :-(

Oked by: bde


42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


42332 06-Jan-1999 yokota

Move IO_PSMSIZE from kbdio.h to isa.h. I thought I did this a long time
ago...

While I am here, correct the values for IO_MDASIZE and IO_CGASIZE; they
should be 12 rather than 16.


42218 01-Jan-1999 peter

Part 1 of pcvt/voxware revival. I hope I have not clobbered any other
deltas, but it is possible since I had a few merge conflicts over the last
few days while this has been sitting ready to go.

Approved by: core


42135 28-Dec-1998 msmith

Improved DDB_UNATTENDED behaviour. From the submitter:

There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:

1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global

Submitted by: Brian Feldman <green@unixhelp.org>


42112 27-Dec-1998 msmith

From the submitter:

CPU_WT_ALLOC does not work correctly for K6-2s of model 8+ and
probably K6-3s (when they appear on the market soon). In addition,
print_AMD_info() incorrectly printfs write allocation's size. I've
fixed them, so they now Do The Right Thing, and added a
"NO_MEMORY_HOLE" option to easily allow 15-16mb range handling for us
K6 and K6-2 users.

Submitted by: Brian Feldman <green@unixhelp.org>


42084 27-Dec-1998 sos

Pre 3.0 branch cleanup sos#1: wcd

Superseded by acd driver...


42083 27-Dec-1998 phk

Pre 3.0 branch cleanup casualty #6: ft


42080 27-Dec-1998 phk

Add commented out SMP stuff in GENERIC, remove stale configs.


42079 27-Dec-1998 phk

Pre 3.0 branch cleanup casualty #5: nca, sea, wds, uha

No CAM drivers available. If somebody CAMifies one of these, they
will be welcome back in the tree


42078 27-Dec-1998 phk

Pre 3.0 branch cleanup casualty #4: pcvt


41871 16-Dec-1998 bde

Removed the cast to a pointer in the definition of PS_STRINGS and
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.


41868 16-Dec-1998 bde

Removed bogus casts of USRSTACK and/or the other operand in binary
expressions involving USRSTACK.


41797 14-Dec-1998 bde

Moved the declaration of another non-SMP variable into the non-SMP section.


41794 14-Dec-1998 bde

Ifdefed the declarations of conditionally used variables.


41787 14-Dec-1998 mckay

Fix tabs that should have been spaces. Some were in kernel error messages.


41770 14-Dec-1998 dillon

Get rid of uninitialized variable warnings. No bugs found, just
preinitializing some locals to 0 to get rid of the compiler warnings.


41764 14-Dec-1998 dillon

author was assuming that nextpaddr declared *inside* the do loop would
survive within the loop. This is not guaranteed by C. I have moved
the nextpaddr declaration to outside the do loop.
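
The scoping issue can be sketched like this. A variable declared inside the loop body begins a new lifetime on every iteration, so its previous value cannot be relied on. Declaring it before the loop makes the carry-over well defined. Only the name nextpaddr comes from the log above; the function and its purpose are hypothetical.

```c
#include <stddef.h>

/*
 * Count the places where consecutive physical addresses are not
 * contiguous. nextpaddr is declared outside the do loop so that the
 * value stored in one iteration is still valid in the next.
 */
static size_t
count_discontiguous(const unsigned long *paddr, size_t n,
    unsigned long page_size)
{
    unsigned long nextpaddr = 0;
    size_t i = 0, breaks = 0;

    if (n == 0)
        return 0;
    do {
        if (i > 0 && paddr[i] != nextpaddr)
            breaks++;
        nextpaddr = paddr[i] + page_size;    /* used by next iteration */
    } while (++i < n);
    return breaks;
}
```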


41763 14-Dec-1998 dillon

Change local ddb_mode variable to volatile to handle GCC warning about
the variable possibly being clobbered by setjmp/longjmp.
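
A minimal user-space sketch of the underlying rule, with hypothetical names: a non-volatile local that is modified between setjmp() and longjmp() has an indeterminate value after the jump, because it may have been held in a register that longjmp() restores. Qualifying it as volatile forces it to live in memory.

```c
#include <setjmp.h>

static jmp_buf recover;

/* Stand-in for a fault that unwinds back to the setjmp() point. */
static void
fail(void)
{
    longjmp(recover, 1);
}

static int
run_with_recovery(void)
{
    /* volatile: modified after setjmp() and read after longjmp() */
    volatile int mode = 0;

    if (setjmp(recover) == 0) {
        mode = 1;
        fail();
    }
    return mode;
}
```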


41739 13-Dec-1998 n_hibma

Added the stubs for umodem and ucom (communications class driver). They are nothing other than
the ugen driver with different variable names.


41629 10-Dec-1998 steve

Clean up the wording for the F00F bug workaround message.

PR: 8041
Submitted by: Dan Nelson <dnelson@emsphone.com>


41624 09-Dec-1998 n_hibma

Preliminary support for OHCI motherboards


41599 08-Dec-1998 kato

Use CNAME macro for pc98_system_parameter, which is referenced from C
source.

Submitted by: Masanori Kanaoka <kana@saijo.mke.mei.co.jp>


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41547 06-Dec-1998 archie

Avoid compiler warning (printf arg type mismatch) when compiling #ifdef DEBUG


41541 05-Dec-1998 kato

Print out information for write-allocate of AMD CPUs.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
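
The pattern described in this commit can be sketched as follows, with a hypothetical structure and function: the bounded snprintf() replaces sprintf(), and sizeof() replaces a hard-coded buffer length, so the bound follows the buffer if its size ever changes.

```c
#include <stdio.h>

struct dev_model {             /* hypothetical; only the name buffer */
    char name[16];
};

/*
 * Format "<prefix><unit>" into d->name.
 * Returns 1 if the whole name fit, 0 if it was truncated or an error
 * occurred. The buffer is NUL-terminated in either case.
 */
static int
set_dev_name(struct dev_model *d, const char *prefix, int unit)
{
    int n = snprintf(d->name, sizeof(d->name), "%s%d", prefix, unit);

    return (n >= 0 && (size_t)n < sizeof(d->name));
}
```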


41502 04-Dec-1998 wpaul

An early Christmas present: add driver support for a whole bunch of
PCI fast ethernet adapters, plus man pages.

if_pn.c: Netgear FA310TX model D1, LinkSys LNE100TX, Matrox FastNIC 10/100,
various other PNIC devices

if_mx.c: NDC Communications SOHOware SFA100 (Macronix 98713A), various
other boards based on the Macronix 98713, 98713A, 98715, 98715A
and 98725 chips

if_vr.c: D-Link DFE530-TX, other boards based on the VIA Rhine and
Rhine II chips (note: the D-Link and certain other cards
that actually use a Rhine II chip still return the PCI
device ID of the Rhine I. I don't know why, and it doesn't
really matter since the driver treats both chips the same
anyway.)

if_wb.c: Trendware TE100-PCIE and various other cards based on the
Winbond W89C840F chip (the Trendware card is identical to
the sample boards Winbond sent me, so who knows how many
clones there are running around)

All drivers include support for ifmedia, BPF and hardware multicast
filtering.

Also updated GENERIC, LINT, RELNOTES.TXT, userconfig and
sysinstall device list.

I also have a driver for the ASIX AX88140A in the works.


41454 02-Dec-1998 kato

- For some old Cyrix CPUs, %cr2 is clobbered by interrupts. This
problem is worked around by using an interrupt gate for the page
fault handler. This code was originally made for NetBSD/pc98 by
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already
been in PC98 tree. Because of this bug, trap_fatal cannot show
correct page fault address if %cr2 is obtained in this function.
Therefore, trap_fatal uses the value from trap() function.
- The trap handler always enables interruption when buggy application
or kernel code has disabled interrupts and then trapped. This code
was prepared by Bruce Evans <bde@FreeBSD.org>.

Submitted by: Bruce Evans <bde@FreeBSD.org>
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>


41414 29-Nov-1998 phk

Don't print '?' for ioaddr; the device may legitimately not have an
ioaddr.


41370 27-Nov-1998 tegge

Don't forget to update the pmap associated with aio daemons when adding
new page directory entries for a growing kernel virtual address space.


41367 26-Nov-1998 tegge

Attempt to handle interrupts delivered to all IO APICs by using the first
IO APIC with a sufficient number of pins.


41366 26-Nov-1998 n_hibma

Initial commit of ported NetBSD USB stack


41362 26-Nov-1998 eivind

Staticize.


41318 24-Nov-1998 eivind

Move the declaration of PPro_vmtrr from the header file to pmap.c,
replacing the one in the header file with a definition. This makes it
easier to work with tools that grok ANSI C only.


41111 12-Nov-1998 obrien

Remove `amd', `nca' SCSI devices to match Mike's LINT commit.


41004 08-Nov-1998 dfr

* Fix a couple of places in the device pager where an address was
truncated to 32 bits.
* Change the calling convention of the device mmap entry point to
pass a vm_offset_t instead of an int for the offset allowing
devices with a larger memory map than (1<<32) to be supported
on the alpha (/dev/mem is one such).

These changes are required to allow the X server to mmap the various
I/O regions used for device port and memory access on the alpha.


40998 08-Nov-1998 msmith

Enable 686 class optimisations for all 686-class processors, not just the
Pentium Pro. This resolves the "Dog slow SMP" issue for Pentium II
systems.


40870 03-Nov-1998 des

Back out previous commit. The bpfilter -> bpf transition will have to be a
flag day unless we can hack config(8) to smooth things over.


40869 03-Nov-1998 des

Rename the 'bpfilter' pseudo-device to 'bpf'. The old syntax is still legal
and will stick around for a while.


40866 03-Nov-1998 msmith

Remove the USERCONFIG_BOOT option. Userconfig script data is searched
for in a loaded module of type "userconfig_script". The RB_CONFIG
flag will always result in the user being left inside userconfig at
the end of the script's execution, regardless of 'quit' commands in
the script. If the RB_CONFIG flag is not specified, the user will
never be left inside userconfig, even if the script does not have an
explicit exit command.

Add the INTRO_USERCONFIG option. This option forces the userconfig 'intro'
screen (after a script has optionally been executed). There is no longer
a need to queue an 'intro' command.


40794 31-Oct-1998 peter

Add John Dyson's SYSCTL descriptions, and an export of more stats to
a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present
in source, they do not get compiled into the binaries taking up memory.


40751 30-Oct-1998 msmith

Add the ability to specify where on the at_shutdown queue a handler is
installed.

Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).

Submitted by: Some ideas from Bruce Walter <walter@fortean.com>


40713 29-Oct-1998 wollman

A small fragment of new ISA framework: manifest constants for the resources
implemented by the i386 root nexus.


40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


40658 26-Oct-1998 bde

Check the major number of the boot device more carefully. There was only
a problem if the boot blocks passed bad data.

Check the major number of the dump device consistently.


40610 23-Oct-1998 phk

Update timecounters to new interface.


40577 22-Oct-1998 bde

Quote port names that have a digit in them. IO_TIMER1 was lexed as
{ port_name = "IO_TIMER", port_number = 1 } and only worked because
it was reassembled to "IO_TIMER1". Trailing digits always work, but
this is too magic to depend on.

Don't quote port names that don't have a digit in them.


40574 22-Oct-1998 bde

Removed all `vector xxxintr' specifications. Interrupt handlers are now
configured in drivers.


40565 22-Oct-1998 bde

Initialize isa_devtab entries for interrupt handlers in individual
device drivers, not in ioconf.c. Use a different hack in isa_device.h
so that a new config(8) is not required yet.

pc98 parts approved by: kato


40545 21-Oct-1998 dg

Decrement the now unused page table page's wire_count prior to freeing it.
It will soon be required that pages have a zero wire_count when being
freed.


40516 18-Oct-1998 wpaul

Add driver support for PCI fast ethernet adapters based on the
RealTek 8129/8139 chipset like I've been threatening. Update kernel
configs, userconfig.c, relnotes and sysinstall. No man page yet;
coming soon.

I consider this driver stable enough that I want to give it some
exposure in -current.


40513 18-Oct-1998 peter

Add an ELF_MACHINE_OK() macro for compatibility with the Alpha version.


40435 16-Oct-1998 peter

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a separate commit as samples.
This all still works as a static a.out kernel using LKM's.


40425 16-Oct-1998 obrien

Add commented out bpf entry. (DHCP is popular here, and this is required).

Ok'ed by: jkh


40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
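
A minimal sketch of the second class of bug in r40286, under stated assumptions: the page size is taken to be 4096 bytes, and round_page_32() and round_page_64() are hypothetical stand-ins, not the actual kernel macros. The first form narrows its argument to 32 bits before rounding, so the high half of a 64-bit offset is lost. The second keeps the arithmetic at full width.

```c
#include <stdint.h>

#define PAGE_SIZE 4096			/* assumed page size */
#define PAGE_MASK (PAGE_SIZE - 1)

/* Illustrative "bogus cast" form: the high 32 bits are discarded. */
uint64_t
round_page_32(uint64_t x)
{
	return (((uint32_t)x + PAGE_MASK) & ~(uint32_t)PAGE_MASK);
}

/* Full-width form: the mask is widened before it is inverted. */
uint64_t
round_page_64(uint64_t x)
{
	return ((x + PAGE_MASK) & ~(uint64_t)PAGE_MASK);
}
```

For an offset just above 4 GB, such as 0x100000001, the 32-bit form yields 0x1000 while the full-width form yields 0x100001000.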


40259 12-Oct-1998 bde

Don't print conflict messages in haveseen_isadev() if CC_QUIET is
specified. This makes haveseen_isadev() useful for searching for a
free resource. This increases the bitrot in the pci RESOURCE_CHECK
code.

Fixed the pre-attach conflict message. The flag for distinguishing
pre-attach conflict checks from pre-probe ones was never set.


40179 10-Oct-1998 kato

mp_machdep.c: Set a vector to boot code (PC-98).
locore.s: Tell the bios to warmboot next time (PC-98).


40173 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40169 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40165 10-Oct-1998 jkh

Add entries for MFS which are consistent with the others, now that
Peter has made this more selectable.


40164 10-Oct-1998 jkh

Allow more flexible use of MFS root.
Submitted by: peter


40152 09-Oct-1998 peter

Relocate the preload module info from machdep specifically rather than
trying to do it in locore. We also walk through the module table
and relocate any MODINFO_ADDR pointers so that they become KVM relative
rather than physical addresses. This means that hacks for adding
0xf0000000 in places like MFS go away.


40130 09-Oct-1998 peter

Null commit.. CVS aborted on freefall last time (read-only file).

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40129 09-Oct-1998 peter

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40089 09-Oct-1998 msmith

Initialise kernel environment and module metadata pointers.


40081 08-Oct-1998 msmith

Fix up the kernel environment and module data pointers in the bootinfo if
they are present.
If we are told where the end of the loaded kernel image is, believe it.


40067 08-Oct-1998 kato

BIOS ROM base address is 0xe8000 on PC-98.


40037 07-Oct-1998 obrien

Fix syntax errors I introduced.


40031 07-Oct-1998 gibbs

Add entries for the adw device driver.


40029 07-Oct-1998 gibbs

Fix a parent tag reference count bug during tag teardown.

Enable optimization for nobounce_dmamap clients by setting the map
held by the client to NULL. This allows the macros in bus.h to check
against a constant to avoid function calls.

Don't attempt to 'free()' contigmalloced pages in bus_dmamem_free().
We currently leak these pages, which is not ideal, but is better than
a panic. The leak will be fixed when contigmalloc is merged into the
bus dma framework after 3.0R.


40003 06-Oct-1998 kato

- Implement enabling write allocate on AMD K5/K6/K6-2 cpus.
The code was originally contributed by Kelly Yancey
<kbyanc@freedomnet.com> in PR i386/6269 and revised by Akio Morita
<amorita@meadow.scphys.kyoto-u.ac.jp> and me. Test was performed by
Akio Morita and Toshiomi Moriki <moriki@db.is.kyushu-u.ac.jp>.
- Fix stylistic bug in identcpu.c.
- Update copyright in initcpu.c
- Fix typo in LINT.

PR: 6269 and 6270


39982 05-Oct-1998 obrien

Undo most of the previous commit.


39976 05-Oct-1998 obrien

Now require *FS_ROOT to enable the ability to mount a *FS /.
Previously one could config(8) a kernel that would not link.


39969 05-Oct-1998 obrien

Document that ``options xFS_ROOT'' requires the associated ``options xFS''.
Reordered xFS_ROOT's to be below the associated xFS.


39926 03-Oct-1998 jkh

Add dpt driver back to GENERIC and adjust a stale comment.


39869 01-Oct-1998 msmith

Remove lpt1 - we have userconfig if you have a weird port.
Remove mse0 - the Microsoft Bus Mouse is a dinosaur. There are probably
more Pintos on the road than these on people's desks.


39760 29-Sep-1998 abial

Add sysctl 'machdep.msgbuf_clear'. Setting it to anything causes the
kernel message buffer to be cleared. It comes handy in situations when
the only logging facility you have is the msgbuf.

Reviewed by: jkh


39755 29-Sep-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just
to ensure 32-bit variables. Doing so broke ix86's with 64-bit longs.


39703 28-Sep-1998 tegge

Initialize pcb_mpnest to 1 in the child process in cpu_fork(). This should
fix the 50% idle problem that the ELF /sbin/init triggered. The problem
appeared when the last context switch before a fork() call was due to
the kernel faulting in user pages via normal page faults (e.g. copyin).
Reviewed by: Peter Wemm <peter@netplex.com.au>


39702 28-Sep-1998 tegge

Use correct virtual address when configuring the per CPU idle page directory
for a vm86 call under SMP.


39648 25-Sep-1998 peter

Goodbye BOUNCE_BUFFERS, for a hack it has served us well.

The last consumer of this code (the old SCSI system) has left us and
the CAM code does its own bouncing. The isa dma system has been
doing its own bouncing for a while too.

Reviewed by: core


39613 24-Sep-1998 bde

Don't redefine kernel. Makefile.i386 now defines it.
Removed some unused includes.


39526 20-Sep-1998 bde

Attempt to work around a bug in the previous commit related to
non-reentrancy of SMP clock locking. Depend on the giant lock
protecting clkintr().


39503 20-Sep-1998 bde

Ensure that the i8254 timecounter doesn't go backwards. It sometimes
went backwards when interrupts were masked for more than one i8254
interrupt period. It sometimes went backwards when the i8254 counter
was reprogrammed. Neither of these should happen in normal operation.

Update the i8254 timecounter support variables atomically. Calling
timecounter functions from fast interrupt handlers may actually work
in all cases now.


39445 18-Sep-1998 mjacob

(requested by gibbs) Remove the SCSI_CAM option (and rework the isp driver
that had depended on it for compilation within or without CAM to use
__FreeBSD_version instead).


39400 17-Sep-1998 msmith

Mark the syscons and pcvt drivers as being allowed to conflict, so that
well-meaning but uneducated users don't exterminate the psm driver in
their zeal to achieve zero conflicts.


39243 15-Sep-1998 gibbs

autoconf.c:
Convert autoconf hooks from old SCSI system to CAM.

busdma_machdep.c:
bus_dmamap_free() should expect the nobounce map, not a NULL one.

mountroot.c:
swapgeneric.c:
da and od changes.

symbols.raw:
Nuke the old disk stat symbols.

userconfig.c:
Disable the SCSI listing code until it can be converted to CAM.


39242 15-Sep-1998 gibbs

sd->da, od is gone, no SCSI control devices.
new pass, xpt, and targ devices.

Nuke no longer used AHC options.


39197 14-Sep-1998 jdp

Add new functions fill_fpregs() and set_fpregs(), like fill_regs()
and set_regs() but for the floating point register state. The code
is stolen from procfs_machdep.c, and moved out of there into
machdep.c.

These functions are needed for generating ELF core dumps.


39189 14-Sep-1998 jdp

Add generic defines ELF_ARCH, ELF_CLASS, and ELF_DATA. These give
the relevant characteristics of the native machine, for building
and checking Elf_Ehdr structures.

Add structures to represent ELF "note" headers. Add defines for the
note types used in ELF core files.


39187 14-Sep-1998 sos

Remove the SLICE code.
This clearly needs a lot more thought, and we don't need this to hunt
us down in 3.0-RELEASE.


39176 14-Sep-1998 abial

This implements retrieving the contents of message buffer via sysctl(3)
as "machdep.msgbuf". It's needed in case of using stripped kernels, where
normal dmesg (which has to use kvm) doesn't work.

The buffer is unwound, meaning that the data will be linear, possibly
with some leading NULLs.

Reviewed by: Jordan K. Hubbard <jkh@freebsd.org>
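
A minimal sketch of what "unwound" means in r39176, assuming a plain byte ring in which head is the next write position. ring_unwind() is a hypothetical name, not the sysctl handler itself. Bytes from head to the end are the oldest, so they are copied first. If the ring has never wrapped, that region is still zero, which is where the leading NULs come from.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a ring buffer of `size` bytes into `out` in chronological order.
 * `head` is the index at which the next byte would be written, so the
 * oldest data starts at `head` and the newest ends just before it.
 * `out` must have room for `size` bytes.
 */
void
ring_unwind(const char *ring, size_t size, size_t head, char *out)
{
	memcpy(out, ring + head, size - head);		/* oldest part */
	memcpy(out + (size - head), ring, head);	/* newest part */
}
```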


38928 07-Sep-1998 jdp

Make profiling work for ELF. gprof now autodetects the format of
the executable file, so it will work for both a.out and ELF format
files. I have split the object format specific code into separate
source files. It's cleaner than it was before, but it's still
pretty crufty.

Don't cheat on your make world for this update. A lot of things
have to be rebuilt for it to work, including the compiler and all
of the profiled libraries.


38893 06-Sep-1998 tegge

Don't go below the low water mark of free pages due to optional prefaulting
of pages.
PR: 2431


38888 06-Sep-1998 tegge

Maintain a mapping from irq number to (ioapic number, int pin) tuple,
and use this when masking/unmasking interrupts.

Maintain a mapping from (ioapic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.

Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.

Don't let an AP enter _cpu_switch before all local apics are initialized.
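
A minimal sketch of the two mappings r38888 describes, kept as a pair of tables that are updated together. The table sizes and every name here (irq_map_init(), irq_map_assign(), irq_lookup_pin(), irq_lookup_irq()) are assumptions for illustration, not the identifiers used in the kernel.

```c
#define NIRQ	24	/* assumed number of irqs */
#define NAPIC	2	/* assumed number of IO APICs */
#define NPIN	24	/* assumed pins per IO APIC */

struct apic_pin {
	int apic;	/* IO APIC number, -1 if unassigned */
	int pin;	/* interrupt pin on that APIC, -1 if unassigned */
};

static struct apic_pin irq_to_pin[NIRQ];
static int pin_to_irq[NAPIC][NPIN];

/* Mark every entry in both directions as unassigned. */
void
irq_map_init(void)
{
	for (int i = 0; i < NIRQ; i++)
		irq_to_pin[i].apic = irq_to_pin[i].pin = -1;
	for (int a = 0; a < NAPIC; a++)
		for (int p = 0; p < NPIN; p++)
			pin_to_irq[a][p] = -1;
}

/* Record irq <-> (apic, pin) in both tables; returns -1 if out of range. */
int
irq_map_assign(int irq, int apic, int pin)
{
	if (irq < 0 || irq >= NIRQ || apic < 0 || apic >= NAPIC ||
	    pin < 0 || pin >= NPIN)
		return (-1);
	irq_to_pin[irq].apic = apic;
	irq_to_pin[irq].pin = pin;
	pin_to_irq[apic][pin] = irq;
	return (0);
}

/* irq -> (apic, pin), as needed when masking or unmasking. */
struct apic_pin
irq_lookup_pin(int irq)
{
	struct apic_pin none = { -1, -1 };

	return ((irq < 0 || irq >= NIRQ) ? none : irq_to_pin[irq]);
}

/* (apic, pin) -> irq, as needed when configuring devices. */
int
irq_lookup_irq(int apic, int pin)
{
	if (apic < 0 || apic >= NAPIC || pin < 0 || pin >= NPIN)
		return (-1);
	return (pin_to_irq[apic][pin]);
}
```

With tables like these, an irq such as 16 can live on pin 0 of the second IO APIC, which the earlier assumption (irq equals pin, APIC is 0) could not express.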


38824 04-Sep-1998 luoqi

Make irq forwarding truly functional.


38819 04-Sep-1998 msmith

Increase 'maxusers' to 32; with the number of people using GENERIC as
their one-size-fits-all kernel, this should help reduce the "out of foo"
reports.

Reviewed by: jkh


38807 04-Sep-1998 ache

PAGE_WAKEUP -> vm_page_wakeup


38779 03-Sep-1998 nsouch

Reviewed by: Doug Rabson
Submitted by: nsouch
root_bus_configure() call to initialize new bus arch in i386 env.


38717 01-Sep-1998 kato

- Fix style bug.
- hw.ispc98 -> machdep.ispc98.

Submitted by: Garrett Wollman (hw -> machdep)


38700 31-Aug-1998 luoqi

Use 16bit register in inline asm code to set segment registers.


38673 31-Aug-1998 kato

- hw.machine_arch returns cpu architecture type.
- moved definition of MACHINE_ARCH from cpu.h to param.h as alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. The hw.ispc98 is 1 in PC98 kernel and is 0 in
IBM-PC kernel.

Discussed with: John Birrell <jb@FreeBSD.ORG>


38517 24-Aug-1998 dfr

Change various syscalls to use size_t arguments instead of u_int.

Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptible.

Reviewed by: Bruce Evans <bde@zeta.org.au>


38505 24-Aug-1998 bde

Fixed printf format errors. Only one left in LINT on i386's.


38490 23-Aug-1998 des

Don't check minor number of dump device at all.

Discussed-with: Jörg Wunsch


38488 23-Aug-1998 bde

Fixed printf format errors.


38422 18-Aug-1998 msmith

Presently there is only one `currentldt' variable for all cpus
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.

The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.

PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


38400 17-Aug-1998 bde

Fixed typo (syntax error) in previous commit.


38392 17-Aug-1998 dfr

Add macros for accessing device memory.


38363 16-Aug-1998 wpaul

Import the (Fast) Etherlink XL driver. I'm reasonably confident in its
stability now. Also modify /sys/conf/files, /sys/i386/conf/GENERIC
and /sys/i386/conf/LINT to add entries for the XL driver. Deactivate
support for the XL adapters in the vortex driver. Lastly, add a man
page.

(Also added an MLINKS entry for the ThunderLAN man page which I forgot
previously.)


38357 16-Aug-1998 jdp

Revamp the ELF include files to better support architecture-independent
applications. Here's how it works.

The kernel should include <machine/elf.h> to get the definitions
for the native architecture. It can reference the various ELF
structures with generic names like Elf_Sym, Elf_Shdr, etc. A define
__ELF_WORD_SIZE is also available with the value 32 or 64 as
appropriate for the native architecture.

Generic applications should include <elf.h>, which is just a wrapper
for <machine/elf.h>.

Applications such as object file dumpers that need to deal with
foreign ELF files can include <sys/elf32.h> and/or <sys/elf64.h>.
Both can be included from the same source file if desired. The
structure names must be referenced using wordsize-specific names
like Elf32_Sym, Elf64_Shdr, etc.

I haven't changed the alpha stuff, but I haven't broken it either.


38349 16-Aug-1998 bde

pmap.c:
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.

mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from possible
integral promotions, although this is too much work for
machine-dependent code.

vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.


38246 11-Aug-1998 bde

Register tty software interrupt handlers at run time using register_swi()
instead of at compile time using ifdefs.

Use _swi_null instead of dummycamisr. CAM and dpt should call
register_swi() instead of hacking on ihandlers[] directly.


38244 11-Aug-1998 bde

Implemented dynamic registration of software interrupt handlers. Not
used yet.

Use dummy SWI handlers to avoid some checks for null pointers.


38233 10-Aug-1998 bde

Fixed restoring of cpl after trap handling. The wrong cpl (SWI_AST_MASK
instead of 0) was "restored" after handling a trap that occurred while
returning to user mode. This bug was most noticeable for VM86 and is
still detected and fixed up (on return from the next exception) in doreti
if VM86 is configured.


38063 03-Aug-1998 msmith

Copy in the nfs_diskless structure if NFS_ROOT is defined. A previous
change to include nfs_root.h precluded NFS from being defined.
Submitted by: Parag Patel <parag@cgt.com>


37920 28-Jul-1998 bde

Set p->p_switchtime to switchtime instead of to the current time in
fork_trampoline() if switchtime is valid. This fixes not accounting
for the time between the previous context switch and the current
time (when the forked child starts up here) in most cases - the time
is now counted in the child's runtime. I think it actually fixes
all cases, and switchtime is always valid here, since there must have
been a context switch just before the forked child starts up. Some
code should be removed if this is correct. The check that switchtime
is valid sometimes gives a false negative because the check isn't
correct until after the first context switch after the system
has been up for >= 1 second.


37919 28-Jul-1998 bde

Micro-optimized and cleaned up the clearing of switchtime in idle().

Cleaned up the conditionals in the disgusting SMP ifdef in idle().


37917 28-Jul-1998 jlemon

u_int --> unsigned int, remove (now unneeded) <sys/types.h>


37903 28-Jul-1998 jlemon

Add wrappers for i386_*_ioperm, i386_vm86 so userland code does
not have to call sysarch() directly.
Added man pages for above, as well as sysarch()


37902 28-Jul-1998 jlemon

Fix an off-by-one error when setting the iomap bits.
Change struct i386_*_iomap to use ints instead of shorts/chars.
(pointed out by bde long ago, prodded into action by msmith)


37889 27-Jul-1998 jlemon

Re-arrange the page layout used by vm86_bioscall so that we can
potentially re-use the stack page.

Cosmetic cleanup of the code to de-obfuscate it and make it easier
to follow. There should be no functional changes in this commit.


37785 20-Jul-1998 msmith

Add the 'cs' driver for Crystal Semiconductor CS89x0 devices. This
supports PnP and if_media. I've been running a slightly older version
here for several weeks now.
Submitted by: Maxim Bolotin <max@rsu.ru>


37757 19-Jul-1998 jkh

A slap on the wrist to Dag-Erling, who plainly did not test this before
committing it. There was a large syntax error at line 404 which could
not possibly have allowed compilation. :)


37748 19-Jul-1998 bde

Stop physical DMA for the non-auto case in isa_dmadone(). This fixes a
small part of a bug suite beginning in the SLICE probes but mostly in the
floppy driver. This is a quick fix: the auto case shouldn't be special;
DMA should also be stopped in isa_dma_release(); isa_dmastop() probably
shouldn't exist; common DMA registers should not be accessed without
locking.


37743 18-Jul-1998 des

Allow dump devices with dkpart != SWAP_PART on devfs/slice
systems. This test should probably be removed altogether.

See CVS log entries for revisions 1.97 and 1.98.


37723 17-Jul-1998 joerg

Place a fat warning that floppy tapes should be configured as drive 2
only (normally).

PR: kern/7176


37679 15-Jul-1998 bde

%n in a comment was a poor abbreviation for Immediate-byte-signed,
especially now that %n format has almost gone away.


37651 15-Jul-1998 bde

Cast virtual addresses that happen to be represented as u_longs to
uintptr_t before casting them to pointers. Explicit u_longs should
never be used to represent virtual addresses... (vm_offset_t is
normally right).


37629 14-Jul-1998 bde

Changed to the C9x draft spelling of the (unsigned) integral type
suitable for holding object pointers (ptrint_t -> uintptr_t).
Added corresponding signed type (intptr_t). Changed/added
corresponding non-C9x types for function pointers to match. Don't
use nonstandard types to implement these types, and don't comment
on them in <machine/types.h>.
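
A minimal sketch of what uintptr_t is for, using the spelling that C99 later standardized in <stdint.h>. The helper names are hypothetical. An object pointer is converted to the integer type, arithmetic is done on the integer, and the value is converted back, with no reliance on u_long happening to be pointer-sized.

```c
#include <stdint.h>

/* Pointer -> integer wide enough to hold it. */
uintptr_t
ptr_to_addr(const void *p)
{
	return ((uintptr_t)p);
}

/* Integer -> pointer; valid for values obtained from ptr_to_addr(). */
void *
addr_to_ptr(uintptr_t a)
{
	return ((void *)a);
}

/* Arithmetic stays in the integer domain (4096-byte pages assumed). */
uintptr_t
addr_page_base(uintptr_t a)
{
	return (a & ~(uintptr_t)0xfff);
}
```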


37564 11-Jul-1998 bde

Fixed printf format errors.

Use offsetof() instead of null pointer hacks.
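
A minimal sketch of the offsetof() change mentioned in r37564. The structure and function names are hypothetical. The older idiom computed a member offset by pretending a structure lived at address 0, roughly (size_t)&((struct example_frame *)0)->c, which relies on dereferencing a null pointer. offsetof() from <stddef.h> gives the same number portably.

```c
#include <stddef.h>

/* Hypothetical structure used only for illustration. */
struct example_frame {
	int	a;
	char	b;
	long	c;
};

/* Byte offset of member c, computed without any null-pointer trick. */
size_t
frame_c_offset(void)
{
	return (offsetof(struct example_frame, c));
}
```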


37561 11-Jul-1998 bde

Fixed printf format errors.


37557 11-Jul-1998 phk

Don't disable pmap_setdevram() which isn't called, but which could be,
but instead disable pmap_setvidram() which is called, but probably
shouldn't be.

PR: 7227, 7240


37555 11-Jul-1998 bde

Fixed printf format errors.


37553 11-Jul-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just to
ensure 32-bit variables. Doing so mainly bogotified some printf formats.

Fixed disorder in md_var.h.


37552 11-Jul-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs
just to ensure 32-bit variables. Doing so broke and/or pessimized
i386's with 64-bit longs (unnecessary use of 64-bit variables
caused remarkably few problems in C code, but the inline asm here
tended to fail because there are no 64-bit registers). Since the
interfaces here are very machine-dependent and shouldn't be used
outside of the kernel, use standard types of "known" width instead
of fixed-width types.

Changed all quad_t's to u_int64_t's. quad_t isn't standard, and
using signed types for 64-bit registers was bogus (but made no
difference).


37542 10-Jul-1998 bde

Oops, fptrint_t still needs to be declared in <machine/profile.h> in the
!KERNEL case. The kludge to get it declared in libc/gmon/mcount.c wasn't
sufficient because fptrint_t is used in <sys/gmon.h>.


37540 10-Jul-1998 bde

Added a kernel-only typedef (ptrint_t) giving an integral type that is
least unsuitable for holding an object pointer. This should have been
used to fix warnings about casts between pointers and ints on alphas.

Moved corresponding existing general typedef (fptrint_t) for function
pointers from the i386 <machine/profile.h> to a kernel-only typedef
in <machine/types.h>. Kludged libc/gmon/mcount.c so that it can
still see this typedef.


37506 08-Jul-1998 bde

Use not-so-new printf formats %r and/or %z instead of %n and/or %+x.


37504 08-Jul-1998 bde

Fixed bogus type of valuep in struct db_variable. It was `int *' and
became `long *' for alpha, but should always have been `db_expr_t *'.
Fixed variable types to match.


37389 04-Jul-1998 julian

There is no such thing any more as "struct bdevsw".

There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries. The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. Its major is in d_bmaj rather than
d_maj. Some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.


37315 30-Jun-1998 phk

Add 3 sysctl variables for future use by ps(1).


37311 30-Jun-1998 phk

Add PSE36 to the bits we know by name.


37272 30-Jun-1998 jmg

convert some nfs tunables to options, these are:
NFS_MINATTRTIMO VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY Default write gather delay (msec)
NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ and with this
NFS_MUIDHASHSIZ Tune the size of nfsmount with this
NFS_NOSERVER (already documented in LINT)
NFS_DEBUG turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....


37101 21-Jun-1998 bde

Removed unused includes.


37099 21-Jun-1998 bde

Removed unused includes.
Ifdefed conditionally used includes.


37091 21-Jun-1998 mckay

Remove bogus comment that teleported in from sys/i386/i386/mp_machdep.c.


37086 21-Jun-1998 bde

Converted add_interrupt_randomness() to take a `void *' arg. Rewrote
mmioctl() to fix hundreds of style bugs and a few error handling bugs
(don't check for superuser privilege for inappropriate ioctls, don't
check the input arg for the output-only MEM_RETURNIRQ ioctl, and don't
return EPERM for null changes).


37051 18-Jun-1998 bde

Converted isa_strayintr() to take a `void *' arg.


37050 18-Jun-1998 bde

Changed the type of an isa/general interrupt handler to take a
`void *' arg. Fixed or hid most of the resulting type mismatches.
Handlers can now be updated locally (except for reworking their
global declarations in isa_device.h).


37034 17-Jun-1998 bde

Don't declare isa device structs or isa interrupt handlers in <sys/conf>,
and don't depend on them being declared there. This will cause lots of
warnings for a few minutes until config is updated. Interrupt handlers
should never have been configured by config, and the machine generated
declarations get in the way of changing the arg type from int to void *.


36909 12-Jun-1998 dg

Increased MAXTSIZ to 128MB...there are binaries that get quite large.
Increased DFLDSIZ to 128MB, as it is a better default.
Reviewed by: jkh


36810 09-Jun-1998 phk

Add a tc_ prefix to struct timecounter members.

Urged by: bde


36809 09-Jun-1998 bde

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


36766 08-Jun-1998 dfr

Fix more of my DDB breakage.


36760 08-Jun-1998 dfr

Make DDB work again after I broke it :-(.


36741 07-Jun-1998 phk

Add one more member function to the timecounters, this one is for use
with latch based PPS implementations. The client that uses it will
be committed after more testing.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36719 07-Jun-1998 phk

Add a "this" style argument and a "void *private" so timecounters can
figure out which instance to work with.


36614 03-Jun-1998 jkh

Add the DPT driver here. It's kinda ironic that it got enabled in -stable
first. :)
PR: 6848


36605 03-Jun-1998 bde

Ifdefed the netisr support.

PR: 6760
Reviewed by: joerg


36596 03-Jun-1998 msmith

If vm86 services are available, use these to perform the APM BIOS
probe and initialisation. This will ultimately remove the grubby (but
functional) hack that copies a real-mode function into low memory
early in locore.s.


36544 31-May-1998 steve

Make this ${.OBJDIR} and ${.CURDIR} aware.

PR: 2565


36492 31-May-1998 bde

Converted the ICU-level interrupt tests (3, 5 and 8) in sioprobe() into
a test of the irq number, and made failure of this test non-fatal.
Removed related unused complications for the APIC_IO case. Removed the
no-test3 flag.

Deverbosified the failure messages for the other tests. Removed the
per-port verbose flag - just use the general verbose flag.


36441 28-May-1998 phk

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


36303 22-May-1998 des

Use switch instead of if/else chain for 686 model identification.
Add precise model identification for 586-family CPUs.


36290 22-May-1998 des

Add CPU_PII to the list.


36286 21-May-1998 des

Correctly identify the precise CPU model within the 686 family: instead
of just printing "Pentium Pro", check the model (cpu_id & 0xf0) and print
the appropriate information.
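
The model check described above can be sketched as follows. This is an
illustration only, not the committed identcpu code: the enum and function
names are hypothetical, and the model numbers are the publicly documented
Intel family 6 values (1 = Pentium Pro, 3 and 5 = Pentium II).

```c
/* Hypothetical result type for the sketch. */
enum p6_model { P6_UNKNOWN, P6_PENTIUM_PRO, P6_PENTIUM_II };

/*
 * Classify a family-6 CPU from its CPUID signature. Bits 7:4 of the
 * signature hold the model number, hence the 0xf0 mask.
 */
static enum p6_model
classify_p6(unsigned int cpu_id)
{
    switch (cpu_id & 0xf0) {
    case 0x10:
        return P6_PENTIUM_PRO;
    case 0x30:  /* Pentium II, model 3 */
    case 0x50:  /* Pentium II, model 5 */
        return P6_PENTIUM_II;
    default:
        return P6_UNKNOWN;
    }
}
```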


36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


36273 21-May-1998 wpaul

Add entries for the ThunderLAN driver.


36214 19-May-1998 dufault

Remove option for SCHED_FIFO. With this optional, SCHED_FIFO
is the same as RTPRIO_IDLE when it falls through to the default.


36200 19-May-1998 peter

Missing parens caused cpu features not to be printed for cyrix >= M2/MX.
Although the comments say the datasheet doesn't list the device ID
registers on the M2/MX, they seem to be there and quite alive.
(It's interesting to note that the M2/MX calls itself a 686 class cpu but
is missing a heck of a lot of features, including VME, PGE, PSE, etc)


36198 19-May-1998 phk

Change a data type internal to the timecounters, and remove the "delta"
function.

Reviewed, but not entirely approved by: bde


36179 19-May-1998 phk

Make the size of the msgbuf (dmesg) a "normal" option.


36169 19-May-1998 tegge

Back out part of revision 1.198 commit (clearing kernel stack pages).
By request from David Greenman <dg@root.com>


36168 19-May-1998 tegge

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


36138 17-May-1998 tegge

Change simple lock handling to not depend upon having a local apic
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.

Remove some unneeded manipulation of tpr register in mplock.s.

Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.


36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


36132 17-May-1998 tegge

Use a higher priority interrupt vector for 8254 timer interrupts.


36125 17-May-1998 tegge

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generate IPIs.
This should reduce the number of IPIs during heavy paging.


36121 17-May-1998 tegge

Clear kernel stack pages before usage.
Correct panic message in pmap_zero_page (s/CMAP /CMAP2 /).


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


36095 16-May-1998 kato

Some newer PC-98 machines may cause a "Windows Protection Fault" when booting
Windows 95 after rebooting FreeBSD without powering off. On PC-98
systems, the reboot mode is set via I/O port 0x37 in cpu_reset(), and
accessing this port is the cause of the problem. To avoid the
fault, the current status of the reboot mode should be checked before
accessing the I/O port.


36094 16-May-1998 kato

Disable the local APIC in UP kernels. An Intel specification update states
that the local APIC should be disabled on UP systems. However, some old
BIOSes do not disable the local APIC, and virtual wire mode through the local
APIC may cause int 15.


36051 15-May-1998 dyson

Disable the auto-Write Combining setup for the pmap code. This
worked on a couple of machines of mine, but appears to cause problems
on others.


35977 12-May-1998 dyson

Some temporary fixes to SMP to make it more scheduling and signal friendly.
This is a result of discussions on the mailing lists. Kudos to those who
have found the issue and created work-arounds. I have chosen Tor's fix
for now, before we can all work the issue more completely.
Submitted by: Tor Egge


35976 12-May-1998 dyson

Fix a lot of silly LINT that I left in the code.


35974 12-May-1998 bde

Backed out previous commit. It is invalid to call d_ioctl() on
possibly non-open devices, and we don't want to restrict dumping
to swap devices anyway. It is especially invalid to call d_ioctl()
in non-process context for panics. d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale. The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.


35940 11-May-1998 dyson

Change some tests from CPU_CLASS686 to CPU_686 as appropriate, and
also correct a serious omission that would cause process failures
due to forgetting an invltlb type operation. This was just a
transcription problem.


35933 11-May-1998 dyson

Support better performance with P6 architectures and in SMP
mode. Unnecessary TLB flushes removed. More efficient
page zeroing on P6 (modify page only if non-zero.)
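
The "modify page only if non-zero" idea can be illustrated with a short
sketch. The function name and the word-at-a-time loop are assumptions; the
committed code is not shown in this log.

```c
#include <stddef.h>

/*
 * Zero a buffer, but store only to words that are not already zero.
 * Skipping the store avoids dirtying memory that is already clear.
 * Returns the number of stores performed.
 */
static size_t
zero_if_nonzero(unsigned long *page, size_t nwords)
{
    size_t stores = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (page[i] != 0) {
            page[i] = 0;
            stores++;
        }
    }
    return stores;
}
```

The trade-off is an extra read and branch per word in exchange for fewer
writes when pages are mostly zero already.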


35932 11-May-1998 dyson

Attempt to set write combining mode for graphics devices.


35812 06-May-1998 julian

Add dump support to the DEVFS/slice code.
now we can actually catch our crashes :-)

Submitted by: Luoqi Chen <luoqi@chen.ml.org> (the man who's everywhere)


35767 06-May-1998 gibbs

Implement bus_dmamem_* functions and correct a few nits reported by Peter Wemm.


35496 28-Apr-1998 eivind

Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under
Linux emulation. This makes Allegro Common Lisp 4.3 work under
FreeBSD!

Submitted by: Fred Gilham <gilham@csl.sri.com>
Commented on by: bde, dg, msmith, tg
Hoping he got everything right: eivind


35477 27-Apr-1998 des

Cast return values to the appropriate fp_*_t. Note that the man page
incorrectly refers to them as e.g. fp_except rather than fp_except_t.

PR: misc/6310
Submitted by: Niall Smart


35456 26-Apr-1998 dyson

Add the PAT cpuid feature.


35393 22-Apr-1998 tegge

Mask the interrupt before setting the corresponding bit in ipending if
the interrupt is already active.
Don't use lock prefix for operations on ipending.
Always use lock prefix for operations on iactive.


35390 22-Apr-1998 mjacob

Add support for the Qlogic ISP SCSI && FC/AL Adapters


35356 20-Apr-1998 julian

Remove an LFS clause. Now that it is in the history,
anyone who wants to see what was needed to revive LFS can see it.


35323 20-Apr-1998 julian

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


35319 19-Apr-1998 julian

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels, PRE_DEVFS_SLICE and POST_DEVFS_SLICE, will delineate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistence. (My head is still hurting from the last time we discussed this)
ATAPI floppies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitrary and dynamically assigned.


35303 19-Apr-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants).


35302 19-Apr-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants or hard long long constants).


35300 19-Apr-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants or trailing commas in enum declarations).


35296 19-Apr-1998 bde

Support compiling with gcc -pedantic (don't use a bogus, null cast).


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35215 15-Apr-1998 bde

Finish supporting compiling with `gcc -ansi'. Fix missing `volatile's
in __asm() statements while I'm here.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35203 15-Apr-1998 bde

Fixed breakage of fork accounting in previous commit. A fork benchmark
reported about 15 times as much sys time as real time. getmicroruntime()
is a confusing name.


35087 06-Apr-1998 peter

Fix VM86 compiles. a #include "opt_vm86.h" was missing, and the my_tr
variable was needed in the non-SMP case.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


35079 06-Apr-1998 peter

remove #ifdef declaration of npxproc, use globals.s and the extern always.


35077 06-Apr-1998 peter

Use real types for the SMP pages being allocated rather than arrays of
ints. Remove some no longer needed casts. Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.


35076 06-Apr-1998 peter

clean up #ifdefs, define the variables that have to be per-cpu on SMP
in globals.s only and use externs always.


35075 06-Apr-1998 peter

_curpcb is always defined in globals.s instead of here in #ifdefs


35074 06-Apr-1998 peter

Bogus casts


35072 06-Apr-1998 peter

Rather than filling this file up with SMP .sets, use those from
globals.s instead.
Initialize curproc in the same place for both UP and SMP.


35071 06-Apr-1998 peter

Generate #defines that the asm code can access for the per-cpu data
structures.


35069 06-Apr-1998 peter

A pair of C structures used for laying out the SMP per-cpu data space.


35058 06-Apr-1998 phk

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


35035 05-Apr-1998 tegge

Remove some unneeded statements that enabled interrupts.


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others
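
As an illustration of what timespec add/sub/cmp helpers have to do, here is
a sketch written as functions. It is not the macros added to time.h by this
commit; the demo_ names are hypothetical, and both inputs are assumed to be
normalized (0 <= tv_nsec < 1e9).

```c
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* a + b, carrying nanosecond overflow into the seconds field. */
static struct timespec
demo_ts_add(struct timespec a, struct timespec b)
{
    struct timespec r;

    r.tv_sec = a.tv_sec + b.tv_sec;
    r.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (r.tv_nsec >= DEMO_NSEC_PER_SEC) {
        r.tv_sec++;
        r.tv_nsec -= DEMO_NSEC_PER_SEC;
    }
    return r;
}

/* a - b, borrowing from the seconds field when nanoseconds go negative. */
static struct timespec
demo_ts_sub(struct timespec a, struct timespec b)
{
    struct timespec r;

    r.tv_sec = a.tv_sec - b.tv_sec;
    r.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (r.tv_nsec < 0) {
        r.tv_sec--;
        r.tv_nsec += DEMO_NSEC_PER_SEC;
    }
    return r;
}

/* Three-way compare: -1 if a < b, 0 if equal, 1 if a > b. */
static int
demo_ts_cmp(struct timespec a, struct timespec b)
{
    if (a.tv_sec != b.tv_sec)
        return (a.tv_sec < b.tv_sec) ? -1 : 1;
    if (a.tv_nsec != b.tv_nsec)
        return (a.tv_nsec < b.tv_nsec) ? -1 : 1;
    return 0;
}
```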


34990 01-Apr-1998 tegge

Add two workarounds for broken MP tables:

- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.

- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.


34989 01-Apr-1998 tegge

Declare some variables modified by interrupt handlers as volatile.


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't an atomic variable, so splfoo() protection was needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime().

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeared time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passed &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde
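
The point of converting a relative interval directly, instead of adding the
current time and having hzto() subtract it again, can be sketched like this.
The function below is a hypothetical simplification and not the kernel's
tvtohz(): it assumes hz divides 1000000 evenly and simply rounds up to whole
ticks.

```c
#include <sys/time.h>

/*
 * Convert a relative interval to clock ticks, rounding up so that the
 * caller never sleeps for less than the requested time.
 * Assumption: hz > 0 and 1000000 % hz == 0.
 */
static long
demo_interval_to_ticks(struct timeval tv, int hz)
{
    long long usec = (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
    long long tick_usec = 1000000LL / hz;

    return (long)((usec + tick_usec - 1) / tick_usec);
}
```

Because no absolute time is read, no spl protection is needed around the
conversion.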


34925 28-Mar-1998 dufault

Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:

Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


34840 23-Mar-1998 jlemon

Add the ability to make real-mode BIOS calls from the kernel. Currently,
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.

Thanks to Tor Egge, these changes should work on SMP machines. However,
it may not be thoroughly SMP-safe.

Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.


34643 17-Mar-1998 kato

Make EPSON_BOUNCEDMA a new-style option.


34636 17-Mar-1998 msmith

Add missing entry to list of major device names. This list should not
exist.


34624 16-Mar-1998 msmith

Spell 'compatibility' like everyone else.


34622 16-Mar-1998 msmith

Use dkmakeminor() rather than magic knowledge of the size and location of
the slice field. Handle incomprehensible slice numbers slightly better.
Suggested by: bde


34617 16-Mar-1998 phk

Be less draconian about the TSC if APM is configured, use it for
timecounting if APM-BIOS isn't found.
Be just as draconian about SMP as always, but explain it better.


34611 16-Mar-1998 dyson

Some VM improvements, including elimination of a lot of Sig-11
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)

pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)

vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.

vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.

vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)

vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliability
and performance issue.)

10) Modify FFS to use vtruncbuf.

vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)

vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.

vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)

vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.


34591 15-Mar-1998 msmith

Use dsname() to generate the disk region name for the "changing root
device to" message. Suppress this message if only the slice number
has changed.


34571 14-Mar-1998 tegge

On SMP systems, initially follow the MP spec with regard to which pin
on the IOAPIC being connected to the 8254 timer interrupt.
Verify that timer interrupts are delivered. If they aren't, attempt
a fallback to mixed mode (i.e. routing the timer interrupt via the 8259 PIC).


34569 14-Mar-1998 tegge

Don't use the standard macros for disabling/enabling interrupt.
On SMP systems, this left the mpintr_lock simplelock locked, causing
further calls to disable_intr to deadlock or panic.


34507 12-Mar-1998 bde

Fixed breakage of the !SMP case in vm_page_zero_idle() in the
previous commit. Opportunities to clean pages were often missed,
and leaving of the idle state was sometimes delayed until the next
interrupt (after any that occurred while cleaning).

Fixed an unstaticization, a syntax error and a style bug in the
previous commit.


34506 12-Mar-1998 bde

Don't depend on "implicit int" or bloat the data section in the
declaration of mem_devsw_installed.

Reduced include nesting.


34440 09-Mar-1998 eivind

Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.


34392 09-Mar-1998 msmith

"Correct behaviour" involves being consistent with the canonical names of
other partitions. In this case, they appear in the first slice in the
WHOLE_DISK_SLICE case.


34390 09-Mar-1998 msmith

Merge from 2.2; behave correctly in the presence of a slice number that
doesn't directly correspond to the slice field in the device minor number.


34309 08-Mar-1998 msmith

Construct the minor number for the root device taking into account the
slice number passed in by the bootblocks. This means the kernel will
not use the compatibility slice to obtain the root filesystem when
booting from a sliced disk.

Use the extraction macros from reboot.h rather than stating them in full
again.


34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


34197 07-Mar-1998 tegge

The APs now reload the interrupt descriptor table pointer after
f00f_hack has run.

Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.

Submitted by: Partially from David A Adkins <adkin003@tc.umn.edu>


34058 05-Mar-1998 tegge

Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.


34057 05-Mar-1998 tegge

Use t_idt instead of idt inside setidt() if f00f_hack() has relocated the IDT.
Submitted by: Bruce Evans <bde@zeta.org.au>


34031 04-Mar-1998 kato

Defined CCR6 and CCR7 (configuration registers of M2 CPU.)


34028 04-Mar-1998 dufault

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.


34021 03-Mar-1998 tegge

When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.


34020 03-Mar-1998 tegge

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


34019 03-Mar-1998 tegge

Reduce timeout before assuming that forwarding of hardclock or softclock
failed. Don't complain on forwarding failure, unless
BETTER_CLOCK_DIAGNOSTIC is defined.


34017 03-Mar-1998 tegge

forward_statclock and forward_hardclock are located in mp_machdep.c.


33983 02-Mar-1998 peter

Update the ELF image activator to use some of the exec resources rather
than rolling its own. This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win. I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.


33936 01-Mar-1998 dyson

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappiness."


33929 28-Feb-1998 phk

Prevent the TSC from being used on APM machines, we have no idea if
it runs at a constant frequency. This was less of an issue before,
because the TSC only interpolated in the HZ intervals, but now where
the timecounter is used all the way, this becomes much more visible.

Nit: Fix a printf which triggered the bde-filter.


33817 25-Feb-1998 dyson

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


33809 25-Feb-1998 bde

Removed vestiges of previous microtime() implementation.


33757 23-Feb-1998 dyson

Try to dynamically size the VM_KMEM_SIZE (it can still be overridden
in the same way as before.) I had problems with the system properly
handling the number of vnodes when there is a lot of system memory, and the
default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accounting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map. Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.


33753 23-Feb-1998 bde

Quick fix for the i8254 timecounter often gaining 10 msec.


33727 21-Feb-1998 jkh

Add missing CLOCK_UNLOCK() before write_eflags().
Submitted by: dave adkins <adkin003@tc.umn.edu>


33690 20-Feb-1998 phk

Replace TOD clock code with more systematic approach.

Highlights:
* Simple model for underlying hardware.
* Hardware basis for timekeeping can be changed on the fly.
* Only one hardware clock responsible for TOD keeping.
* Provides a real nanotime() function.
* Time granularity: .232E-18 seconds.
* Frequency granularity: .238E-12 s/s
* Frequency adjustment is continuous in time.
* Less overhead for frequency adjustment.
* Improves xntpd performance.

Reviewed by: bde, bde, bde


33676 20-Feb-1998 bde

Removed unused #includes.


33444 16-Feb-1998 msmith

Remove DISABLE_PSE option which was masking (but not fixing) the problem.
A correct fix for execution off MFS filesystems has been committed.


33417 16-Feb-1998 msmith

TEMPORARILY disable support for the 4MB kernel page, as it appears to be
causing installation images for -current to be unbootable.

Submitted by: phk


33362 15-Feb-1998 bde

Removed a superstitious fnop() that broke the usefulness of the FPU's
"last instruction" pointer.


33320 13-Feb-1998 kato

Use RDMSR instruction instead of WRMSR.


33311 13-Feb-1998 bde

Ifdefed SMP-only declarations.


33309 13-Feb-1998 bde

Update timer0_prescaler_count before calling hardclock() while timer0
is "acquired". This fixes a TSC biasing error of about 10 msec when
pcaudio is active.

Update `time' before calling hardclock() when timer0 is being released.
This is not known to be important.

Added some delays in writertc(). Efficiency is not critical here, unlike
in rtcin(), and we already use conservative delays there.

Don't touch the hardware when machdep.i8254_freq is being changed but
the maximum count wouldn't change. This fixes jitter of up to 10 msec
for most small adjustments to machdep.i8254_freq. When the maximum
count needs to change, the hardware should be adjusted more carefully.


33307 13-Feb-1998 bde

Ifdefed some npx code. npx should be optional again.


33306 13-Feb-1998 bde

Fixed missing privilege checking and off-by-1 bounds checking in
i386_set_ioperm(). Don't use a magic number for the bound.

Fixed missing bounds checking in i386_get_ioperm(). Don't use a
magic number for the bound elsewhere in this function.

Removed some bogus initializers.
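
A bounds check of the kind described above, written without a magic number
and without an off-by-one, might look like the following sketch. The names
are hypothetical and this is not the i386_set_ioperm() code itself; the only
hard fact used is that the x86 I/O port space has 65536 ports.

```c
/* The x86 I/O port space is 16 bits wide. */
#define DEMO_IO_PORTS 65536u

/*
 * Return nonzero if [start, start + length) lies entirely inside the
 * port space. Written as a subtraction so start + length cannot wrap.
 */
static int
demo_port_range_ok(unsigned int start, unsigned int length)
{
    return start <= DEMO_IO_PORTS && length <= DEMO_IO_PORTS - start;
}
```

The last valid port is DEMO_IO_PORTS - 1, so a range that ends exactly at
DEMO_IO_PORTS is accepted and one that ends a port later is rejected.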


33282 12-Feb-1998 bde

Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap(). Smaller kernels
worked accidentally.


33281 12-Feb-1998 bde

Only use the i586-optimized copying and zeroing functions if they are
actually faster (more than 20% faster for zeroing 1 MB at boot time).
This fixes pessimized copying and zeroing on K6's and perhaps on other
CPUs that are misclassified as i586's.


33221 10-Feb-1998 eivind

Fix warning after previous staticization.


33181 09-Feb-1998 eivind

Staticize.


33179 09-Feb-1998 eivind

Remove warnings from f00f_hack.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33068 04-Feb-1998 eivind

Make FAILSAFE a new-style option.


33056 03-Feb-1998 bde

Converted DISABLE_PSE to a new-style option.

Fixed some formatting in options.i386.


33051 03-Feb-1998 bde

Ifdefed some SMP and VM86 code. Note that although VM86 is not a global
option, the ifdef on it in a header works because only the name of the
VM86 extension is hidden.


33049 03-Feb-1998 bde

Forward declare a union so that this file is self-sufficient.

Cleaned up ifdefs.


33047 03-Feb-1998 bde

Ifdefed use of a GNU feature.


33008 01-Feb-1998 bde

Fixed disordering of busdma* and swi_vm.


33007 01-Feb-1998 bde

Fixed a recently broken comment.


32992 01-Feb-1998 bde

Declare printf() instead of including <stdio.h>, so that this doesn't
depend on anything outside of "sys".

Removed an unused include.

Don't use `extern' in a function declaration.


32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.
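
The invariant described above can be sketched as a guard on the free path.
This is only an illustration, not the committed code: struct page, PGF_BUSY
and page_free_checked() are hypothetical names, and a real kernel would more
likely panic than return an error.

```c
#include <stdbool.h>

/* Hypothetical flag and page type, for illustration only. */
#define PGF_BUSY 0x1u

struct page {
	unsigned flags;
	bool     on_free_list;
};

/*
 * Free a page only if the caller has already marked it busy.
 * Returns false (and leaves the page alone) when the invariant
 * "a page being freed must be busy" is violated.
 */
static bool
page_free_checked(struct page *p)
{
	if ((p->flags & PGF_BUSY) == 0)
		return false;
	p->flags &= ~PGF_BUSY;
	p->on_free_list = true;
	return true;
}
```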


32929 31-Jan-1998 eivind

Make the debug options new-style.

This also zaps a DPT option from lint; it wasn't referenced from
anywhere.


32925 31-Jan-1998 eivind

Make POWERFAIL_NMI, PPS_SYNC and NATM new style options.

This also fixes a couple of defunct options; submitted by bde.


32922 31-Jan-1998 eivind

Skip probing devices which have already probed true.


32917 31-Jan-1998 eivind

Include "opt_nfs.h"

Pointed out by: Eric L. Hernes <erich@lodgenet.com>


32889 30-Jan-1998 phk

Retire LFS.

If you want to play with it, you can find the final version of the
code in the repository under the tag LFS_RETIREMENT.

If somebody makes LFS work again, adding it back is certainly
desirable, but as it is now nobody seems to care much about it,
and it has suffered considerable bitrot since its somewhat haphazard
integration.

R.I.P


32884 30-Jan-1998 dyson

Make the bounce buffer code a little more robust when space isn't
available. If there isn't bounce space available, the bounce code
is disabled. This will allow most large systems to run properly
when the bounce space is mistakenly allocated above 16MB.


32850 28-Jan-1998 phk

APM calls inittodr(0) which is stupid, but at least stop setting the
clock back to when Dennis had a good idea.


32820 27-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU.


32781 25-Jan-1998 kato

Undo previous commit. The cpuid symbol has been already used by SMP
stuff.

Pointed-out by: Manfred Antar <root@mantar.slip.netcom.com>


32772 25-Jan-1998 kato

Added cpuid instruction.


32771 25-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU,
and store its result into cpu_id and cpu_feature variables.

Tested by: Simon Coggins <chaos@ultra.net.au>


32765 25-Jan-1998 kato

Even though BIOS writer's guide recommends cpuid instruction of Cyrix
6x86MX CPU is enabled (BIOS should not disable it), some BIOS disables
it via CCR4. In this case, cpu variable becomes CPU_486 and
identblue() is called. Because Cyrix 6x86MX has MSR and doesn't have
MSR1002, wrmsr instruction generates general protection fault.

Tested by: Simon Coggins <chaos@ultra.net.au>


32726 24-Jan-1998 eivind

Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related options new-style.

This introduces an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.


32724 24-Jan-1998 dyson

Add better support for larger I/O clusters, including larger physical
I/O. The support is not mature yet, and some of the underlying implementation
needs help. However, support does exist for IDE devices now.


32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
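
The generation-count idea in item 9 can be sketched as follows. This is a
minimal illustration under assumed names (gen_list, gen_cursor and the three
helpers are hypothetical, not kernel interfaces): a traversal records the
generation before it may block and restarts only if the count has changed.

```c
#include <stdbool.h>

/* Hypothetical container: the count is bumped on every structural change. */
struct gen_list {
	unsigned generation;
};

/* Hypothetical traversal state: the generation seen before blocking. */
struct gen_cursor {
	unsigned seen;
};

/* Record the current generation before an operation that may block. */
static void
cursor_save(struct gen_cursor *c, const struct gen_list *l)
{
	c->seen = l->generation;
}

/* After blocking: true if the traversal can continue without a restart. */
static bool
cursor_still_valid(const struct gen_cursor *c, const struct gen_list *l)
{
	return c->seen == l->generation;
}

/* Called by any code path that inserts or removes an element. */
static void
list_modified(struct gen_list *l)
{
	l->generation++;
}
```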


32680 21-Jan-1998 jkh

Add entries for tx card.


32677 21-Jan-1998 gibbs

Add prototypes for swi_vm, setsoftvm, schedsoftvm, and splsoftvm that were
missed when I originally committed the bus dma code.


32641 20-Jan-1998 jb

Suggested by: bde
Move sigjmp_buf and jmp_buf structure definitions to machine/setjmp.h
so that i386 can continue to use int as the basic register type and
alpha can use long. Bruce was concerned about possible differing
alignment. I've left the definition of _JBLEN in machine/setjmp.h
even though Bruce's example used the number directly. I don't know if
any other code relies on _JBLEN, so I left it to avoid potential
breakage.


32617 19-Jan-1998 tegge

The removal of a page from the free queue in vm_page_zero_idle was
incomplete. Also set m->queue, in order to prevent vm_page_select_free
from selecting the page being zeroed.


32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but not all subtle bugs are fixed yet.


32578 16-Jan-1998 pst

Bring in IDE ATAPI floppy support.
This is Junichi's v1.0 driver.

NOTE: Major device numbers have been changed to avoid conflict with other
FreeBSD 3.0 devices. The new numbers should be considered "official."
This driver is still considered "beta" quality, although we have been
playing with it. Please submit bugs to junichi and myself.

Submitted by: junichi@astec.co.jp


32518 15-Jan-1998 gibbs

Addition of splsoftvm and a VM SWI to handle bus dma related callbacks.
This SWI may be useful for other, deferred, VM tasks.


32517 15-Jan-1998 gibbs

Implementation of Bus Space for FreeBSD-x86.

Obtained From: NetBSD


32516 15-Jan-1998 gibbs

Implementation of Bus DMA for FreeBSD-x86. This is sufficient to do
page level bounce buffering, but there are still some issues left to
address.


32464 12-Jan-1998 dyson

Adjust upwards the size of exec map in order to take into account the
additional PAGE_SIZE needed for exec operation.


32413 11-Jan-1998 jkh

Add ppp, at long last, back to GENERIC. We have enough room in the
kernel for it and I'm tired of reading the "This system lacks kernel
support for PPP..." line in people's tech support messages.


32411 10-Jan-1998 jb

Add a machine dependent header for the i386 jmp_buf size instead of piling
machine dependent definitions into src/include/setjmp.h.


32358 09-Jan-1998 eivind

Make the BOOTP family new-style options (in opt_bootp.h)


32203 03-Jan-1998 obrien

AMD calls the PR166 and PR200, models 2 and 3 respectively.


32200 03-Jan-1998 obrien

Update AMD URL for CPU recognition docs.


32199 03-Jan-1998 kato

Fix typo. Option `CPU_SUSP_HLT' didn't work on Cyrix 486DX box.

Submitted by: nyan@wyvern.cc.kogakuin.ac.jp (Takahashi Yoshihiro)


32164 01-Jan-1998 msmith

Don't try to call into BIOS32 handlers outside the normal ROM
address range. They may have been trashed earlier in the boot
process, or the directory header may simply be bogus.

PR: 5140
Submitted by: Joel Faedi <Joel.Faedi@esial.u-nancy.fr>
Brought-to-attention-by: Derek Inksetter <derek@saidev.com>, bde


32151 01-Jan-1998 bde

Moved the SMP declarations of INTREN() and INTRDIS() to the correct header,
i.e., the same header as corresponding non-SMP #defines.


32054 28-Dec-1997 phk

More cleanup relating to our use of the TSC.
Look in the cpu_feature (CPUID output) to see if we have it.


32052 28-Dec-1997 phk

wash, sort and put in order various nits from the i586_ctr -> tsc
commit.

Pointed out by: bde


32012 27-Dec-1997 peter

Back out previous commit, the so-called "unused code" was most definitely
used, and caused a reference to an uninitialised variable (state).
I think I've fixed it now, but since nothing in the tree seems to use it,
I'm not sure.


32010 27-Dec-1997 peter

#include "opt_user_ldt.h" so that the #ifdef USER_LDT checks can work, as
commented about at length in the PR audit trail.

PR: 2412


32005 26-Dec-1997 phk

Rename "i586_ctr" to "tsc" (both upper and lower case instances).
Fix a couple of printfs too.

Warning: This changes the names of a couple of kernel options!


31934 22-Dec-1997 dyson

Correct my previous fix for the UPAGES problem.


31930 22-Dec-1997 dyson

Hopefully fix the problem with the TLB not being updated correctly.
Problem tracked down by bde@freebsd.org, but this is an attempted
efficient fix.


31893 20-Dec-1997 se

Make the class code checks in function pci_cfgcheck less strict.
It failed to recognize the PCI bus in a system that had only an
old chip-set (class code 000000) and a Cyclom multiport serial
card on PCI bus 0, but no VGA card or disk or network controller.

PR: i386/5300
Submitted by: Nickolay N. Dudorov <nnd@itfs.nsk.su>


31723 15-Dec-1997 tegge

Add support for low resolution SMP kernel profiling.

- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.

- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.

- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.


31720 15-Dec-1997 tegge

Don't forward hardclock or statclock to stopped cpus. Disable forwarding
when a panic has occurred.


31709 14-Dec-1997 dyson

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


31689 12-Dec-1997 tegge

Add needed #include.

Problem found by: Bruce Evans <bde@zeta.org.au>


31639 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smptests.h)

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31638 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smptests.h)

apic_vector.s also contains a small change I (smp) made to eliminate
the double level INT problem. It seems stable, but I haven't the tools
in place to prove it fixes the problem.

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


31544 04-Dec-1997 jmg

document and make the NO_F00F_HACK a proper option...

also, sort some option includes while I'm here..

Forgotten by: sef


31535 04-Dec-1997 jkh

After consultation with David, change
#ifndef NO_F00F_HACK
to
#if defined(I586_CPU) && !defined(NO_F00F_HACK)


31515 03-Dec-1997 sef

Make has_f00f_bug extern, and get rid of some unused code in the f00f
code.

Submitted by: Mikael Karpberg & Cy Schubert


31507 03-Dec-1997 sef

Work around for the Intel Pentium F00F bug; this is Intel's recommended
workaround. Note that this currently eats up two pages extra in the system;
this could be alleviated by aligning idt correctly, and then only dealing with
that (as opposed to the current method of allocated two pages and copying the
IDT table to that, and then setting that to be the IDT table).


31457 30-Nov-1997 jmg

fix a few style nits...

make isa_dmacascade, isa_dmastart, isa_dmadone, and find_isadev MUCH
easier to be found by starting them at the beginning of the line...

remove braces inside of ifdef RESOURCE_CHECK... found by % in vi...


31424 26-Nov-1997 joerg

Removed an unused line of code, that caused a ``maybe used uninitialized''
warning.

Found by: Simon Shapiro


31397 24-Nov-1997 bde

Fixed multiple definitions of boothowto.


31395 24-Nov-1997 bde

Added a sysctl (machdep.cputime_clock) to select the clock used by
"high resolution" profiling. The available clocks are:
- the i8254 clock
- on non-SMP i586's and i686's: the TSC
- on systems with I586_PMC_GUPROF configured, and PERFMON configured
and available: all the performance counters.
This is unfinished (there are problems with locking out the PERFMON
device driver, and with losing calibration after switching the clock),
but better than static configuration or writing to kmem.

Changed ifdefs to avoid generating code for non-working option
combinations.


31389 24-Nov-1997 bde

Fixed some #include messes.

Hid the check of the user %cs in syscall() under `#ifdef DIAGNOSTIC'.


31338 21-Nov-1997 jlemon

Correct CPU_CYRIX_NO_LOCK fix.
PR: 5121
Pointed out by: Matthew Hunt


31337 21-Nov-1997 bde

Fixed setting of `safepri'. It should be SWI_AST_MASK most of the
time, but was left at 0. This caused the "can't happen" case in
splz_swi to happen for panics when tsleep() calls splx(safepri)
and there is a SWI_AST pending. This was harmless because the
error handling happens to be right. Debugging this was tricky
because debugger traps force SWI_AST_MASK on in `cpl'.


31336 21-Nov-1997 bde

Moved splhigh()/spl0() calls from isa_configure() to configure() so that
there is a natural place to initialize `safepri' in a future commit.
Spinoffs:
- spl0() gets called in the unlikely event that isa is not configured.
- configure() has better control over enabling interrupts.
- it is now less unclear that interrupts aren't actually enabled early.
Rev.1.48 of autoconf.c seems to have done the opposite of what was
intended - moving the isa_configure() call delayed the spl0() side
effect.
Added some comments about the bogons. Removed the splhigh() call since
it is a no-op.


31328 21-Nov-1997 peter

Previous commit refers to SWAP_PART, which is only defined if the include
file that it's in is #included...


31322 20-Nov-1997 bde

Removed a duplicate (sloppy common-style) definition.

Fixed some style bugs.


31321 20-Nov-1997 bde

Moved some extern declarations to header files (unused ones to /dev/null).


31319 20-Nov-1997 bde

Avoid passing some more `retval's.


31318 20-Nov-1997 bde

Fixed wrong limits for the kernel text in db_numargs(). The
interval [VM_MIN_KERNEL_ADDRESS, etext] was used instead of
[btext, etext). Added a comment about this being completely
wrong for LKMs. This only affects interpreting the instructions
after the return to attempt to decide the number of args. The
attempt usually fails anyway.


31317 20-Nov-1997 bde

Fixed write enabling of the kernel text section. The overlap
checking was mostly wrong at the boundaries. For the lower limit,
VM_MIN_KERNEL_ADDRESS was used instead of btext and there was an
off-by-(`size' - 1) error. For the upper limit, &etext was used
instead of etext and there was an off-by-1 error. The bugs were
harmless because `size' is not too large and some memory is mapped
just beyond the ends. We still depend on the former to avoid
having to handle the case where the memory range covers the whole
text section, and on the latter to prevent problems when we map
just beyond an end to allow writing an address range that overlaps
the end.

Fixed placement of a nearby comment.
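
The boundary conditions described in the first paragraph can be sketched as
a half-open interval test. This is an illustration only: overlaps_text() is
a hypothetical helper, btext and etext are passed in as plain integers
rather than taken from linker symbols, and the sketch assumes size > 0 and
no address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Does the range [addr, addr + size) overlap the text section
 * [btext, etext)?  Both intervals are half-open, so a range that
 * ends exactly at btext or starts exactly at etext does not overlap.
 */
static bool
overlaps_text(uintptr_t addr, uintptr_t size, uintptr_t btext,
    uintptr_t etext)
{
	return addr < etext && addr + size > btext;
}
```

Writing both intervals as half-open avoids the separate off-by-one and
off-by-(size - 1) adjustments at the two ends.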


31316 20-Nov-1997 bde

Don't allow setting the dump device to any partition except the
one traditionally reserved for swap devices. The restrictions
should now be the same as the ones for dumpsys(). The restriction
on the partition should be removed someday, and dumpsys() shouldn't
repeat all the checks.


31255 18-Nov-1997 bde

Removed an unused #include.

Ifdefed #includes that are not used in the SMP case.


31253 18-Nov-1997 bde

Removed unused #includes.

Added a used #include (don't depend on yet to be fixed namespace pollution).


31249 18-Nov-1997 bde

Don't #include <machine/smp.h> even in the SMP case. Fixed the one
place that depended on it. The "bazillion warnings" mentioned in the
log for rev.1.45 apparently aren't a problem any more. It is hard
to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks)
things in the SMP case.


31030 07-Nov-1997 tegge

Use UPAGES when setting up private pages for SMP (which includes idle stack).


31017 07-Nov-1997 phk

Rename some local variables to avoid shadowing other local variables.

Found by: -Wshadow


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warnings, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
to be recompiled.
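
The shape of this change can be sketched with a toy call. Everything here
is hypothetical and for illustration only: the trimmed-down struct proc,
the p_retval field name, struct add_args and both handlers are assumptions,
not the kernel's definitions.

```c
/* Assumed, trimmed-down process structure holding the return values. */
struct proc {
	int p_retval[2];
};

/* Hypothetical argument block for a toy "add" call. */
struct add_args {
	int a;
	int b;
};

/* Old style: the result travels through a third "retval" parameter. */
static int
old_add(struct proc *p, struct add_args *uap, int *retval)
{
	(void)p;		/* unused in the old style */
	retval[0] = uap->a + uap->b;
	return 0;		/* error code; 0 means success */
}

/* New style: two parameters, the result is stored in the proc itself. */
static int
new_add(struct proc *p, struct add_args *uap)
{
	p->p_retval[0] = uap->a + uap->b;
	return 0;
}
```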


30976 06-Nov-1997 kato

Identify MediaGX CPU correctly. Old MediaGX CPU and GXm CPU are
distinguished. CPU-classes of MediaGX CPU and GXm CPU are 486-class
and 586-class, respectively.

PR: 4936


30964 05-Nov-1997 kato

Fix rare 6x86 CPU whose DIR0 = 0x20 - 0x28 case.


30918 04-Nov-1997 kato

Use same address for USERCONFIG_BOOT on PC-98 as IBM-PC.

Submitted by: H. Nokubi <h-nokubi@nmit.tmg.nec.co.jp>
Forgotten by: kato


30875 31-Oct-1997 jseger

Change comments about ijppp to iijppp.

PR: conf/4905
Submitted by: takas-su@is.aist-nara.ac.jp


30813 28-Oct-1997 bde

Removed unused #includes.


30805 28-Oct-1997 bde

Don't include <machine/cputypes.h> or declare cputype/class interfaces
in <machine/cpu.h>. Moved the declarations to <machine/cputypes.h>.
Fixed style bugs in the moved code. Fixed everything that depended on
the nested include. Don't include <machine/cpu.h> (in the changed files)
unless something in it is used directly.


30797 28-Oct-1997 joerg

Remove the stale `log' non-pseudodevice.

Found by: the new config(8) ;-)


30789 27-Oct-1997 bde

Moved declaration of etext from <machine/md_var.h> to <machine/cpu.h>
and fixed everything that depended on it being declared in the old
place. It is used in "machine-independent" code in subr_prof.c.

Moved declaration of btext from subr_prof.c to <machine/cpu.h>. It
is machine-dependent.


30788 27-Oct-1997 bde

Oops, <machine/psl.h> is used unconditionally in -current.


30786 27-Oct-1997 bde

Cleaned up #includes.

Ifdefed conditionally used includes.

Finished changing indentation of per-statement comments to 40.


30755 27-Oct-1997 jkh

Bump MAXDSIZ to 512MB so that soft limits have a chance to actually
regulate this.
Reviewed by: dyson


30754 27-Oct-1997 dyson

Check to see if the pv_limits are initialized before checking.


30732 26-Oct-1997 dyson

Change the initial amount of memory allocated for pv_entries to be proportional
to the amount of system memory. Also, clean-up some of the new pv_entry
mgmt code.


30720 26-Oct-1997 nate

- Do a bunch of gratuitous changes intended to make the code easier to
follow.
* Rename/reorder all of the pccard structures, change many of the member
names to be descriptive, and follow more closely other 'bus' drivers
naming schemes.
* Rename a bunch of parameter and local variable names to be more
consistent in the code.
* Renamed the PCCARD 'crd' device to be the 'card' device
* KNF and make the code consistent where it was obvious.
* ifdef'd out some unused code


30702 25-Oct-1997 dyson

Somehow an error crept in during the previous commit.


30701 25-Oct-1997 dyson

Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no significant
overhead is added. This helps large systems with lots of processes.


30700 24-Oct-1997 dyson

Decrease the initial allocation for the zone allocations.


30623 21-Oct-1997 msmith

Reference the DMI table inside the SMBIOS table correctly, not using a variable
that won't be initialised until a later test.
Submitted by: bde via -Wunused


30543 18-Oct-1997 joerg

Make all the documented (in pcvt(4)) options supported options. While
i was at it, do no longer insist on `PCVT_FREEBSD' being declared in
the config file, but default it to a reasonable value.

More cleanup to follow, but this part is safe for RELENG_2_2, too.


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30343 12-Oct-1997 peter

Try and fix some style problems


30320 12-Oct-1997 jkh

Allow "foo0: disabled, not probed" message to stay, but make it conditional
on bootverbose so that those who _really wanna know_ still can.
Compromise suggested by: joerg


30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde


30275 10-Oct-1997 peter

Compensate for pcb.h tweaks.

(Bruce pointed out the nesting)


30274 10-Oct-1997 peter

Don't #include unneeded includes here. pcb_ext.h picks up lots of other
stuff with it.


30273 10-Oct-1997 peter

GPROC0_SEL isn't used in any *.s files it seems..


30265 10-Oct-1997 peter

Convert the VM86 option from a global option to an option only depended
on by the files that use it. Changing the VM86 option now only causes
a recompile of a dozen files or so rather than the entire kernel.


30236 08-Oct-1997 nate

- Enable PS/2 mouse support by default. Given that almost all new hardware
has a PS/2 port, this is a good thing. Note, older 386/486 boxes may
lockup the keyboard controller with this enabled, but most of these kinds
of machines don't run -current, so the benefits outweigh the downsides.

Discussed with: Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>


30162 06-Oct-1997 kato

Added two Cyrix 6x86/6x86MX options.

- CPU_CYRIX_NO_LOCK enables weak locking. If this option is not set and
FAILSAFE is defined, NO_LOCK bit of CCR1 is cleared.
- CPU_WT_ALLOC enables write-through allocation.


30136 06-Oct-1997 dyson

It is possible that MB's with really broken bios's do not set up more of
the mtrr registers. This just fills in more of the registers.


30112 05-Oct-1997 dyson

Make sure that the memory type registers are the same for each CPU
in a P6 SMP system. Some MB bios'es don't set the registers up correctly
for the AP's. Additionally, set the memory between 0xa0000 and 0xbffff
as write combining.


30082 03-Oct-1997 kato

Call identifycyrix() when 6x86MX CPU is found. The identifycyrix()
function sets cyrix_did. Old code could not display correct variable.

Reviewed by: Hideyuki Suzuki <hideyuki@sat.t.u-tokyo.ac.jp>


29945 28-Sep-1997 gibbs

Fix a serious bug I introduced while adding in support for CAM interrupts.
It seems I didn't count my 0's properly when adding the new masks into
icu_vector.s pushing SWI_AST_MASK off the end of the array and screwing
up the indexing for SWI_CLOCK_MASK.

Fix the bug icu_vector.s and also reformat the code in both icu_vector.s and
apic_vector.s so that it will be much harder to make the same mistake in
the future.

Submitted by: Bruce Evans <bde@zeta.org.au>
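
One way to make this class of mistake harder in C is sketched below. It is
an illustration only and not what the commit did (the real tables are in
assembly): the slot names, their order and the mask values are all
hypothetical.

```c
#include <assert.h>

/* Hypothetical slot numbers; SLOT_COUNT must stay last. */
enum swi_slot {
	SLOT_TTY,
	SLOT_NET,
	SLOT_CAM,
	SLOT_CLOCK,
	SLOT_AST,
	SLOT_COUNT
};

/*
 * Designated initializers tie each value to its slot by name, so
 * adding a new entry cannot shift the entries that follow it.
 */
static const unsigned swi_mask[] = {
	[SLOT_TTY]   = 1u << SLOT_TTY,
	[SLOT_NET]   = 1u << SLOT_NET,
	[SLOT_CAM]   = 1u << SLOT_CAM,
	[SLOT_CLOCK] = 1u << SLOT_CLOCK,
	[SLOT_AST]   = 1u << SLOT_AST,
};

/* Fails to compile if the table length and the slot count disagree. */
static_assert(sizeof(swi_mask) / sizeof(swi_mask[0]) == SLOT_COUNT,
    "swi_mask must have exactly one entry per slot");
```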


29936 28-Sep-1997 mckay

Add a small hack to support the strange antics of the Unisys ELI 4003. This
machine generates an NMI for each floating point error, just like an old XT.
Since it is ISA only, reading the EISA status port yields 0xff, which would
give a spurious EISA panic. The simplest thing to do is to ignore the 0xff.


29851 25-Sep-1997 dg

Fix a bug where the speculative memory probe wouldn't occur on systems that
report slightly more than 64MB of total memory. This can happen due to the
total being the sum of both base and extended memory.
Submitted by: Alan Cox <alc@cs.rice.edu>


29789 24-Sep-1997 phk

Look for another couple of magic bios things..


29742 23-Sep-1997 bde

Moved setconf() call after root configuration again. This fixes a
null pointer panic in the "generic" version of setconf().

Removed the resulting near-duplicate printf.


29702 22-Sep-1997 peter

Turn on CR4_VME on the AP's the same as the BSP. Note that we do not
[yet] probe the AP's for their cpuid/capabilities etc, so this is a fudge
at best.

Problem noted by: Jonathan Lemon <jlemon@americantv.com>


29695 22-Sep-1997 gibbs

Oops. This file shouldn't have been committed.


29677 21-Sep-1997 gibbs

aha1542.c aic6360.c cy.c fd.c ft.c
if_ie.c if_wl.c if_zp.c isa.c isa_device.h
labpc.c mcd.c ncr5380.c scd.c seagate.c si.c
sio.c tw.c ultra14f.c wcd.c wd.c:

Update for changes in the callout interface.

apic_vector.s icu_vector.s ipl.s ipl_funcs.c:

Add CAM software/hardware interrupt support.


29675 21-Sep-1997 gibbs

autoconf.c:
Add cpu_rootconf and cpu_dumpconf so that configuring these
two devices can be better controlled by the MI configuration
code.

machdep.c:
MD initialization code for the new callout interface.

trap.c:
Add support for printing out whether cam interrupts are masked
during a panic.


29673 21-Sep-1997 gibbs

Move the rules for aicasm to the MI conf file.


29663 21-Sep-1997 peter

Implement the parts needed for VM86 under SMP.


29655 21-Sep-1997 dyson

Add support for more than 1 page of idle process stack on SMP systems.


29639 20-Sep-1997 phk

For AMD chips, pick up the long description from the chip if
possible. (This is not really a typographical improvement in the
case of the K6 it seems, but AMD apparently wants it to look
that way). Also if bootverbose, dump some more info about the
chip.


29613 19-Sep-1997 jmg

teach pnp to keep isa_device structs around, and teach isa.c how to scan
these structs for conflicts...

it is still possible for two PnP cards to collide, but it is up to the user
to make sure it doesn't happen...

other modifications to pnp.c to format output properly, and hide more
output behind bootverbose flag...

fix some bogons in pnp.h that would have made it difficult for inclusion
in external programs (for import of pnpinfo)


29399 14-Sep-1997 joerg

Add flags 0x10 to the sio0 line, so it is available as a potential
console. This features backwards-compatibility to the era when sio(4)
was always available for a console.


29368 14-Sep-1997 peter

Update select -> poll in drivers.


29330 13-Sep-1997 joerg

Revert the logic behind my last change, and use a function called
`is_physical_memory()' now for the decision whether to dump some
region of memory or not.

Suggested by: davidg


29280 10-Sep-1997 joerg

Do not ever try to coredump adapter memory regions.

PR: 4486
Submitted by: tegge@idi.ntnu.no (Tor Egge)

Implement a function is_adapter_memory() in order to determine what
should not be dumped at all. Currently, only populated with the ``ISA
memory hole''. Adapter regions of other busses should be added.


29243 09-Sep-1997 jmg

add necessary calls to autoconf for pnp,

also teach userconfig about the new pnp commands, for usage see pnp(4)


29219 08-Sep-1997 peter

Change an assemble-time divide into a shift. Under binutils-2.8 gas in elf
mode, the slash is a comment leader, while under non-elf it is a divide
symbol (what a concept! :-). Theoretically, #APP/#NO_APP can change this
but that doesn't seem to mesh too well with macros and line continuation.
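
The substitution relies on the fact that, for unsigned values, dividing by
a power of two equals shifting right by the exponent. A minimal C sketch of
that equivalence (the helper names are hypothetical, and the divisor 4 is
chosen only for illustration):

```c
#include <stdint.h>

/* Convert a byte count to 32-bit words using a divide. */
static uint32_t
words_by_divide(uint32_t bytes)
{
	return bytes / 4u;
}

/* Same conversion using a shift: 4 == 1 << 2. */
static uint32_t
words_by_shift(uint32_t bytes)
{
	return bytes >> 2;
}
```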


29213 07-Sep-1997 fsmp

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


29174 07-Sep-1997 dyson

Fix an intermittent problem during SMP code operation. Not all of the
idle page table directories for all of the processors were being updated
during kernel grow operations. The problem appears to be gone now.


29151 05-Sep-1997 peter

Argh, what was I thinking?? Don't (yet) halt the CPU in the idle loop
while waiting for an interrupt (rather than spinning on the runqueue status
bits), since the other cpu can put stuff in there and the sleeping cpu may
not get an interrupt for a while. When we have a reschedule IPI, this can
come back.

Pointed out by: fsmp


29128 05-Sep-1997 peter

Cosmetic adjustment for the trap/double fault/panic cpu id listing.
It now prints the apic id in hex rather than decimal.


29110 04-Sep-1997 dg

Cosmetic change to last commit: speculative_mtest -> speculative_mprobe.


29109 04-Sep-1997 dg

Changed the memory sizing code so that if the following conditions
are met:

1) The BIOS indicates that there is exactly 64MB of RAM, and
2) The memory size isn't specified with the MAXMEM option or
the npx0 msize hack,

...then do a speculative memory probe beyond the 64MB's until the
first bad page is encountered. This is an admitted hack, but should
nonetheless deal with detecting the correct amount of memory in nearly
all of the modern systems with >64MB of RAM.
Also made a change that will cause the list of detected memory chunks
to be printed if bootverbose is set.
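
The decision and the probe loop described above can be sketched as follows.
This is a simplified model, not the machdep.c code: should_probe() and
speculative_probe() are hypothetical helpers, and an array of booleans
stands in for actually testing physical pages.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Probe only when the BIOS reports exactly 64MB and the size was not
 * given explicitly (MAXMEM option or the npx0 msize hack).
 */
static bool
should_probe(unsigned bios_mb, bool maxmem_set, bool msize_set)
{
	return bios_mb == 64 && !maxmem_set && !msize_set;
}

/*
 * Walk pages from 'start' up to 'limit' and stop at the first bad one.
 * page_ok[i] models the result of testing page i.  Returns the index
 * of the first bad page, or 'limit' if every page tested good.
 */
static size_t
speculative_probe(const bool *page_ok, size_t start, size_t limit)
{
	size_t page = start;

	while (page < limit && page_ok[page])
		page++;
	return page;
}
```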


29041 02-Sep-1997 bde

Removed unused #includes.


29040 02-Sep-1997 fsmp

Removed the "globl" nature of the vec array. This was left over from the
time when icu.s was common between UP and SMP. It is not necessary for UP
and thus can be removed from icu_ipl.s.


29000 01-Sep-1997 fsmp

General cleanup of the sub-system locking macros.
Eliminated the RECURSIVE_MPINTRLOCK.
clock.c and microtime use clock_lock.
sio.c and cy.c use com_lock.

Suggestions by: Bruce Evans <bde@zeta.org.au>


28999 01-Sep-1997 fsmp

Cleanup.


28984 01-Sep-1997 bde

Move closer to supporting VM86 under SMP.

LINT now compiles but doesn't link. Other link-time breakage for LINT
is now visible (SMP is incompatible with SIMPLELOCK_DEBUG).
Submitted by: jlemon


28981 01-Sep-1997 bde

Removed unused #includes.


28976 31-Aug-1997 bde

Fixed options SHOW_BUSYBUFS and PANIC_REBOOT_WAIT_TIME which were broken
by incomplete cutting and pasting from machdep.c to kern_shutdown.c.

PR: 3953


28951 31-Aug-1997 fsmp

Debug version of simple_lock. This will store the CPU id of the
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.

Controlled by smptests.h: SL_DEBUG, ON by default.

Some minor cleanup.
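
The holder-id check can be sketched with C11 atomics. This is an
illustration under assumed names (dbg_lock, dbg_lock_try and the result
codes are hypothetical); the actual debug lock is implemented differently
and panics instead of returning a status.

```c
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Lock word holds the id of the owning CPU, or NO_OWNER when free. */
struct dbg_lock {
	atomic_int owner;
};

enum lock_result { LOCK_ACQUIRED, LOCK_BUSY, LOCK_SELF_DEADLOCK };

/*
 * Try to take the lock for 'cpu'.  On failure, compare the current
 * holder with the caller: if they match, the CPU is trying to take a
 * binary lock it already holds, which could never succeed.
 */
static enum lock_result
dbg_lock_try(struct dbg_lock *l, int cpu)
{
	int expected = NO_OWNER;

	if (atomic_compare_exchange_strong(&l->owner, &expected, cpu))
		return LOCK_ACQUIRED;
	/* On failure 'expected' has been updated to the current owner. */
	return expected == cpu ? LOCK_SELF_DEADLOCK : LOCK_BUSY;
}

static void
dbg_lock_release(struct dbg_lock *l)
{
	atomic_store(&l->owner, NO_OWNER);
}
```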


28942 30-Aug-1997 peter

Define some machine characteristics using the symbol naming conventions
in place in the other BSD's.


28921 30-Aug-1997 fsmp

Another round of lock pushdown.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.

Help from: Bruce Evans <bde@zeta.org.au>


28918 30-Aug-1997 kato

Move MACHINE_ARCH definition from <machine/param.h> to <machine/cpu.h>.

Submitted by: Bruce Evans <bde@zeta.org.au>


28910 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm, enabled.

Preliminary support for the INTR_SIMPLELOCK algorithm, disabled.
Note that this code is NOT ready.


28909 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm.
Improved interrupt handling, fewer silo overflows.

With help from: dave adkins <adkin003@gold.tc.umn.edu>


28872 28-Aug-1997 jlemon

Remove the vm86 support as an LKM, and link it directly into the kernel
if 'options "VM86"' is in the config file. The LKM was really for
development, and has probably outlived its usefulness.


28847 28-Aug-1997 msmith

Here is a patch to alleviate the current problem with the dma interface
and the sound driver which uses auto dma.

The dma interface functionality remains; however, it now checks
to see if a dma is operating in auto dma mode and, if so, bypasses
the busy flag check. I have modified the sound driver 3.5 to
adjust for this new behavior and tested it under FreeBSD 3.0-current.

This patch also includes the new function isa_dmastop.

Submitted by: Amancio Hasty <hasty@rah.star-gate.com>


28809 26-Aug-1997 peter

Correct some things I forgot about until it was too late with smp_active.
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.


28808 26-Aug-1997 peter

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


28755 25-Aug-1997 bde

Check for irq conflicts even if conflicts are allowed. Conflicting
irqs can't work (at best, the first one attached wins). It used to
be necessary to skip this check because of bogus irqs in the sound
drivers, but the sound drivers have been fixed, except possibly the
OSS ones.


28747 25-Aug-1997 bde

Finished (?) support for DISABLE_PSE option. 2-3MB of kernel vm was sometimes
wasted.

Fixed type mismatches for functions with vm_prot_t's as args. vm_prot_t
is u_char, so the prototypes should have used promoteof(u_char) to match
the old-style function definitions. They use just vm_prot_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change vm_prot_t to u_int, to optimize for time
instead of space.

Removed a stale comment.


28743 25-Aug-1997 bde

Removed a bogus comment.


28717 25-Aug-1997 peter

s/.align/.p2align/ so that we get the same results when building elf
objects (the tools are a bit better)


28669 24-Aug-1997 fsmp

A clean fix for the spl "deadlock before smp_active" problem.

Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.


28641 24-Aug-1997 fsmp

The last of the encapsulation of cpl/spl/ipending things into a critical
region protected by the simplelock 'cpl_lock'.

Notes:

- this code is currently controlled on a section by section basis with
defines in machine/param.h. All sections are currently enabled.

- this code is not as clean as I would like, but that can wait till later.

- the "giant lock" still surrounds most instances of this "cpl region".
I still have to do the code that arbitrates setting cpl between the
top and bottom halves of the kernel.

- the possibility of deadlock exists, I am committing the code at this
point so as to exercise it and detect any such cases B4 the "giant lock"
is removed.


28551 21-Aug-1997 bde

#include <machine/limits.h> explicitly in the few places that it is required.


28496 21-Aug-1997 charnier

Revert my previous commit about using CS_SECURE macro.
Requested by: Bruce.


28487 21-Aug-1997 fsmp

Made PEND_INTS default.
Made NEW_STRATEGY default.
Removed misc. old cruft.

Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h

More cleanup in the direction of making splxx()/cpl MP-safe.


28442 20-Aug-1997 fsmp

Preparation for moving cpl into critical region access.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer checks ipending on way out of ISR.
sio made MP-safe (I hope).


28441 20-Aug-1997 fsmp

Preparation for moving cpl into critical region access.
Several new fine-grained locks.
Control of new FAST_INTR() methods.


28359 18-Aug-1997 charnier

Use CS_SECURE macro.
Reviewed by: John Dyson


28352 18-Aug-1997 fsmp

Removed volatile from arg to simple_lock & friends.


28231 15-Aug-1997 fsmp

The promised "better fix" for "Trap 9 When Boot SMP" problem.
We now tsleep() in kthread_init() between start_init()
and prepare_usermode() while waiting for ALL the idle_loop()
processes to come online.

Debugged & tested by: "Thomas D. Dean" <tomdean@ix.netcom.com>

Reviewed by: David Greenman <dg@root.com>


28138 13-Aug-1997 steve

Add parentheses because == has higher precedence than &.

PR: i386/4225
Submitted by: Frank MacLachlan <fpm@n2.net>
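
The precedence problem named in this entry can be illustrated with a small sketch. The flag STATUS_READY and both function names are hypothetical and are not taken from the changed file; the first function spells out, with explicit parentheses, how a C compiler groups the unparenthesized expression.

```c
/* Hypothetical status bit, chosen only for illustration. */
#define STATUS_READY 0x04

/*
 * How the compiler parses `status & STATUS_READY == STATUS_READY`:
 * the comparison binds first and yields 1, so only bit 0 is tested.
 */
static int
ready_as_parsed(int status)
{
    return status & (STATUS_READY == STATUS_READY);
}

/* The intended test: apply the mask first, then compare. */
static int
ready_as_intended(int status)
{
    return (status & STATUS_READY) == STATUS_READY;
}
```

With status 0x04 the intended form is true while the parsed form is false; with status 0x01 the results are reversed.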


28124 12-Aug-1997 dyson

Back out a part of the disk scheduling "improvements" :-(. Let me know
how the system works now!!!


28044 10-Aug-1997 fsmp

Oops, fix breakage to UP kernel.


28043 10-Aug-1997 fsmp

Added trap specific lock calls: get_fpu_lock, etc.
All resolve to the GIANT_LOCK at this time, it is purely a logical partitioning.


28041 10-Aug-1997 fsmp

Cheap fix for kern/4255.
If the problem is seen this fix suggests a compile-time work-around then panics.


28027 09-Aug-1997 fsmp

Some fixes towards making "default configs" work again.
Still not fixed, no idea why.

Debug help from: "Thomas D. Dean" <tomdean@ix.netcom.com>


28026 09-Aug-1997 fsmp

Minor conditionalization of XXX_MPLOCK on PEND_INTS.


28023 09-Aug-1997 fsmp

Added 'lock' instruction before 3 places that update ipending.
This may or may not fix the "high IO freezes SMP kernel" problem.


28013 09-Aug-1997 dyson

Modify the scheduling policy to take into account disk I/O waits
as chargeable CPU usage. This should mitigate the problem of processes
doing disk I/O hogging the CPU. Various users have reported the
problem, and test code shows that the problem should now be gone.


28009 09-Aug-1997 dyson

A couple of missing doscmd header files. Messed up again. Now can
compile the kernel!!!
Submitted by: Jonathan Lemon <jlemon@americantv.com>


27993 09-Aug-1997 dyson

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>


27950 07-Aug-1997 dyson

Fix the DDB breakpoint code when using the 4MB page support.


27947 07-Aug-1997 dyson

More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.


27940 06-Aug-1997 peter

printf does not understand %hd in the kernel


27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


27922 05-Aug-1997 dyson

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


27906 05-Aug-1997 msmith

memcmp -> bcmp
Submitted by: smp, bde


27904 05-Aug-1997 dyson

Modify pmap to use our new memory allocator.


27903 05-Aug-1997 dyson

Slightly reorder some operations so that the main processor gets global
mappings early on.


27902 05-Aug-1997 dyson

Remove the PMAP_PVLIST conditionals in pmap.*, and another unneeded define.


27899 05-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


27893 04-Aug-1997 fsmp

Eliminate frequent silo overflows by restoring the TEST_LOPRIO code.
This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsens latency for FAST_INTR() routines,
which can't be marked pending.

Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>


27873 04-Aug-1997 msmith

Nuke the nonexistent pad bytes from the end of the DMI header structure.


27872 04-Aug-1997 msmith

Correctly checksum the DMI signature structure. Format the BSD revision
number therein.

Report from: dave adkins <adkin003@gold.tc.umn.edu>


27823 01-Aug-1997 msmith

Support functions for working with x86 PC-architecture BIOS.
Initially functionality is confined to 32-bit BIOS functions, however
it is envisioned that BIOS support may be enlisted for other
activities in the future.


27822 01-Aug-1997 msmith

Support for PC BIOS functions.


27808 31-Jul-1997 fsmp

Fixed imen declaration.

Submitted by: Bruce Evans <bde@zeta.org.au>


27780 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
Created mplock functions that save/restore NO registers.
Minor cleanup.


27779 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
removed PEND_INTS 1st try
direct call to MPtrylock


27778 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.


27749 29-Jul-1997 msmith

Return to using disable/enable_intr() for guarding DMA register access.
Mask the read value from the count register in order to return zero correctly
after TC, as per intel datasheet : "If it is not autoinitialised, this
register will have a count of FFFFH after TC"


27738 28-Jul-1997 msmith

Pedant attack! Use variable names consistent with discourse in
comments. Remove redundant extra addition that was unnecessary, and
unneeded mask (assuming inb works correctly).

Submitted by: Stephen McKay <syssgm@dtir.qld.gov.au>


27737 28-Jul-1997 msmith

Use disable_intr() / read/write_eflags() to ensure that interrupt
handlers don't skew the results of isa_dmastatus. The function can be
safely called with interrupts disabled.

Submitted by: Stephen McKay <syssgm@dtir.qld.gov.au>


27728 28-Jul-1997 fsmp

Modified the PEND_INTS algorithm to fix the ISA INT loss problem.

Noticed by: dave adkins <adkin003@gold.tc.umn.edu> and others.


27697 26-Jul-1997 fsmp

mpapic.c & mp_machdep:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.


27696 26-Jul-1997 fsmp

clock.c:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

apic_vector.s:
- new algorithm where a CPU uses try_mplock instead of get_mplock:
if successful continue as before.
if fail set ipending bit, mask INT (to avoid recursion), cleanup & iret.

This allows the CPU to return to successful work, while the ISR will be run
by the CPU holding the lock as part of the doreti dance.
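
The control flow of the try_mplock algorithm above can be modelled roughly as follows. This is a sketch in C under simplifying assumptions, not the assembly in apic_vector.s: struct intr_model and the model_* functions are hypothetical, the lock is treated as non-recursive, and "running the ISR" is reduced to setting a bit.

```c
#include <stdint.h>

#define NO_OWNER (-1)

/* Hypothetical model state; the real code keeps this in kernel globals. */
struct intr_model {
    int      lock_owner;   /* CPU holding the giant lock, or NO_OWNER */
    uint32_t ipending;     /* one bit per deferred interrupt */
    uint32_t masked;       /* one bit per masked interrupt source */
    uint32_t handled;      /* one bit per ISR that has run */
};

/* Non-blocking lock attempt: 1 on success, 0 if another CPU holds it. */
static int
model_try_mplock(struct intr_model *m, int cpu)
{
    if (m->lock_owner != NO_OWNER)
        return 0;
    m->lock_owner = cpu;
    return 1;
}

/* Interrupt entry on `cpu` for source `irq`; returns 1 if the ISR ran now. */
static int
model_interrupt(struct intr_model *m, int cpu, int irq)
{
    uint32_t bit = UINT32_C(1) << irq;

    if (model_try_mplock(m, cpu)) {
        m->handled |= bit;          /* success: continue as before */
        m->lock_owner = NO_OWNER;
        return 1;
    }
    m->ipending |= bit;             /* fail: mark the interrupt pending */
    m->masked |= bit;               /* mask the source to avoid recursion */
    return 0;                       /* then clean up and iret */
}

/* The lock holder's doreti step: run pending ISRs and unmask them. */
static void
model_doreti(struct intr_model *m)
{
    m->handled |= m->ipending;
    m->masked &= ~m->ipending;
    m->ipending = 0;
}
```

The point of the design is that the interrupted CPU never spins on the lock; the deferred work is picked up by whichever CPU already holds it.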


27663 24-Jul-1997 fsmp

param.h:
Macros to convert the Lite2 lock manager primitives to the names used
in the kernel proper. This allows us to hide them from the lock
manager till they can be turned on.
smp.h:
declarations for the new simplelock functions.


27654 24-Jul-1997 kato

Treat 6x86MX CPU as 686-class CPU instead of 586-class CPU.


27639 24-Jul-1997 msmith

Add isa_dmastatus() for reading the current ISA DMA counter for a
given channel.

Submitted by: luigi@labinfo.iet.unipi.it (Luigi Rizzo)


27638 24-Jul-1997 fsmp

Removed the defunct GET_MPLOCK/REL_MPLOCK macros.
These are no-ops for UP, and should have been removed when vector.s
was split into UP and SMP subsets.


27634 23-Jul-1997 fsmp

New simple_lock code in asm:
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()

Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock

Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.

It seems to work!!!


27633 23-Jul-1997 fsmp

Forced 32bit alignment of struct simple_lock in param.h.

Added declarations of new simple_lock data and functions to smp.h.


27619 23-Jul-1997 fsmp

Coded simple_lock and friends in asm.


27616 22-Jul-1997 fsmp

Last commit didn't take, operator error???


27615 22-Jul-1997 fsmp

Hid the existence of imen via a dump routine.


27607 22-Jul-1997 jkh

Well, consensus seems very split on this so I talked it over with DG
and he says he's happy to see forward movement in aligning our defaults
with a 16 bit world, the 8 bit folk already being veterans by this
point who know how to use userconfig.

In any case, perhaps Warner will soon come to save us all with his Dynamic
Probing(tm) feature and this will all become totally moot in any case,
so it's probably not worth arguing about either way.


27591 21-Jul-1997 fsmp

Enabled the FPU emulate bit define: CR0_EM

Reviewed by: Bruce Evans <bde@zeta.org.au>


27568 21-Jul-1997 fsmp

Disabled 2 static inlines:
- INTRGET()
- INTRSET()

These were only used in if_ze.c (already removed) and npx.c. The code
in npx.c has also been cleaned of all APIC code.


27567 21-Jul-1997 fsmp

Made the SMP case ignore the possibility of an INT13 interface.
This eliminates all the APIC code, and thus several routines that
would otherwise need to be made MP-safe.

Reviewed by: Bruce Evans <bde@zeta.org.au>


27566 21-Jul-1997 dyson

Fix a crash that has manifested itself while running X after the 4MB
page upgrades.


27563 20-Jul-1997 fsmp

Developed a new strategy for handling the 8254/8259/APIC issue.


27561 20-Jul-1997 fsmp

Minor cleanup.
Pass string arg to apic_dump.
Moved bootverbose printing of SMP enabled INTs from clock.c to autoconf.c


27560 20-Jul-1997 fsmp

Minor cleanup.


27559 20-Jul-1997 fsmp

Pass string arg to apic_dump.


27555 20-Jul-1997 bde

Removed unused #includes.


27542 20-Jul-1997 bde

Removed unused #includes and a stale forward declaration.


27535 20-Jul-1997 bde

Removed unused #includes.


27523 19-Jul-1997 fsmp

Added code to support #define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It deprecates the usage of "options SMP_TIMER_NC" in the config file.


27522 19-Jul-1997 fsmp

Added code to support #define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It deprecates the usage of "options SMP_TIMER_NC" in the config file.


27520 19-Jul-1997 fsmp

SMP or APIC_IO:
- Increased NIDT to 256.
- Moved IPI vectors up above the linux compat vector.
- Removed runtime setup of RTC vector.


27519 19-Jul-1997 fsmp

Increased NIDT to 256 for case of SMP or APIC_IO.


27517 18-Jul-1997 fsmp

Split TEST_CPUSTOP code into CPUSTOP_ON_DDBBREAK and mainline code.


27490 18-Jul-1997 fsmp

Made the printing of the APIC INTs depend on bootverbose.


27489 18-Jul-1997 fsmp

printf cleanup.


27484 17-Jul-1997 dyson

Hopefully fix a few problems that could cause hangs in SMP mode.
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.

We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...


27464 17-Jul-1997 dyson

Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.


27462 17-Jul-1997 peter

Remove the disable for the P5 cpu class bcopy using the FPU on SMP kernels,
it is understood to work now (and has been for quite a while apparently).


27424 15-Jul-1997 kato

Oops, added popfl after trynexgen label.

PR: 4091
Submitted by: Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>


27410 15-Jul-1997 fsmp

Removed a stale "FIXME:".


27409 15-Jul-1997 jkh

Add SYSVSHM by default. Nobody seems to have objected too strongly
to this when raised, and most were in favor of at least this option
(some also asked for semaphores and messages, but I'll leave that argument
for another time :).


27408 15-Jul-1997 fsmp

Cleanup.


27407 15-Jul-1997 fsmp

Tighten up asm code for TEST_PRIO and other misc. things.
Use some new defines in place of "magic numbers".


27406 15-Jul-1997 fsmp

Tighten up asm code for EOI access.


27405 15-Jul-1997 fsmp

New defines to eliminate "magic numbers" in various places.


27353 13-Jul-1997 fsmp

new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default

new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default


27352 13-Jul-1997 fsmp

Cleanup old stop_cpus/restart_cpus() cruft.
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu


27351 13-Jul-1997 fsmp

Many new test defines, including:
- TEST_CPUSTOP adds stop_cpus()/restart_cpus(), OFF by default
- TEST_ALTTIMER new method for attaching 8259 PIC to APIC
this method avoids 'ExtInt' programming, ON by default
- TIMER_ALL sends 8259/8254 timer INTs to all CPUs, ON by default
- ASMPOSTCODExxx code to display bytes to POST hardware, OFF by default


27296 09-Jul-1997 ache

Back out my changes with 'conflicts' keyword for IRQs,
sounddriver fixed now.


27289 08-Jul-1997 fsmp

General cleanup of APIC code.
stop_cpus()/restart_cpus() STILL not working!


27288 08-Jul-1997 fsmp

Minor cleanup of APIC code.


27285 08-Jul-1997 fsmp

General cleanup of APIC code.
stop_cpus/restart_cpus STILL not working!


27255 07-Jul-1997 fsmp

stop_cpus(), currently BROKEN! (turned off in smptests.h by default).
restart_cpus(), currently BROKEN! (turned off in smptests.h by default).


27252 06-Jul-1997 fsmp

Additional debugging functions and macros.
"spurious INTerrupt" support.


27251 06-Jul-1997 fsmp

First cut at code for handling "spurious INTerrupts".
First cut at code for handling CPU stop/restart.

Notes:
not working properly yet.


27250 06-Jul-1997 fsmp

#ifdef out debug for now...


27249 06-Jul-1997 fsmp

Added a hook for a "spurious INTerrupt handler".


27133 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.

Don't frob intr_nesting_level in idle() or cpu_switch(). Interrupts
are mostly disabled then, so the frobbing had little effect.


27131 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.


27007 27-Jun-1997 fsmp

apic_vector.s:
- added Xcpustop IPI code to support stop_cpus()/restart_cpus().
it is off by default, enable via smptests.h:TEST_CPUSTOP

intr_machdep.h:
- moved +ICULEN to lower level.
- added entry for Xcpustop.


27005 27-Jun-1997 fsmp

Added POST code output to various points of the startup code.

General cleanup.

New functions to stop/start CPUs via IPIs:

- int stop_cpus( u_int map );
- int restart_cpus( u_int map );

Turned off by default, enabled via smptests.h:TEST_CPUSTOP.
Current version has a BUG, perhaps a deadlock?


27004 27-Jun-1997 fsmp

Experimental calls to stop_cpus()/restart_cpus() within breakpoint calls.
Turned off by default in smptests.h.


27003 27-Jun-1997 fsmp

Added other_cpus to CPU private page.

This variable is a bitmap showing all CPUs present EXCEPT the CPU
owning the variable. In other words, it is equal to the global bitmap
'all_cpus' minus its own bit.
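
As a small illustration of the relationship described above, the per-CPU value can be derived from the global bitmap by clearing one bit. The helper name other_cpus_for and the 32-bit map width are assumptions made for this sketch; the commit does not show how the kernel computes the value.

```c
#include <stdint.h>

/* Clear this CPU's own bit from the bitmap of all CPUs present. */
static uint32_t
other_cpus_for(uint32_t all_cpus, unsigned cpu_id)
{
    return all_cpus & ~(UINT32_C(1) << cpu_id);
}
```

For a four-CPU system with all_cpus equal to 0x0f, CPU 2 would see other_cpus equal to 0x0b.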


27002 27-Jun-1997 fsmp

Preliminaries for stop_cpus()/restart_cpus().
Both are turned off by default.

Added macro for displaying POST codes from kernel.


27001 27-Jun-1997 fsmp

Program lint1 to handle NMIs.

Till now NMIs would be ignored. Now an NMI is caught by the BSP.
APs still ignore NMI, am working on code to allow a CPU to stop other CPUs
via an IPI.


27000 27-Jun-1997 fsmp

Added fields to the LVT1/2 group.


26994 27-Jun-1997 fsmp

Removed '#include <machine/smptests.h>' line, no longer needed.


26985 27-Jun-1997 kato

Added CPU_DIRECT_MAPPED_CACHE option which sets L1 cache in direct
mapped mode on Cyrix 486DLC box.


26954 26-Jun-1997 tegge

Back out a bad commit.


26950 25-Jun-1997 fsmp

Merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()
- get_pci_apic_irq() -> pci_apic_pin()


26949 25-Jun-1997 fsmp

Modified to use merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()


26948 25-Jun-1997 fsmp

Modified to declare merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()
- get_pci_apic_irq() -> pci_apic_pin()


26945 25-Jun-1997 tegge

Allow the kernel configuration file to override the amount of memory
available to the kernel (VM_KMEM_SIZE). The default (32 MB) is too low
when having 512 MB or more physical memory in a server environment. This is
relevant on systems where "panic: kmem_malloc: kmem_map too small" is a
problem.


26944 25-Jun-1997 tegge

Allow kernel configuration file to override PMAP_SHPGPERPROC. The default
value (200) is too low in some environments, causing a fatal
"panic: get_pv_entry: cannot get a pv_entry_t". The same panic might
still occur due to temporary shortage of free physical memory
(cf. PR i386/2431).


26943 25-Jun-1997 tegge

Block some interrupts during the call to pmap_zero_page in
vm_page_zero_idle. This fixes some occurrences of the problem
reported in PR kern/3216: "panic: pmap_zero_page: CMAP busy"


26896 24-Jun-1997 tegge

Ensure that the boot CPU honours write protection in kernel mode.
This fixes one of the problems noted in PR kern/3688.


26888 24-Jun-1997 kato

Recognize AMD K5 PR166 and PR200 CPUs.


26886 24-Jun-1997 fsmp

Fix calculation of initial mplock value.
We now use LOGICAL, not PHYSICAL, IDs to calculate the mplock.


26882 24-Jun-1997 fsmp

Fixed breakage for "default" configurations in mptable_pass1().


26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


26811 22-Jun-1997 peter

Kill some stale leftovers from the earlier attempts at SMP per-cpu pages


26771 21-Jun-1997 bde

Fixed va_arg() to work for small args (as in stdarg.h).


26659 15-Jun-1997 wollman

Fix another power down braino.


26657 15-Jun-1997 wollman

When APM is configured, turn off the power when halting for good.


26513 09-Jun-1997 ache

While deciding to install irq with unneeded "conflicts" keyword,
additionally check that intr vector is non-NULL


26512 08-Jun-1997 ache

Add safety check in case "conflicts" keyword specified more times than
needed


26511 08-Jun-1997 ache

Make "conflicts" keyword work again


26494 07-Jun-1997 bde

Preserve %fs and %gs across context switches. This has a relatively low
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).


26490 07-Jun-1997 bde

Updated comments.


26475 06-Jun-1997 jkh

YAMF22 - XSERVER comment changes.


26447 04-Jun-1997 pst

Document a non-standard gdbremote protocol extension (kludge, really)
that I snuck in to our GDB last year. This allows you to debug headless
machines by sharing the console port between the debugger and the system
console. It's not 100% reliable, but it works well. It's optional
and disabled by default.
Submitted by: Juniper Networks


26388 02-Jun-1997 peter

Fill in some gaps in the cpuid features list..
bit 10 is the old bit for MTRR (presumably this changed, an older P5 I
have has got it, the newer cpus have the new MTRR bit set)
bit 11 is SEP (fast syscalls), bit 23 is MMX
Fill in the other reserved ones with a stub so that we can see them if
they turn up.

Obtained from: Intel AP-485 rev.06
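
Testing the feature bits named in this entry might look like the sketch below. The bit positions for SEP and MMX come from the entry itself; the macro names and the cpu_has helper are assumptions for illustration and may not match the identifiers used in the kernel sources.

```c
#include <stdint.h>

/* Bit positions as given in the entry above; the names are hypothetical. */
#define CPUID_SEP (UINT32_C(1) << 11)   /* fast syscalls */
#define CPUID_MMX (UINT32_C(1) << 23)   /* MMX */

/* Nonzero if `flag` is set in the cpuid feature word. */
static int
cpu_has(uint32_t features, uint32_t flag)
{
    return (features & flag) != 0;
}
```

A caller would pass the feature word returned by the cpuid instruction, for example cpu_has(features, CPUID_MMX).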


26383 02-Jun-1997 kato

Added PC-98 code.


26379 02-Jun-1997 dfr

Change isa_device.h to intr_machdep.h


26373 02-Jun-1997 dfr

Move interrupt handling code from isa.c to a new file. This should make
isa.c (slightly) more portable and will make my life developing the really
portable version much easier.

Reviewed by: peter, fsmp


26309 31-May-1997 peter

Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add
<machine/ipl.h> to those files that were depending on getting SWI_*
implicitly via <machine/cpufunc.h>


26305 31-May-1997 peter

remove #include of <machine/spl.h> - they are externed now

Reviewed by: bde


26302 31-May-1997 peter

The SWI_NET_MASK and SWI_TTY_MASK handlers are now back adjacent to the
top of the hardware interrupt handlers. Apparently this is slightly
faster with the bit scanning instruction that looks these up - this set of
changes reverts the original change.

Reviewed by: bde


26298 31-May-1997 kato

- Use `6x86MX' instead of `M2'. Cyrix officially uses `6x86MX' for the
CPU code-named `M2'.

- Use the result of cpuid instruction instead of DIR to identify
6x86MX cpu. DIR0 and DIR1 are not documented in the data sheet, and
cpuid instruction is enabled at reset time.

- Add a function, init_6x86MX() to initialize 6x86MX cpu. It supports
CPU_SUSP_HLT and CPU_IORT options. It always sets NC1 (640K - 1M is
not cached.), and enables L1 cache in write-back mode.

- Fix typo in the comment in identblue().


26270 29-May-1997 fsmp

Code such as apic_base[APIC_ID] converted to lapic__id

Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


26269 29-May-1997 fsmp

apic.h now has structure definitions for both the local APIC and io APIC.

apic.h has defines like:
#define lapic__id lapic->id

Once private pages and "known virtual addr" mapping of the APICs is
ready all 'lapic__XXX' will be changed to 'lapic.XXX', and the defines
will be removed.

Changes to smp.h for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


26268 29-May-1997 fsmp

Added code to manage the local and io APICs as structures.


26267 29-May-1997 peter

remove no longer needed opt_smp.h includes


26266 29-May-1997 peter

minor style police (recent divergence from KNF code)


26265 29-May-1997 peter

remove opt_smp.h and fix the reason it was needed.


26264 29-May-1997 peter

No longer need opt_smp.h here


26263 29-May-1997 peter

remove opt_smp.h from this well-included file, minor style police


26262 29-May-1997 peter

remove opt_smp.h, minor style police


26252 28-May-1997 fsmp

Add declaration of mp_probe().

This is now called directly from machdep.c.


26203 27-May-1997 fsmp

Nuke the printing of the unredirect message unless bootverbose.


26174 26-May-1997 se

Yet another fix for configuration mechanism 1 register accesses:
Adjust the data port address by adding the two low order bits of
the register number. The address port takes only a word address
(i.e. ignores the two low order bits written to it).
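
The address and data port arithmetic for configuration mechanism 1 can be sketched as below. The port numbers 0x0cf8 and 0x0cfc and the field layout are the standard PCI ones; the macro and function names are hypothetical, and the sketch only computes values rather than performing port I/O.

```c
#include <stdint.h>

#define CONF1_ADDR_PORT 0x0cf8                 /* configuration address port */
#define CONF1_DATA_PORT 0x0cfc                 /* configuration data port */
#define CONF1_ENABLE    UINT32_C(0x80000000)   /* enable bit, bit 31 */

/* Value to write to the address port; the low two register bits are dropped. */
static uint32_t
conf1_address(unsigned bus, unsigned slot, unsigned func, unsigned reg)
{
    return CONF1_ENABLE
        | ((uint32_t)(bus & 0xff) << 16)
        | ((uint32_t)(slot & 0x1f) << 11)
        | ((uint32_t)(func & 0x07) << 8)
        | (uint32_t)(reg & 0xfc);
}

/* Data port for a byte or word access: base plus the low two register bits. */
static unsigned
conf1_data_port(unsigned reg)
{
    return CONF1_DATA_PORT + (reg & 3);
}
```

For register 0x06 of bus 0, slot 0, function 0, the address word is 0x80000004 and the data is read at port 0x0cfe.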


26173 26-May-1997 se

Fix previous fix: The enable bit is bit 31 (0x80000000) and not bit 15.


26172 26-May-1997 se

Set enable bit when writing the configuration address in configuration
mode 1. Omission of this bit makes all config register accesses fail
on recent chip sets ...

(The problem was reported and debug output provided by: Steve Passe)


26171 26-May-1997 fsmp

Fix breakage from my last commit where mp_start() was missing from UP builds.


26169 26-May-1997 fsmp

Changed inclusion of isa/icu.s to isa/ipl.s.
This is part of the breakup of UP/SMP specific INTerrupt code.


26168 26-May-1997 fsmp

Split vector.s into UP and SMP specific files:
- vector.s <- stub called by i386/exception.s
- icu_vector.s <- UP
- apic_vector.s <- SMP

Split icu.s into UP and SMP specific files:
- ipl.s <- stub called by i386/exception.s (formerly icu.s)
- icu_ipl.s <- UP
- apic_ipl.s <- SMP

This was done in preparation for massive changes to the SMP INTerrupt
mechanisms. More fine tuning, such as merging ipl.s into exception.s,
may be appropriate.


26159 26-May-1997 se

Completely replace the PCI bus driver code to make it better reflect
reality. There will be a new call interface, but for now the file
pci_compat.c (which is to be deleted, after all drivers are converted)
provides an emulation of the old PCI bus driver functions. The only
change that might be visible to drivers is, that the type pcici_t
(which had been meant to be just a handle, whose exact definition
should not be relied on), has been converted into a pcicfgregs* .

The Tekram AMD SCSI driver bogusly relied on the definition of pcici_t
and has been converted to just call the PCI drivers functions to access
configuration space register, instead of inventing its own ...

This code is by no means complete, but assumed to be fully operational,
and brings the official code base more in line with my development code.

A new generic device descriptor data type has to be agreed on. The PCI
code will then use that data type to provide new functionality:

1) userconfig support
2) "wired" PCI devices
3) conflicts checking against ISA/EISA
4) maps will depend on the command register enable bits
5) PCI to Anything bridges can be defined as devices,
and are probed like any "standard" PCI device.

The following features are currently missing, but will be added back,
soon:

1) unknown device probe message
2) suppression of "mirrored" devices caused by ancient, broken chip-sets

This code relies on generic shared interrupt support just committed to
kern_intr.c (plus the modifications of isa.c and isa_device.h).


26157 26-May-1997 se

Add support for shared interrupts to the kernel. This code is meant
to be (eventually) architecture independent. It provides an emulation
of the ISA interrupt registration function register_intr(), but that
function no longer manipulates the interrupt controller and
interrupt descriptor table; instead it calls the architecture dependent
function setup_icu() for that purpose.

After the ISA/EISA bus code has been modified to directly call the new
interrupt registration functions (intr_create() and intr_connect()),
the emulation of register_intr() should be dropped.

The C level interrupt handler function should take a (void*) argument,
and the function pointer type (inthand2_t) should be defined in some other
place than isa_device.h.
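
As a rough illustration of the idea (not the code from this commit), a shared
interrupt can be modelled as a small list of handlers, each taking a (void *)
argument, that are all called when the line fires. The names shared_irq,
shared_irq_add(), shared_irq_dispatch() and count_handler() are hypothetical
and are not the intr_create()/intr_connect() interface mentioned above.

```c
#include <stddef.h>

#define MAX_SHARED 4    /* arbitrary limit chosen for this sketch */

/* Handler type taking an opaque argument, as the commit recommends. */
typedef void (*handler_fn)(void *arg);

/* Hypothetical per-IRQ record: every registered handler shares the line. */
struct shared_irq {
    handler_fn fn[MAX_SHARED];
    void *arg[MAX_SHARED];
    size_t count;
};

/* Register one more handler on the line; returns -1 when the list is full. */
int shared_irq_add(struct shared_irq *irq, handler_fn fn, void *arg)
{
    if (irq->count >= MAX_SHARED)
        return -1;
    irq->fn[irq->count] = fn;
    irq->arg[irq->count] = arg;
    irq->count++;
    return 0;
}

/* Called when the interrupt fires: run every handler in registration order. */
void shared_irq_dispatch(const struct shared_irq *irq)
{
    for (size_t i = 0; i < irq->count; i++)
        irq->fn[i](irq->arg[i]);
}

/* Example handler: increments the int that its argument points to. */
void count_handler(void *arg)
{
    (*(int *)arg)++;
}
```

Registering count_handler() twice with two different counters and dispatching
once increments both counters, which is the behaviour two drivers sharing one
line would rely on.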

This commit is a pre-requisite for the removal of the PCI specific shared
interrupt code.

Reviewed by: dfr,bde


26155 26-May-1997 fsmp

Added a test called 'LATE_START'.

This is now the default, it delays most of the MP startup to the function
machdep.c:cpu_startup(). It should be possible to move the 2 functions
found there (mp_start() & mp_announce()) even further down the path once
we know exactly where that should be...

Help from: Peter Wemm <peter@spinner.dialix.com.au>


26129 25-May-1997 fsmp

Made the array vec[] a global.
This allows the APIC code to reorder the vectors at runtime.


26108 25-May-1997 fsmp

Broke up parse_mp_table() into 2 passes:
- The 1st (preparse_mp_table()) counts the number of cpus, busses, etc. and
records the LOCAL and IO APIC addresses.
- The 2nd pass (parse_mp_table()) does the actual parsing of info and recording
into the incore MP table.

This will allow us to defer the 2nd pass until malloc() & private pages
are available (but that's for another day!).
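
The two-pass split can be sketched roughly as below. This is only an
illustration under simplifying assumptions: the entry layout, the enum values
and the functions count_entries() and fill_cpu_ids() are hypothetical and do
not reflect the real MP table format or the kernel's parse_mp_table().

```c
#include <stddef.h>

/* Hypothetical entry kinds; the real MP table defines more. */
enum entry_type { ENTRY_CPU, ENTRY_BUS, ENTRY_IOAPIC };

struct entry {
    enum entry_type type;
    int id;
};

struct counts {
    size_t cpus;
    size_t buses;
    size_t ioapics;
};

/* Pass 1: only count entries, so no storage has to exist yet. */
struct counts count_entries(const struct entry *tab, size_t n)
{
    struct counts c = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        switch (tab[i].type) {
        case ENTRY_CPU:    c.cpus++;    break;
        case ENTRY_BUS:    c.buses++;   break;
        case ENTRY_IOAPIC: c.ioapics++; break;
        }
    }
    return c;
}

/* Pass 2: record CPU ids into storage sized from pass 1.
 * Returns the number of ids written. */
size_t fill_cpu_ids(const struct entry *tab, size_t n, int *out, size_t cap)
{
    size_t k = 0;

    for (size_t i = 0; i < n && k < cap; i++)
        if (tab[i].type == ENTRY_CPU)
            out[k++] = tab[i].id;
    return k;
}
```

Because the first pass needs no allocation, it can run very early, and the
second pass can wait until an allocator is available to size its tables.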


26102 24-May-1997 fsmp

Delay mp_start() till after the msgbuf is mapped. We really want to delay
it till even later but tss setup prevents that right now...


26101 24-May-1997 fsmp

Now that panic() is properly printing messages for early SMP panics all
the 'printf("..."); panic("\n")' sections are returned to 'panic("...")'.


26037 23-May-1997 charnier

typo (Cyirx -> Cyrix).


26019 22-May-1997 fsmp

Convert all:
panic( "xxxxx\n" );

to:
printf( "xxxxx\n" );
panic( "\n" );

For some as yet undetermined reason the argument to panic() is often NOT
printed, and the system sometimes hangs before reaching the panic printout.
So we hopefully at least print some useful info before the hang, as opposed to
leaving the user clueless as to what has happened.


25985 21-May-1997 jdp

This commit affects ELF kernels only.

Remove "setdefs.h" and arrange to generate it automatically at
ELF kernel build time.

"gensetdefs.c" is a utility which scans a set of ELF object files
and outputs a line ``DEFINE_SET(name, length);'' for each linker
set that it finds. When generating an ELF kernel, this is run just
before the final link to generate "setdefs.h".

Remove the init_sets() function from "setdef0.c", and its call from
"machdep.c". Since "gensetdefs.c" calculates the length of each
set, it is no longer necessary in an ELF kernel to count the set
elements at kernel initialization time. Also remove "set_of_sets"
which was used for this purpose.

Link "setdef0" and "setdef1" into the kernel only if building for
ELF. Since init_sets() is no longer used, there is no need to link
them into an a.out kernel.


25984 21-May-1997 jdp

Fill out the ELF header files to make them more or less complete.
Fix a macro name that was misspelled both in brandelf.c and
imgact_elf.h.


25982 21-May-1997 jdp

Make setbits() SMP-safe. Eliminate the SETBITS() macro, and replace
all uses of it with the equivalent calls to setbits().

This change incidentally eliminates a problem building ELF kernels
that was caused by SETBITS.

Reviewed by: fsmp, peter
Submitted by: bde


25925 19-May-1997 kato

Recognize AMD 486 CPUs.


25837 15-May-1997 tegge

Ignore the supplied nfs_diskless structure from the bootstrap loader
if we want to use NFS v3 to mount root and swap.


25723 11-May-1997 tegge

Bring in some kernel bootp support. This removes the need for netboot
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code come from NetBSD.


25711 11-May-1997 bde

Fixed initialization of ldt[]. Unused entries were garbage. A comment
was stale.

Fixed initialization of gdt[] for the BDE_DEBUGGER case. APM entries
clobbered debugger entries if the debugger was loaded (APM is incompatible
with BDE_DEBUGGER) and unused entries were garbage if the debugger wasn't
loaded.


25648 10-May-1997 bde

Cleaned up #includes. Lite2 cleaned up <sys/mount.h> so no kludges
are required for NFS now.

Ifdefed SMP #defines.


25559 07-May-1997 fsmp

fix bug in get_isa_apic_mask() where EISA bus was ignored.

Submitted by: Peter Wemm <peter@spinner.DIALix.COM>


25558 07-May-1997 peter

Don't allow access to illegal addresses in /dev/kmem to panic kernel
(eg: above 0xffc00000). Programs using /dev/kmem are implicitly racing
the kernel, and can get right up high in memory. I've been running
these for some time now, but with printfs. It's saved two panics at
least that I can remember.


25557 07-May-1997 peter

clean up forked child creation. This is simplified also by having
md_regs being struct trapframe *. Do a npxsave() if needed and copy the
pcb rather than use the increasingly defunct savectx(). Copy %edi and
%ebp explicitly.

Submitted by: bde

XXX npxproc could be declared in npx.h so the externs with smp fruit
are not needed.


25556 07-May-1997 peter

md_regs is struct trapframe * now, rather than int []
Remove TF_REGP() macro and use. The original reason (address space
problems due to having UPAGES in mapped into user space) is gone. It
looks cleaner without it.


25555 07-May-1997 peter

md_regs is now a struct trapframe *


25554 07-May-1997 peter

forgotten comment


25552 07-May-1997 peter

simplify IOPL gain/remove privs code. It's easier with md_regs
being a trapframe.


25550 07-May-1997 peter

remove now redundant (struct trapframe *) cast


25549 07-May-1997 peter

Convert md_regs from an int[] to a struct trapframe *. It simplifies
some code.


25548 07-May-1997 peter

remove #include "opt_smp.h"
remove declarations for the SMPcurproc[NCPU] etc arrays. There was no
need to mention NCPU there, and they've been moved to their normal home.


25547 07-May-1997 peter

remove #include "opt_smp.h" and <machine/smp.h>. Slightly elaborate on
a comment.


25545 07-May-1997 peter

remove #include opt_smp.h
declare SMPcurpcb[] next to #define and uniprocessor counterpart


25517 06-May-1997 fsmp

Force user to config SMP kernel with "options APIC_IO".

Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>


25499 05-May-1997 fsmp

Code to handle SMP/APIC_IO mapping of ISA INTs to APIC pins above IRQ15.

- doesn't break my system.
- NOT yet verified on the affected motherboard.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25498 05-May-1997 fsmp

Code to handle SMP/APIC_IO mapping of ISA INTs to APIC pins above IRQ15.

- doesn't break my system.
- NOT yet verified on the affected motherboard.

Stifle an annoying dma_start busy message for the sound cards.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25495 05-May-1997 kato

Use `MediaGX' instead of `Gx86'.


25494 05-May-1997 kato

Use `M2' instead of `6x86 with MMX'. Cyrix seems to use `M2' officially.


25485 05-May-1997 peter

correct the order of the variables
use #ifdef where possible instead of #if defined

Submitted by: the KNF police, ie: bde :-)


25472 05-May-1997 dyson

Make sure that *fork() always returns with %edx == 1 in the
child. This was sometimes not happening correctly during my
threads code work.


25460 04-May-1997 joerg

This mega-commit brings the following:

. It makes the cd9660 root f/s work again.
. It makes CD9660 a new-style option.
. It adds support to mount an ISO9660 multi-session CD-ROM as the root
filesystem (the last session actually, but that's what is expected
behaviour).

Sigh. The CDIOREADTOCENTRYS did a copyout() of its own, and thus has
been unusable for me for this work. Too bad it didn't simply stuff
the max 100 entries into the struct ioc_read_toc_entry, but relied on
a user supplied data buffer instead. :-( I now had to reinvent the
wheel, and created a CDIOREADTOCENTRY ioctl command that can be used
in a kernel context.

While doing this, i noticed the following bogosities in existing CD-ROM
drivers:

wcd: This driver is likely to be totally bogus when someone tries
two succeeding CDIOREADTOCENTRYS (or now CDIOREADTOCENTRY)
commands with requesting MSF format, since it apparently
operates on an internal table.

scd: This driver apparently returns just a single TOC entry only for
the CDIOREADTOCENTRYS command.

I have only been able to test the CDIOREADTOCENTRY command with the
cd(4) driver. I hereby request the respective maintainers of the
other CD-ROM drivers to verify my code for their driver. When it
comes to merging this CD-ROM multisession stuff into RELENG_2_2 i will
only consider drivers where i've got a confirmation that it actually
works.


25457 04-May-1997 peter

Don't remove i586_ctr_freq from scope, leave it defined as zero. This
simplifies some assumptions and stops some code compile problems.

This should fix the compile hiccup in PR#3491, but smp kernel profiling
isn't likely to be fixed by this.


25421 03-May-1997 fsmp

added declaration for get_isa_apic_mask().

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25419 03-May-1997 fsmp

new function to turn an APIC pin# into an INT mask.
added missing APIC_IO define.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25362 01-May-1997 fsmp

cleaned up FAST_IPI code.
- one-liners all become inline.
- multi-liners become functions.
- FAST_IPI defines go away.

re-worked APICIPI_BANDAID code.
- now referred to as DETECT_DEADLOCK.
- on by default.


25361 01-May-1997 fsmp

fixed spelling error.

Submitted by: Bruce Albrecht <bruce@zuhause.mn.org>


25320 30-Apr-1997 fsmp

changed expect_lock() to try_lock(), the real name used in mplock.s


25292 29-Apr-1997 fsmp

Enabled 'FIX_MP_TABLE_WORKS' code.
This code re-numbers PCI busses in the MP table to match PCI semantics
when the MP BIOS fails to do it properly.

Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>


25291 29-Apr-1997 peter

Use a common numbering of the tty and net software interrupt levels
between the SMP and non-SMP case. It simplifies the #ifdef's, since
NHWI changes (at least for the moment) when APIC's are involved.


25243 28-Apr-1997 fsmp

cleaned out an old FIXME.


25218 28-Apr-1997 fsmp

removed TEST_CPUHITS code.

replaced push/pop of %ds with use of 'ss' prefix in Xinvltlb.

Submitted by: Bruce Evans <bde@zeta.org.au>


25216 28-Apr-1997 fsmp

removed all the TEST_UPPERPRIO crud.


25215 28-Apr-1997 fsmp

remove all the SMP_INVLTLB defines, making the code default for APIC_IO.

Reviewed by: informal discussion with Peter Wemm <peter@spinner.DIALix.COM>


25205 27-Apr-1997 fsmp

informal discussion between Bruce Evans <bde@zeta.org.au>,
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>

removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.
cleanup of i386/isa/isa_device.h to eliminate SMP dependencies:
made the id_irq member of struct isa_device a u_int.
made the id_drq member of struct isa_device an int.
removed all other '#ifdefs' concerning SMP & APIC_IO.
removed SMP/APIC_IO dependencies from if_ze.c.


25204 27-Apr-1997 fsmp

informal discussion between Bruce Evans <bde@zeta.org.au>,
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>

removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.


25194 27-Apr-1997 peter

Whoops.. We forgot to turn off the 4MB Virtual==Physical mapping at address
zero from bootstrap in the non-SMP case.

Noticed by: bde


25178 26-Apr-1997 peter

Try and make these usermode safe, Steve beat me in finding these..


25175 26-Apr-1997 peter

Remove the curproc printing on trap/interrupt/etc. It's outlived its
usefulness, and there were problems with it anyway.

Found by: bde


25173 26-Apr-1997 peter

Back out bogus code that slipped past my read of the pre-merge diff
(Problems noted by Bruce)


25172 26-Apr-1997 peter

Fix some SMP merge bugs (from Bruce) -
#include out of order
pccard_configure() called twice
munged tab (existing problem made worse)


25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


25159 26-Apr-1997 kato

Add new cpu type, CPU_CY486DX, which shows Cyrix 486S/DX series CPUs,
and initialization routine for those CPUs.

Tested by: Bob Bishop <rb@gid.co.uk>


25111 23-Apr-1997 bde

Fixed longstanding profiling bug. The frame pointer wasn't set up
for syscalls, so one frame was lost in backtraces from syscalls.
This is handled better in the kernel by using a different mcount
entry point for profiling before the frame pointer is set up.

Expand RCSID().

Use .p2align instead of the ambiguous .align.

Added idempotency ifdef.

Removed unused macros ALTENTRY(), ALTASENTRY(), ASENTRY(), _MID_ENTRY.

Cleaned up formatting.

Reviewed by: jdp reviewed an old version
Obtained from: parts from NetBSD


25083 22-Apr-1997 jdp

Make the necessary changes so that an ELF kernel can be built. I
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.

The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.

The ELF bootloader works, but it is not ready to commit quite yet.


25038 20-Apr-1997 phk

Fix up the "hlt vector" change I made.
Reviewed by: bde, bde, bde


25015 19-Apr-1997 kato

Don't disable CPU cache in init_486dlc. If BIOS supports Cyrix 486,
BIOS enables CPU cache and other registers. If BIOS does not support
it, CPU cache is disabled at reset time.

This commit closes PR/3292.

PR: 3292


24980 16-Apr-1997 kato

Use reset port before clearing page table in cpu_reset if PC98 is
defined. Clearing page table could hang some new PC-98.


24965 15-Apr-1997 bde

Only do indirections in ENTRY() if _ARCH_INDIRECT is defined.


24933 14-Apr-1997 phk

Forget all about APM. Instead of "hlt" call through a vector which
APM can then fiddle with. Default for the vector is to "hlt; ret"


24929 14-Apr-1997 bde

Use the same IOPL check as in syscons.
Reviewed by: pst, joerg


24925 14-Apr-1997 bde

Fixed printing of registers in dblfault_handler(). The registers
were always in a tss; that tss just changed from the one in the
pcb to common_tss (who knows where it was when there was no curpcb?).
Not using the pcb also fixed the problem that there is no pcb in
idle(), so we now always get useful register values.


24908 14-Apr-1997 gibbs

GENERIC, LINT:
Add an ie entry that corresponds to the location the old ix entry used
to probe and kill the ix entry.

files.i386:
Remove entries for the ix driver.


24900 13-Apr-1997 bde

Don't forget to set `runtime' in fork_trampoline(). The time slice before
switching to a child for the first time was being counted twice. I think
this only affected unimportant statistics.

Simplified arg handling in fork_trampoline(). splz() doesn't actually
smash the registers of interest.


24852 13-Apr-1997 dyson

Decrease the amount of memory allocated for bouncing. This will
allow large systems to boot successfully with bounce buffers compiled
in. We are now limiting bounce space to 512K. The 8MB allocated for
a 512MB system is very bogus -- and that is now fixed.


24851 13-Apr-1997 dyson

The pmap code was too generous in the allocation of kva space for
the pv entries. This problem has become obvious due to the increase
in the size of the pv entries. We need to create a more intelligent
policy for pv entry management eventually.
Submitted by: David Greenman <dg@freebsd.org>


24848 13-Apr-1997 dyson

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads' address space by an exec, so we
now automatically divorce the address space before modifying it.


24743 09-Apr-1997 se

Mask out revision register in consistency test of class register.


24740 09-Apr-1997 se

Fix spelling of align and interrupt in comments.


24739 09-Apr-1997 se

Fix consistency test to not fail on pre PCI 2.0 motherboards


24702 07-Apr-1997 peter

Lower the spl() of the new process from splhigh() right away, since
nothing else will lower it until either much later, or never(?) for
kernel processes.

This basically re-fixes what Bruce fixed in rev 1.29 of kern_fork.c,
which was broken again now the child does not execute back up the fork()
calling tree.


24696 07-Apr-1997 peter

Use UPAGES_HOLE instead of UPAGES in case it's changed some time.

Rename the PT* index KSTK* #defines to UMAX*, since we don't have a kernel
stack there any more..

These are used to calculate VM_MAXUSER_ADDRESS and USRSTACK, and really
do not want to be changed with UPAGES since BSD/OS 2.x binary compatibility
depends on it.


24693 07-Apr-1997 peter

Clean up some dead wood. Kill the page table page for mapping the
proc0/idlePTD/bootstrap stack into place in user space. We save 4K.
Remove p0upa, it is now unneeded.


24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
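
To show why resetting a descriptor base takes "a lot of twiddling", here is a
small sketch that works on a raw 8-byte x86 segment descriptor held in a
uint64_t. The bit positions follow the hardware descriptor layout; the
function names desc_set_base() and desc_get_base() are hypothetical and this
is not the kernel's struct segment_descriptor code.

```c
#include <stdint.h>

/* Store a 32-bit base address into a raw 8-byte x86 segment descriptor.
 * The base is scattered over three fields: bits 16-31, 32-39 and 56-63. */
uint64_t desc_set_base(uint64_t desc, uint32_t base)
{
    /* Clear the three base fields, leaving limit and flags untouched. */
    desc &= ~((UINT64_C(0xffff) << 16) |
              (UINT64_C(0xff) << 32) |
              (UINT64_C(0xff) << 56));

    desc |= (uint64_t)(base & 0xffff) << 16;        /* base bits 15:0  */
    desc |= (uint64_t)((base >> 16) & 0xff) << 32;  /* base bits 23:16 */
    desc |= (uint64_t)((base >> 24) & 0xff) << 56;  /* base bits 31:24 */
    return desc;
}

/* Reassemble the 32-bit base from its three fields. */
uint32_t desc_get_base(uint64_t desc)
{
    return (uint32_t)((desc >> 16) & 0xffff)
         | (uint32_t)((desc >> 32) & 0xff) << 16
         | (uint32_t)((desc >> 56) & 0xff) << 24;
}
```

By comparison, updating tss_esp0 is a single 32-bit store, which is the
simplification this commit describes.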

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


24690 07-Apr-1997 peter

No longer use an i386tss as the basis of our pcb - it wasn't particularly
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.

Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.

This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.

Obtained from: bde


24676 06-Apr-1997 mckay

Prevent wedging of the stat clock because of missed interrupts.
This should cure the "alternate system clock has died!" problem.

Discussed with: bde, joerg


24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


24494 01-Apr-1997 bde

Removed a wrong comment of mine.

Removed unused #includes.


24437 31-Mar-1997 dg

Changed the way that the exec image header is read to be filesystem-
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.


24418 30-Mar-1997 joerg

Implement the `detach' command for remote GDB. It gets you back at DDB.


24372 29-Mar-1997 phk

Sanitize APM a bit. Convert various #ifdef to id_flags instead.
You may want to add "flags 0x31" to apm0 if you have a lousy
implementation. Read LINT.


24361 29-Mar-1997 bde

Don't keep cpu interrupts enabled during the lookup in vm_page_zero_idle().
Lookup isn't done every time the system goes idle now, but it can still
take > 1800 instructions in the worst case, so if cpu interrupts are kept
disabled then it might lose 20 characters of sio input at 115200 bps.

Fixed style in vm_page_zero_idle().


24345 28-Mar-1997 bde

Added a setjmp() and a longjmp() so that an unexpected trap inside
ddb isn't necessarily fatal. You can now do silly things like
`call vprint' and `show map' without losing control.
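
The general setjmp()/longjmp() recovery pattern looks roughly like the sketch
below. It is plain userland C, not ddb's actual trap path: fake_trap() and
run_command() are hypothetical stand-ins for the trap handler and the
debugger's command loop.

```c
#include <setjmp.h>

/* Saved context to return to when a command goes wrong. */
static jmp_buf recover_point;

/* Stand-in for a trap handler: abandon the current command. */
static void fake_trap(void)
{
    longjmp(recover_point, 1);
}

/* Run one command. Returns 0 if it completed, -1 if it "trapped". */
int run_command(int should_trap)
{
    if (setjmp(recover_point) != 0)
        return -1;          /* arrived here via longjmp() */

    if (should_trap)
        fake_trap();        /* does not return */
    return 0;
}
```

After a simulated trap the caller simply gets -1 back and can go on to run
the next command, which mirrors "without losing control" in the message above.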


24344 28-Mar-1997 bde

Backed out rev.1.5. if %cs is bad, %eip may be bad, but this is no longer
fatal.


24342 28-Mar-1997 joerg

Something long overdue: compile inb() and outb() into the kernel as
functions if DDB is available. The remaining occurrences are usually
only inlined and thus not available in DDB.

I'm sure Bruce will have 23 additions to these 30 lines of code, but
at least it's a starting point. ;-)


24334 28-Mar-1997 ache

Remove recently committed support for iobase == -2 ("port none");
it is really the probe routine's task (return -1 for no ports)


24283 25-Mar-1997 mpp

Change sigreturn() to return EFAULT if it is passed an
address outside of the process's address space.
Now it matches its man page :-). Closes PR# 2682.

Discussed with: bde
Submitted by: Jonathan Lemon <jlemon@americantv.com>


24237 25-Mar-1997 ache

Replace the more verbose "at <not configured>" with the less verbose "at ?".
We don't need much attention here, because this diagnostic is printed first
and then the card will be configured.


24236 25-Mar-1997 ache

Follow config intention for iobase:
print "at <not configured>" for iobase == -1 (autodetect does not happen)
and do not print anything for iobase == -2 (none).
The old code treated these two special config numbers as big port numbers.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24200 24-Mar-1997 kato

Fix typo.
Submitted by: Bruce Evans <bde@zeta.org.au>


24113 22-Mar-1997 kato

Oops, I forgot to `cvs add'. This file is a part of new CPU
identification and initialization routines.


24112 22-Mar-1997 kato

Improved CPU identification and initialization routines. This
supports all Cyrix CPUs, the IBM Blue Lightning CPU and the NexGen (now AMD)
Nx586 CPU, and initializes the special registers of Cyrix CPUs and the msr of
the IBM Blue Lightning CPU.

If revision of Cyrix 6x86 CPU < 2.7, CPU cache is enabled in
write-through mode. This can be disabled by kernel configuration
options.

Reviewed by: Bruce Evans <bde@freebsd.org> and
Jordan K. Hubbard <jkh@freebsd.org>


24099 22-Mar-1997 dyson

Decrease the latency/overhead in the prezero code when there is
an adequate number of prezeroed pages.


23860 13-Mar-1997 bde

Quoted CMD640. It's still missing from options.i386.

Removed stale comment saying that npx0 is mandatory.


23819 12-Mar-1997 se

Activate CMD640 workaround


23576 09-Mar-1997 bde

Moved userland assembler macros from <machine/asmacros.h> to
<machine/asm.h>.


23571 09-Mar-1997 bde

Cloned src/lib/libc/i386/DEFS.h to create <machine/asm.h> for the i386.
The former file was too hard to get at from other parts of the src tree
and will go away.


23415 05-Mar-1997 se

improve pcibus_check: Only assume PCI if at least one PCI to anything bridge
on bus 0.
This fixes problems with EISA-only systems mistakenly being assumed to support PCI.


23409 05-Mar-1997 bde

Made FPU stuff conditional on npx as well as I586_CPU.


23393 05-Mar-1997 bde

Only print clock calibration messages if the system was booted with -v.

Submitted by: partly by gpalmer


23386 05-Mar-1997 gpalmer

Back out the patch to break up the clock probe lines. Instead, follow
Bruce's suggestion of deleting "relative to mc146818A clock ",
thus shortening the line ...


23375 04-Mar-1997 gpalmer

Split the rather long and line-wrapping clock probe messages on boot.
(2.2?)

Submitted by: Mathew Dood <winter@jurai.net>


23230 01-Mar-1997 ache

Add missing #include <machine/segments.h> for ISPL and SEL_UPL macros


23206 28-Feb-1997 bde

Print function args in the current radix instead of always in hex.

Print the stack pointer together with the frame pointer in the trap,
syscall and interrupt messages. The frame pointer is not very useful
for locating syscall args since syscall functions don't have a frame
pointer.

Print all the numbers in the trap, syscall and interrupt messages in
the default radix. The syscall number was confusing because it was
printed in decimal.

Use %#n format more and 0x%x less. 0x%x of course doesn't work with
a variable radix. ddb is now fairly consistent about using %+#n to
print all numbers. It omits the '+' for signed numbers and the '#' in a
few cases (e.g., for function args) to save space.


23184 28-Feb-1997 bde

Fixed the gcc ellipsis change to work with gcc-1.x.


23070 24-Feb-1997 alex

Typo police.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22827 17-Feb-1997 bde

Replaced START_ENTRY by _START_ENTRY. -current hasn't got my cleanup
of DEFS.h which renamed it.


22808 16-Feb-1997 bde

Select between the generic math functions and the i387-specific ones
at runtime.

etc/make.conf:
Nuked HAVE_FPU option.

lib/msun/Makefile:
Always build the i387 objects. Copy the i387 source files at build
time so that the i387 objects have different names. This is simpler
than renaming the files in the cvs repository or repeating half of
bsd.lib.mk to add explicit rules.

lib/msun/src/*.c:
Renamed all functions that have an i387-specific version by adding
`__generic_' to their names.

lib/msun/src/get_hw_float.c:
New file for getting machdep.hw_float from the kernel.

sys/i386/include/asmacros.h:
Abuse the ENTRY() macro to generate jump vectors and associated code.
This works much like PIC PLT dynamic initialization. The PIC case is
messy. The old i387 entry points are renamed. Renaming is easier
here because the names are given by macro expansions.


22639 13-Feb-1997 bde

Moved definition of FUNCTION_ALIGNMENT to a machine-dependent place.
Changed it from 4 to 16 for i386's. It can be anything for i386's,
but compiler options limit it to a power of 2, and assembler and
linker deficiencies limit it to a small power of 2 (<= 16).
We use 16 in the kernel to get smaller tables (see Makefile.i386 and
<machine/asmacros.h>). We still use the default of 4 in user mode.

Use HISTCOUNTER instead of (*kcount) in the definition of KCOUNT()
for consistency with other macros.


22636 13-Feb-1997 bde

Align text to 16-byte boundaries if profiling is enabled. This will
allow a fourfold reduction in the size of the profiling buffers. This
goes with rev.1.91 of Makefile.i386 which does the same thing for C
functions.


22564 11-Feb-1997 bde

Restored changes from rev.1.58-1.60 which were blown away by the
previous commit.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


22415 07-Feb-1997 phk

I have no idea what this is all about, but it works and Bruce hasn't
complained so it cannot be entirely bad :-)

I include the email that probably explains it for people who already know:

> >Compiling with -O3 inlines functions. However the function that is being
> >inlined in makeinfo.c (add_word_args()) is a vararg function and must not be
> >inlined.
> >
> >The code in question is K&R style, and AFIK, there is no way for the compiler
> >to determine that the function uses vararg. Either change the code to use
> >prototypes, or use stdarg, or add a directive to prevent inlining.
>
> Not declaring a varargs function as varargs before it is used gives
> undefined behaviour.
>
> However, in practice the bug is probably in FreeBSD's <varargs.h>, which
> doesn't use gcc's __builtin_next_arg(). gcc should notice that it is
> used and not inline functions that have it. <stdarg.h.> uses it, but I
> think there's another gcc builtin that it should be using.

Patch attached. The ellipsis causes gcc to flag this as a varargs function,
and the name "__builtin_va_alist" is special cased in gcc to hide the last
argument in the arglist.

Reviewed by: bde & phk
Submitted by: jlemon@americantv.com (Jonathan Lemon)


22203 02-Feb-1997 kato

Deleted i386_cpus[]. i386_cpus[] is a static variable in identcpu.c.

Found-by: lint


22130 30-Jan-1997 dg

Removed PG_N from here, too. Some machines don't like it and it's unnecessary.


22129 30-Jan-1997 dg

Removed unnecessary PG_N flag from device memory mappings. This is handled
by the CPU/chipset already and was apparently triggering a hardware bug that
causes strange parity errors.


22106 29-Jan-1997 bde

Estimate an initial overhead of 0 usec instead of 20 usec in DELAY().
I have code to calibrate the overhead fairly accurately, but there
is little point in using it since it is most accurate on machines
where an estimate of 0 works well. On slow machines, the accuracy
of DELAY() has a large variance since it is limited by the resolution
of getit() even if the initial delay is calibrated perfectly.

Use fixed point and long longs to speed up scaling in DELAY().
The old method slowed down a lot when the frequency became variable.
Assume the default frequency for short delays so that the fixed
point calculation can be exact.

Fast scaling is only important for small delays. Scaling is done
after looking at the counter and outside the loop, so it doesn't
decrease accuracy or resolution provided it completes before the
delay is up. The comment in the code is still confused about this.
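
The fixed-point scaling described above might look roughly like this. It is a sketch under assumptions, not the committed code: the names, the 24 fractional bits and the nominal i8254 frequency of 1193182 Hz are all chosen for illustration. The point is that one 64-bit multiply and a shift replace a division on each call.

```c
#include <stdint.h>

#define SK_TIMER_FREQ	1193182		/* assumed default i8254 clock, Hz */
#define SK_FRAC_BITS	24		/* assumed fixed-point precision */

/* Ticks per microsecond as a fixed-point constant, computed once. */
static const uint64_t sk_ticks_per_usec =
    ((uint64_t)SK_TIMER_FREQ << SK_FRAC_BITS) / 1000000;

/*
 * Convert a delay in microseconds to timer ticks without dividing.
 * The scale factor is rounded down, so the result may be one tick
 * short of the exact quotient for delays below 2^24 microseconds.
 */
static uint32_t
sk_usec_to_ticks(uint32_t usec)
{
	return ((uint32_t)(((uint64_t)usec * sk_ticks_per_usec) >>
	    SK_FRAC_BITS));
}
```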


22093 29-Jan-1997 bde

Disabled logging of masked exceptions on exit. Keep the side effect of
saving the state (see rev.1.17).


22005 25-Jan-1997 bde

Sync with <pci/pcibus.h>. pcibus.c unfortunately still compiled (with
only 3 or 4 warnings) when pb_maxirq went away.


21979 24-Jan-1997 bde

Fixed some formatting bugs (mostly regressions in rev.1.48). Replaced
some magic numbers by pmap constants. Cosmetic.


21975 24-Jan-1997 bde

Initialize CR0_MP in setregs() in case npx0 is disabled or not configured.
Disabling npx0 works right now.

Don't reference `npxdriver' if npx0 is not configured. Not configuring
npx0 doesn't quite work yet.

Don't clear potential non-npx pcb flags in setregs().


21974 24-Jan-1997 obrien

KNF style police.

Reported by: Bruce
Thanks to: Bruce for also providing a diff.


21953 23-Jan-1997 dyson

Remove some dead code from trapwrite.
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21944 22-Jan-1997 dyson

Fix I386 copyout support. The new page-table management code will
not lazy-fault page table pages. Update the copyout support to take
that into account. This should fix some segfault problems on such
machines.

After a short test period, we'll move this into 2.2.

Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21857 19-Jan-1997 obrien

Add bits to identify AMD K5 and K6 cpu's.
Tested only on my AMD K5 PR-133. Bit values for K6 taken from AMD document
on how to test such things.

2.2 Candidate.


21801 17-Jan-1997 jkh

Adjust ex0 entries properly after talking with Javier.


21783 16-Jan-1997 bde

Guard against the i8254 timer being uninitialized if DELAY() is
called early for console i/o. The timer is usually in BIOS mode
if it isn't explicitly initialized. Then it counts twice as fast
and has a max count of 65535 instead of 11932. The larger count
tended to cause infinite loops for delays of > 20 us. Such delays
are rare. For syscons and kbdio, DELAY() is only called early
enough to matter for ddb input after booting with -d, and the delay
is too small to matter (and too small to be correct) except in the
PC98 case. For pcvt, DELAY() is not used for small delays (pcvt
uses its own broken routine instead of the standard broken one),
but some versions call DELAY() with a large arg when they unnecessarily
initialize the keyboard for doing console output. The problem is
more serious for pcvt because there is always some early console
output.

Guard against the i8254 timer being partially or incorrectly
initialized. This would have prevented the endless loop.
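
A minimal sketch of such a guard, assuming a down-counting timer and a loop that subtracts the elapsed ticks from the ticks still to wait. The function name and signature are hypothetical. If the counter holds a value larger than the expected maximum (65535 against 11932, say), the wraparound correction can stay negative, and clamping it to zero keeps the remaining delay from growing.

```c
/*
 * Ticks elapsed between two reads of a down-counting timer.
 * max_count is the reload value the caller believes is in effect.
 */
static int
sk_elapsed_ticks(int prev_tick, int tick, int max_count)
{
	int delta = prev_tick - tick;

	if (delta < 0) {
		delta += max_count;	/* the counter wrapped once */
		if (delta < 0)
			delta = 0;	/* max_count is wrong: never go backwards */
	}
	return (delta);
}
```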

Should be in 2.2.


21769 16-Jan-1997 jkh

Add the ex driver (Intel EtherExpress Pro/10).

I have no idea if this works since I don't have one of the cards to test.
I also don't know what the LINT and GENERIC entries should look like,
so I just made up some values for now and left them commented out.
Someone who knows the factory settings for a Pro/10, please contact me!

Submitted-By: Javier Martín Rueda <jmrueda@diatel.upm.es>


21767 16-Jan-1997 bde

Fixed printing of small offsets. E.g., -4(%ebp) is now printed
as -0x4(%ebp) instead of as _APTD+0xffc(%ebp), and if GUPROF is
defined, 8(%ebp) is now printed as 0x8(%ebp) instead of as
GMON_PROF_HIRES+0x4(%ebp).
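
The formatting rule can be illustrated with a short sketch. The cutoff below which an offset counts as small, and the function name, are assumptions made for the example. The commit only shows the resulting output.

```c
#include <stdio.h>
#include <stdint.h>

#define SK_SMALL_DISP	0x1000	/* hypothetical cutoff for "small" offsets */

/*
 * Format a small displacement as signed hex, e.g. -0x4(%ebp).
 * Returns -1 for larger values, where a caller would fall back
 * to a symbol+offset lookup.
 */
static int
sk_format_disp(char *buf, size_t len, int32_t disp, const char *reg)
{
	if (disp <= -SK_SMALL_DISP || disp >= SK_SMALL_DISP)
		return (-1);
	if (disp < 0)
		return (snprintf(buf, len, "-0x%x(%s)", (unsigned)-disp, reg));
	return (snprintf(buf, len, "0x%x(%s)", (unsigned)disp, reg));
}
```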


21737 15-Jan-1997 dg

Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by: wollman


21734 15-Jan-1997 bde

Fixed longstanding annoying warning about a type mismatch. pmap doesn't
really use pt_entry_t internally, so don't use it here.

Fixed range checking for writing. The partial page (if any) following
etext wasn't writable.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21568 11-Jan-1997 dyson

When we changed pmap_protect to support adding the writeable
attribute to a page range, we forgot to set the PG_WRITEABLE
flag in the vm_page_t. This fixes that problem.


21540 11-Jan-1997 nate

Moved pccard_configure() to the end of the configure() list. This
avoids problems with the PCIC controller grabbing an interrupt that
another card needs.

Closes PR: kernel/2405

Reviewed by: bde


21529 11-Jan-1997 dyson

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


21493 10-Jan-1997 kato

Staticize the functions rtc_inb, rtc_outb, rtc_serialcombit, and
rtc_serialcom. These functions are only used by PC98.


21438 08-Jan-1997 nate

Make the code more consistent by using the INTR*MASK macros throughout the
code.

Reviewed by: bde

[
Bruce suggested removing the macros completely, but I'm not up to that
task quite yet.
]


21415 08-Jan-1997 nate

Changed magic # 0xa0000 -> ISA_HOLE_START since it's now defined.


21279 04-Jan-1997 bde

Reenabled i586_optimized_copyin/out yet again.


21278 04-Jan-1997 bde

Fixed context switching of FPU state after a fault in
i586_optimized_copyin/out.


21277 04-Jan-1997 bde

Fixed botched tables:
- the operands for bt, bts, arpl and `enter' were reversed.
- btr was reported as bts (with the correct operand order).
- cmpxchg was misplaced. It was misplaced differently in the
comments. It is misplaced differently again in the i486 manual.
I put it where the i586 manual and gas say it is.
- fucompp was misplaced.
- the rr table for some versions of fstp, fcom and fcomp was non-null.
This caused some invalid opcodes to be reported as "" instead of as
"<bad instruction>".
- the word and long versions of the fi* instructions were reversed.
- aaa and daa were reversed.

Fixed bugs involving unusual operand sizes:
- 32-bit registers weren't always forced for bswap or for moves to and
from special registers.
- the operand sizes weren't reported for [l]call or [l]jmp.
- displacements weren't truncated mod 2^16 when the operand size was
16-bit.
- too-large displacements and offsets were fetched, and too-large
offsets were reported, when the operand size was 16-bit.
- sign extended immediate bytes were extended too far when the operand
size was 16-bit.

Fixed bugs involving usual operand sizes:
- 8-bit source registers weren't forced for mov[sz]b[wl].
- 16-bit source registers weren't forced for mov[sz]w[wl].
- immediate bytes were sometimes reported as sign extended even for
byte operations. Same for immediate words in word operations.
- the immediate byte was not reported as sign extended for `push'.

Finished Pentium support:
- cpuid, cmpxchg8b and rsm were missing.

Finished i287 support:
- fneni, fndisi and fsetpm were missing. These are harmless nops on
later FPUs.

Improvements:
- report invalid opcodes 0xd6 and 0xf1 using .byte. They are special
in not causing invalid operand exceptions when executed.
- report the immediate byte for unusual aam and aad instructions.
Immediate bytes other than 0x0a always worked and are documented to
work on Pentiums.
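
Two of the operand-size fixes listed above (sign extension of immediate bytes and truncation of displacements modulo 2^16) can be sketched as below. The helper names and the boolean size flag are hypothetical. The disassembler's own structure is not shown in this log.

```c
#include <stdint.h>

/* Sign-extend an immediate byte only as far as the operand size. */
static uint32_t
sk_extend_imm8(uint8_t imm, int opsize_is_16)
{
	int32_t v = (int8_t)imm;

	return (opsize_is_16 ? ((uint32_t)v & 0xffff) : (uint32_t)v);
}

/* Wrap a displacement modulo 2^16 when the size is 16-bit. */
static uint32_t
sk_truncate_disp(uint32_t disp, int opsize_is_16)
{
	return (opsize_is_16 ? (disp & 0xffff) : disp);
}
```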


21181 02-Jan-1997 se

Add code to copy the LDT, if required.

This code was sent to me by Bruce Evans, and seems to fix some
possible kernel panic in case of an execution error. It did not
cause any problems on my system, but I did never observe the
problem this patch is supposed to fix, anyway.

This patch is a NOP, unless the kernel is built with "options
USER_LDT", and doesn't affect the GENERIC kernel for this reason.

I want to have it in 2.2: it fixes a bug ...

Submitted by: bde


21039 30-Dec-1996 dyson

Let the VM system know that on certain arch's that VM_PROT_READ
also implies VM_PROT_EXEC. We support it that way for now,
since the break system call by default gives VM_PROT_ALL. Now
we have a better chance of coalescing map entries when mixing
mmap/break type operations. This was contributing to excessive
numbers of map entries on the modula-3 runtime system. The
problem is still not "solved", but the situation makes more
sense.

Eventually, when we work on architectures where VM_PROT_READ
is orthogonal to VM_PROT_EXEC, we will have to visit this
issue carefully (esp. regarding security issues.)
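
One way to picture the effect on coalescing, as a sketch only: the constants and helpers below are hypothetical stand-ins, and the commit does not show how the VM system actually applies the rule. If read permission implies execute permission, two neighbouring entries that differ only in the execute bit compare equal after normalization and so become candidates for merging.

```c
#define SK_PROT_READ	0x1	/* hypothetical stand-ins for VM_PROT_* */
#define SK_PROT_WRITE	0x2
#define SK_PROT_EXEC	0x4

/* On such an architecture, readable pages are also executable. */
static unsigned
sk_effective_prot(unsigned prot)
{
	if (prot & SK_PROT_READ)
		prot |= SK_PROT_EXEC;
	return (prot);
}

/* Entries can be merged only if their effective protections match. */
static int
sk_can_coalesce(unsigned a, unsigned b)
{
	return (sk_effective_prot(a) == sk_effective_prot(b));
}
```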


20998 29-Dec-1996 dyson

Superficial clean-up of useracc calls. (The useracc usage of
B_READ/B_WRITE is bogus anyway.) Might as well make the call prettier
anyway.


20997 29-Dec-1996 dyson

Allow pmap_protect to increase permissions. This mod can eliminate
the need for unnecessary vm_faults.
Submitted by: Alan Cox <alc@cs.rice.edu>


20969 28-Dec-1996 bde

Disabled i586-optimized copyin and copyout again. The fault handler
is still broken - it doesn't restore the floating point state.

2.2-BETA users should disable it using npx0 flags 0x04 the same as
2.2-ALPHA users should have.


20748 21-Dec-1996 phk

Give MFS_ROOT priority over NFS as root filesystem.

2.2 candidate.


20730 21-Dec-1996 se

Mention amd driver in comment regarding PCI drivers.


20651 18-Dec-1996 bde

Only handle copyin/out/etc faults when not in an interrupt handler.
This makes unexpected faults (in an interrupt handler) more likely
to crash properly. It could be done even better (more robustly and
more efficiently) using lazy fault handling.


20650 18-Dec-1996 bde

Fixed formatting of KERN_DUMPDEV.

Should be in 2.2.


20641 18-Dec-1996 bde

Moved the printing of the BIOS geometries from cpu_startup() into
configure() where it always belonged. It was originally slightly
misplaced after configure(). Rev.138 left it completely misplaced
before the DEVFS, DRIVERS and CONFIGURE sysinits by not moving it
together with configure().

Restored the printing of bootinfo.bi_n_bios_used now that it can
be nonzero.


20617 18-Dec-1996 se

Add driver for AMD 53c974 SCSI (Tekram DC390/390T).
Remove MAX_LUN=2 option for NCR driver: FAILSAFE does
no longer imply MAX_LUN=1.


20578 17-Dec-1996 dg

Fix nbuf calculation /4 -> /8. 2.2 already has it this way.

Reviewed by: dyson


20515 15-Dec-1996 se

Remove "options MAXLUN=2" since the ncr driver will probe for 8 LUNs
now anyway, even if compiled with FAILSAFE defined.


20471 14-Dec-1996 jkh

Make the USERCONFIG_BOOT semantics closer to what was originally
intended.


20390 13-Dec-1996 jkh

Close PR#2198:

I've added an installation from optical disk drive facility.
This enables FreeBSD to be installed from an optical disk, which
may be formatted in "super floppy" style or sliced into MSDOS-FS
and UFS partitions.

Note: ncr.c should be reviewed by Stefan Esser <se@freebsd.org>
and cd.c by Joerg Wunsch <joerg@freebsd.org> before bringing this
into 2.2.

Submitted-By: Shunsuke Akiyama <akiyama@kme.mei.co.jp>


20368 12-Dec-1996 swallace

Soften range-check for LDTs.

Reviewed by: bde


20348 12-Dec-1996 dg

Fix allocation for exech_map to be 16*PAGE_SIZE rather than 32*PAGE_SIZE
so that it is scaled the same as exec_map (16 concurrent exec'ers).


20313 11-Dec-1996 dyson

One minor mod to set the limit of nbufs to 2048 from 1536. More important
fix to exech_map, it used 32*ARG_MAX, and it should use 32*PAGE_SIZE.


20146 05-Dec-1996 dyson

Clean-up of the new buffer kva allocation code. Also, there was an
error in the !BOUNCE_BUFFERS case.


20070 01-Dec-1996 bde

Removed all references to b_cylinder (aka b_cylin). It was evil and
hasn't been used for a year or two since disksort() started sorting
on b_pblkno.


20068 01-Dec-1996 dyson

Fix a problem with the new buffer_map management code. Additionally,
decrease the size of buffer_map to approx 2/3 of what it used to be
(buffer_map can be smaller now.) The original commit of these changes
increased the size of buffer_map to the point where the system would
not boot on large systems -- now large systems with large caches will
have even less problems than before.


20044 30-Nov-1996 bde

Reenabled i586-optimized copyin/out.

Should be in 2.2. Don't put it there for a while.


20018 29-Nov-1996 bde

Fixed EFAULT handling in i586_copyin() and i586_copyout(). Use a
consistent stack frame in fastmove() so that only one new fault handler
is necessary.

Should be in 2.2. Harmless until the i586 versions are reenabled.


20017 29-Nov-1996 bde

Don't print bootinfo.bi_n_bios_used in cpu_startup() since it is always
zero because no drivers have had a chance to change it.


20016 29-Nov-1996 bde

Don't clobber the SIGCONT bit in the signal mask in sigreturn(). Use
the `sigcantmask' macro to get the correct set of unmaskable signals.
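
A sketch of the idea, assuming a 32-bit signal mask with one bit per signal. The SK_ macros are hypothetical re-creations for the example, not the kernel's definitions. Only the signals that can never be blocked are stripped from the restored mask, so the SIGCONT bit survives.

```c
#include <signal.h>

/* Hypothetical equivalents of the kernel's sigmask()/sigcantmask. */
#define SK_SIGMASK(sig)	(1u << ((sig) - 1))
#define SK_SIGCANTMASK	(SK_SIGMASK(SIGKILL) | SK_SIGMASK(SIGSTOP))

/* Mask to install on return from a handler: drop only unmaskable bits. */
static unsigned
sk_restore_sigmask(unsigned requested)
{
	return (requested & ~SK_SIGCANTMASK);
}
```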

Found by: NIST-PCTS.


19957 25-Nov-1996 phk

Make a kernel with DDB but without sio possible again. This is only
a stopgap measure, a more complete solution is on somebody's whiteboard
(and we all know what THAT means :-).

Reviewed by: pst


19828 17-Nov-1996 dyson

Improve the caching of small files like directories, while not
substantially increasing buffer space. Specifically, we double
the number of buffers, but allocate only half the amount of memory
per buffer. Note that VDIR files aren't cached unless instantiated
in a buffer. This will significantly improve caching.


19804 16-Nov-1996 gibbs

Since there have been so many reports of the Memory Mapped I/O to the
aic7xxx cards failing on certain motherboards, reverse the logic used to
control this feature. AHC_FORCE_PIO is replaced with AHC_ALLOW_MEMIO.
GENERIC no longer needs to specify the AHC_FORCE_PIO option since this is
the default.


19798 15-Nov-1996 bde

Disabled i586-optimized copyin and copyout. They usually panic if the
user supplies a bad address, because they push a lot of stuff that the
fault handler doesn't know about onto the stack. This has been broken
for more than half a year despite being tested for almost half a year
in -current.


19772 15-Nov-1996 jkh

Change this back to movl for -current since it seems to work there.
Bruce says that movl is broken in -stable, which would certainly explain
why this didn't work there.


19750 14-Nov-1996 jkh

movl instruction should have been lea (this is why userconfig didn't
work in 2.1).

Spotted-by-the-keen-eyes-of: Don Lewis <Don.Lewis@tsc.tdk.com>


19678 12-Nov-1996 bde

Removed another #include of opt_temporary.h.

YA2.2C.


19674 12-Nov-1996 bde

Removed #include of "opt_temporary.h". All the temporary options went
away, so this header is no longer generated.

This change should be in 2.2. The old version shouldn't have been in
2.2 (blush).


19653 11-Nov-1996 bde

Replaced I586_OPTIMIZED_BCOPY and I586_OPTIMIZED_BZERO with boot-time
negative-logic flags (flags 0x01 and 0x02 for npx0, defaulting to unset = on).
This changes the default from off to on. The options have been in current
for several months with no problems reported.

Added a boot-time negative-logic flag for the old I586_FAST_BCOPY option
which went away too soon (flag 0x04 for npx0, defaulting to unset = on).

Added a boot-time way to set the memory size (iosiz in config, iosize in
userconfig for npx0).

LINT:
Removed old options. Documented npx0's flags and iosiz.

options.i386:
Removed old options.

identcpu.c:
Don't set the function pointers here. Setting them has to be delayed
until after userconfig has had a chance to disable them and until after
a good npx0 has been detected.

machdep.c:
Use npx0's iosize instead of MAXMEM if it is nonzero.

support.s:
Added vectors and glue code for copyin() and copyout().
Fixed ifdefs for i586_bzero().
Added ifdefs for i586_bcopy().

npx.c:
Set the function pointers here.
Clear hw_float when an npx exists but is too broken to use.
Restored style from a year or three ago in npxattach().
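
Negative-logic flags of this kind can be tested as in the sketch below. The macro names are invented for the example, and the assignment of 0x01 to bcopy and 0x02 to bzero is an assumption based on the order in which the options are listed above. A flag that is left unset means the feature stays on.

```c
/* Hypothetical names for the npx0 flag bits; set means "disable". */
#define SK_NPX_NO_I586_BCOPY	0x01	/* assumed: was I586_OPTIMIZED_BCOPY */
#define SK_NPX_NO_I586_BZERO	0x02	/* assumed: was I586_OPTIMIZED_BZERO */
#define SK_NPX_NO_I586_COPYIO	0x04	/* was I586_FAST_BCOPY */

/* A feature is on unless its disable bit is set in the flags. */
static int
sk_feature_enabled(unsigned flags, unsigned disable_bit)
{
	return ((flags & disable_bit) == 0);
}
```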


19621 11-Nov-1996 dyson

Support the PG_G flag on Pentium-Pro processors. This pretty
much eliminates the unnecessary unmapping of the kernel during
context switches and during invtlb...


19523 08-Nov-1996 asami

Remove option I586_FAST_BCOPY. The code will be included by default
if I586_CPU is defined. Note there is a runtime check so the code
won't be run for non-Pentium CPUs anyway.

2.2 candidate, this code has been tested for almost half year in -current.


19503 07-Nov-1996 joerg

Fix the message buffer mapping. This actually allows the message
buffer size in <sys/msgbuf.h> to be increased.

Reviewed by: davidg,joerg
Submitted by: bde


19482 07-Nov-1996 bde

Don't switch from fast interrupt handlers to normal interrupt
handlers if interrupts are nested more than a few (3) deep. This
only reduces the maximum nesting level by 1 with the standard
drivers unless there is a related bug somewhere, but can't hurt
much (the worst case is returning to a hoggish interrupt handler like
wdintr(), but such interrupt handlers hurt anyway).

Fixed a previously harmless race incrementing the interrupt nesting
level.

This should be in 2.1.6 and 2.2.


19463 06-Nov-1996 bde

Count only hardware interrupts in cnt.v_intr, so that the individual
hardware interrupt counts add up to the total. Previously, software
interrupts generated by splz() were counted in the total. These
software interrupts seem to be very rare - there have apparently been
0 of them on freefall among the last 352448857 interrupts.


19346 03-Nov-1996 dyson

Fix a problem with running down processes that have left wired
mappings with mlock. This problem only occurred because of the
quick unmap code not respecting the wired-ness of pages in the
process. In the future, we need to eliminate the dependency
intrinsic to the design of the code that wired pages actually
be mapped. It is kind-of bogus not to have wired pages mapped,
but it is also a weakness for the code to fall flat because
of a missing page.

This should fix a problem that Tor Egge has been having, and also
should be included into 2.2-RELEASE.


19274 31-Oct-1996 julian

Further improved version of handling a HALT when there is no console.


19269 30-Oct-1996 asami

More merge and update.

(1) deleted #if 0

pc98/pc98/mse.c

(2) hold per-unit I/O ports in ed_softc

pc98/pc98/if_ed.c
pc98/pc98/if_ed98.h

(3) merge more files by segregating changes into headers.

new file (moved from pc98/pc98):

i386/isa/aic_98.h

deleted:

well, it's already in the commit message so I won't repeat the
long list here ;)

Submitted by: The FreeBSD(98) Development Team


19219 28-Oct-1996 gibbs

Add two new aic7xxx driver options:

AHC_FORCE_PIO - This forces the driver to use PIO even on systems that
say they have memory mapped the controller's registers. This
seems to fix Ken Lam's problems. I've also placed this option
in the GENERIC kernel file so that we are guaranteed to install
even on these flakey machines.

AHC_SHARE_SCBS - This option attempts to share the external SCB SRAM on
the 398X controllers allowing a total of 255 non-paged SCBs.
This doesn't work quite yet, so this option is mostly here to
help 398X owners to experiment and give me feedback until this
works properly.


19186 26-Oct-1996 bde

Removed initialization of a variable that went away. Oops.


19173 25-Oct-1996 bde

Print the clock calibration messages all on one (long) line again so
that they are easy to grep for.

Removed now-unused i586 counter variables.

Fixed some style bugs.


19172 25-Oct-1996 bde

Improved biasing of i586 clock by adjusting for hardclock() latency.
I decided to do this for every hardclock() call instead of lazily
in microtime(). The lazy method is simpler but has more overhead
if microtime() is called a lot.

CPU_THISTICKLEN() is now a no-op and should probably go away.
Previously it did nothing directly but had the side effect of
setting i586_last_tick for CPU_CLOCKUPDATE() and i586_avg_tick for
debugging. CPU_CLOCKUPDATE() now uses a better method and
i586_avg_tick is too much trouble to maintain.

Reduced nesting of #includes in the usual case.

Increased nesting of #includes when CLOCK_HAIR is defined. This
is a kludge to get typedefs for inline functions only when the
inline functions are used. Normally only kern_clock.c defines
this. kern_clock.c can't include the i386 headers directly.

Removed unused LOCORE support.


19119 23-Oct-1996 dyson

Account for the UPAGES in the same way as before moving the MD code
from vm_glue into pmap.c. Now RSS should appear to be the same as before.


19064 20-Oct-1996 phk

Removing old isdn stuff.


19000 17-Oct-1996 bde

Improved non-statistical (GUPROF) profiling:
- use a more accurate and more efficient method of compensating for
overheads. The old method counted too much time against leaf
functions.
- normally use the Pentium timestamp counter if available.
On Pentiums, the times are now accurate to within a couple of cpu
clock cycles per function call in the (unlikely) event that there
are no cache misses in or caused by the profiling code.
- optionally use an arbitrary Pentium event counter if available.
- optionally regress to using the i8254 counter.
- scaled the i8254 counter by a factor of 128. Now the i8254 counters
overflow slightly faster than the TSC counters for a 150MHz Pentium :-)
(after about 16 seconds). This is to avoid fractional overheads.

files.i386:
perfmon.c temporarily has to be classified as a profiling-routine
because a couple of functions in it may be called from profiling code.

options.i386:
- I586_CTR_GUPROF is currently unused (oops).
- I586_PMC_GUPROF should be something like 0x70000 to enable (but not
use unless prof_machdep.c is changed) support for Pentium event
counters. 7 is a control mode and the counter number 0 is somewhere
in the 0000 bits (see perfmon.h for the encoding).

profile.h:
- added declarations.
- cleaned up separation of user mode declarations.

prof_machdep.c:
Mostly clock-select changes. The default clock can be changed by
editing kmem. There should be a sysctl for this.

subr_prof.c:
- added copyright.
- calibrate overheads for the new method.
- documented new method.
- fixed races and machine dependencies in start/stop code.

mcount.c:
Use the new overhead compensation method.

gmon.h:
- changed GPROF4 counter type from unsigned to int. Oops, this should
be machine-dependent and/or int32_t.
- reorganized overhead counters.

Submitted by: Pentium event counter changes mostly by wollman


18992 17-Oct-1996 bde

Added missing extern declaration of timer_freq.
Sorted declarations of scalars.


18963 16-Oct-1996 bde

Fixed miscounting for non-statistical (GUPROF) profiling:
- use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle()
into cpu_switch() and vice versa.
- moved badsw code to after cpu_switch().

Cosmetic changes:
- moved sw0 string to be immediately after its caller (badsw).
- removed unused #include.


18961 16-Oct-1996 bde

Added macros CROSSJUMP(), CROSSJUMP_LABEL() and GPROF_RET. These will
be used to fix some benign(?) bugs in GUPROF profiling.

Fixed stale comments and long lines.


18937 15-Oct-1996 dyson

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


18907 13-Oct-1996 dyson

Pmap_resident_count was mistakenly removed from pmap.h, thereby
disabling the RSS listing in ps and ^T. This commit re-inserts
the macro defn.


18904 13-Oct-1996 dyson

Minor optimization for final rundown of a pmap.


18897 12-Oct-1996 dyson

Performance optimizations. One of which was meant to go in before the
previous snap. Specifically, kern_exit and kern_exec now make a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.


18896 12-Oct-1996 bde

Cleaned up:
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in their definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.


18892 12-Oct-1996 bde

Removed nested include of <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


18855 10-Oct-1996 bde

Don't include "opt_cpu.h" in <machine/clock.h>, since this breaks lkm's.
The change breaks kern_clock.c; fix that temporarily by including
"opt_cpu.h" there.


18842 09-Oct-1996 bde

Put I*86_CPU defines in opt_cpu.h.


18837 09-Oct-1996 bde

Enable the i586-optimized bcopy if the cpu is a "586" and option
I586_OPTIMIZED_BCOPY is configured.

Similarly for bzero/I586_OPTIMIZED_BZERO.

Fake 586's had better have a hardware FPU with non-broken exception
handling (we mask exceptions, but broken exception handling may trap
on the instructions that do the masking). I guess this means that
the routines won't work on most 386's or FPUless 486's even when they
have a h/w FPU.


18835 09-Oct-1996 bde

Added i586-optimized bcopy() and bzero().

These are based on using the FPU to do 64-bit stores. They also
use i586-optimized instruction ordering, i586-optimized cache
management and a couple of other tricks. They should work on any
i*86 with a h/w FPU, but are slower on at least i386's and i486's.
They come close to saturating the memory bus on i586's. bzero()
can maintain a 3-3-3-3 burst cycle to 66 MHz non-EDO main memory
on a P133 (but is too slow to keep up with a 2-2-2-2 burst cycle
for EDO - someone with EDO should fix this). bcopy() is several
cycles short of keeping up with a 3-3-3-3 cycle for writing. For
a P133 writing to 66 MHz main memory, it just manages an N-3-3-3,
3-3-3-3 pair of burst cycles, where N is typically 6.

The new routines are not used by default. They are always configured
and can be enabled at runtime using a debugger or an lkm to change
their function pointer, or at compile time using new options (see
another log message).

Removed old, dead i586_bzero() and i686_bzero(). Read-before-write is
usually bad for i586's. It doubles the memory traffic unless the data
is already cached, and data is (or should be) very rarely cached for
large bzero()s (the system should prefer uncached pages for cleaning),
and the amount of data handled by small bzero()s is relatively small
in the kernel.

Improved comments about overlapping copies.

Removed unused #include.


18819 08-Oct-1996 bde

Fixed pessimized (short) i/o port types.


18702 05-Oct-1996 jkh

Multiple changes stacked as one commit since they all depend on one another.

First, change sysinstall and the Makefile rules to not build the kernel
nlist directly into sysinstall now. Instead, spit it out as an ascii
file in /stand and parse it from sysinstall later. This solves the chicken-n-
egg problem of building sysinstall into the fsimage before BOOTMFS is built
and can have its symbols extracted. Now we generate the symbol file in
release.8.

Second, add Poul-Henning's USERCONFIG_BOOT changes. These have two
effects:

1. Userconfig is always entered, rather than only after a -c
(don't scream yet, it's not as bad as it sounds).

2. Userconfig reads a message string which can optionally be
written just past the boot blocks. This string "preloads"
the userconfig input buffer and is parsed as user input.
If the first command is not "USERCONFIG", userconfig will
treat this as an implied "quit" (which is why you don't need
to scream - you never even know you went through userconfig
and back out again if you don't specifically ask for it),
otherwise it will read and execute the following commands
until a "quit" is seen or the end is reached, in which case
the normal userconfig command prompt will then be presented.
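
The preload rules in item 2 can be sketched as a small parser. This is an illustration under assumptions, not the userconfig source: the names are hypothetical, commands are assumed to be separated by newlines, and "executing" a command is reduced to counting it.

```c
#include <string.h>

enum sk_preload_result {
	SK_PRELOAD_IMPLIED_QUIT,	/* first command was not "USERCONFIG" */
	SK_PRELOAD_QUIT,		/* an explicit "quit" was seen */
	SK_PRELOAD_PROMPT		/* input ran out: show the normal prompt */
};

/* Nonzero if the n characters at s are exactly the given word. */
static int
sk_line_is(const char *s, size_t n, const char *word)
{
	return (strlen(word) == n && strncmp(s, word, n) == 0);
}

/* Walk the preloaded buffer; *executed counts the commands run. */
static enum sk_preload_result
sk_run_preload(const char *buf, int *executed)
{
	const char *p = buf;
	int first = 1;

	*executed = 0;
	while (*p != '\0') {
		size_t n = strcspn(p, "\n");

		if (first) {
			if (!sk_line_is(p, n, "USERCONFIG"))
				return (SK_PRELOAD_IMPLIED_QUIT);
			first = 0;
		} else if (sk_line_is(p, n, "quit")) {
			return (SK_PRELOAD_QUIT);
		} else if (n > 0) {
			(*executed)++;	/* a real parser would run the command */
		}
		p += n;
		if (*p == '\n')
			p++;
	}
	return (first ? SK_PRELOAD_IMPLIED_QUIT : SK_PRELOAD_PROMPT);
}
```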

How to create your own startup sequences, using any boot.flp image
from the next snap forward (not yet, but soon):

% dd of=/dev/rfd0 seek=1 bs=512 count=1 conv=sync <<WAKKA_WAKKA_DOO
USERCONFIG
irq ed0 10
iomem ed0 0xcc000
disable ed1
quit
WAKKA_WAKKA_DOO


Third, add an intro screen to UserConfig so that users aren't just thrown
into this strange screen if userconfig is auto-launched. The default
boot.flp startup sequence is now, in fact, this:

USERCONFIG
intro
visual

(Since visual never returns, we don't need a following "quit").

Submitted-By: phk & jkh


18567 29-Sep-1996 bde

Added "memory" to clobber list in invlpg(). It needs it if invltlb()
needs it.

Fixed style in invlpg().

Sorted recently renamed functions.

Added prototypes in the non-gcc section for recently added/renamed
functions.


18548 28-Sep-1996 dyson

Essentially rename pmap_update to be invltlb. It is a very machine
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.

Additionally, fix the tlb management code for 386 machines.


18538 28-Sep-1996 bde

Restored my change in rev.1.119 which was clobbered by the previous commit.


18528 28-Sep-1996 dyson

Move pmap_update_1pg to cpufunc.h. Additionally,
use the invlpg opcode instead of the nasty looking .byte directives.
There are some other minor micro-level code improvements to pmap.c


18514 27-Sep-1996 peter

part 2 of the bsdi compat tweak attempt. I believe that BSDI use both
lcall 7,0 (ie: ldt slot 0) and lcall 0x87,0 (ldt slot 16, it's shifted
three bits to the left). I was fiddling with this so long ago, I don't
recall the specifics.
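
The arithmetic behind the two selector values can be checked with a short sketch. An x86 segment selector holds the table index in the upper bits, a table-indicator bit (set for the LDT) and a two-bit requested privilege level. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define SK_SEL_TI_LDT	0x4	/* table indicator: use the LDT */
#define SK_SEL_RPL_USER	0x3	/* requested privilege level 3 */

/* Build a user-mode selector for the given LDT slot. */
static uint16_t
sk_ldt_selector(unsigned slot)
{
	return ((uint16_t)((slot << 3) | SK_SEL_TI_LDT | SK_SEL_RPL_USER));
}
```

With these definitions, slot 0 gives selector 7 and slot 16 gives 0x87, matching the two lcall forms above.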


18513 27-Sep-1996 peter

Apparently, BSDI have a new system call gate. I was experimenting
with this quite a while ago when somebody reported a BSD/OS 2.1 binary
that wouldn't run. I'm pretty sure they tried it and I'm pretty sure
they mentioned to me that the patch worked.


18511 27-Sep-1996 peter

I've been meaning to commit this for months. Implement select()
for /dev/random and /dev/urandom. Both are always writable, urandom is
always readable, and /dev/random is readable when >= 8 bits are in the
pool.
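
The readiness rules stated above reduce to a small predicate, sketched here with hypothetical names. The real driver's select entry point and pool accounting are not shown in this log.

```c
#define SK_RANDOM_MIN_BITS	8	/* threshold quoted in the commit message */

struct sk_ready {
	int readable;
	int writable;
};

/* Readiness of /dev/random (is_urandom == 0) or /dev/urandom. */
static struct sk_ready
sk_random_ready(int is_urandom, unsigned pool_bits)
{
	struct sk_ready r;

	r.writable = 1;		/* both devices are always writable */
	r.readable = is_urandom || pool_bits >= SK_RANDOM_MIN_BITS;
	return (r);
}
```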


18490 24-Sep-1996 bde

Fixed a few hundred warnings (2400 in LINT) for signed vs unsigned
comparisons in the inb() and outb() macros. I decided that int args
are OK here. Any type that can hold a u_int16_t without overflow
is correct, and 32-bit types are optimal.

Introduced a few tens of warnings (100 in LINT) for use of pessimized
(short) types for the port arg. Only a few drivers are affected by
this. u_short pessimizations aren't detected.

Added `__extension__' before the statement-expression in inb() so
that it can be compiled without warnings by gcc -pedantic.


18428 20-Sep-1996 bde

Changed an arg name in the pseudo-prototype for bzero() to match
the prototype.

Put the jump table for i486_bzero() in the data section. This
speeds up i486_bzero() a little on Pentiums without significantly
affecting its speed on 486's.

Don't waste time falling through 14 nop's to return from do1 in
i486_bzero().

Use fastmove() for counts >= 1024 (was > 1024). Cosmetic.

Fixed profiling of fastmove().

Restored meaningful labels from the pre-1.1 version in fastmove().
Local labels are evil.

Fixed (high resolution non-) profiling of __bb_init_func().


18380 19-Sep-1996 phk

Add APM_IDLE_CPU option, that is off by default.
I maintain that it saves more power to simply "hlt" the CPU than to
spend tons of time trying to tell the APM bios to do the same.
In particular if you do it 100 times a second...


18297 14-Sep-1996 bde

Attached simple external ddb commands `show rtc', `show pgrpdump'
and `show cbstat'. The pgrpdump code was previously controlled by
`#ifdef DEBUG'.


18288 14-Sep-1996 bde

Changed cncheckc() interface so that it is 8-bit clean - return -1
instead of 0 if there is no input.
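
Why -1 matters can be shown with a small model. The queue structure and function name below are hypothetical stand-ins, not the console code: the point is only that a poll returning 0 for "nothing pending" cannot also deliver a NUL byte.

```c
#include <stddef.h>

/* Hypothetical input queue standing in for the console driver's buffer. */
struct sketch_inq {
	const unsigned char *buf;
	size_t len;
	size_t pos;
};

/*
 * Non-blocking poll: return the next byte as 0..255, or -1 if nothing
 * is pending.  Every 8-bit value, including 0, stays distinguishable
 * from the "no input" case.
 */
int
sketch_checkc(struct sketch_inq *q)
{
	if (q->pos >= q->len)
		return (-1);
	return (q->buf[q->pos++]);
}
```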

syscons.c:
Added missing spl locking in sccncheckc(). Return the same value as
sccngetc() would. It is wrong for sccngetc() to return non-ASCII, but
stripping the non-ASCII bits doesn't help.


18275 13-Sep-1996 bde

Made debugging code (pmap_pvdump()) compile again so that I can test LINT.
I don't know if it actually works.


18265 12-Sep-1996 asami

Another round of merge/update.

(1) Add PC98 support to apm_bios.h and ns16550.h, remove pc98/pc98/ic
(2) Move PC98 specific code out of cpufunc.h (to pc98.h)
(3) Let the boot subtrees look more alike

Submitted by: The FreeBSD(98) Development Team
<freebsd98-hackers@jp.freebsd.org>


18260 12-Sep-1996 dyson

Primarily a fix so that pages are properly tracked for being
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.


18252 11-Sep-1996 phk

Make userconfig two (default: on) options:
USERCONFIG to enable
VISUAL_USERCONFIG to get the gui stuff too.
Requested by: pst


18239 11-Sep-1996 dyson

A minor fix to the new pmap code. This might not fix the global problems
with the last major pmap commits.


18233 10-Sep-1996 bde

Removed more devconf leftovers.


18232 10-Sep-1996 bde

Removed bogus LARGMEM code and option. The code panicked when
biosextmem > 65536, but biosextmem is a 16-bit quantity so it is
guaranteed to be < 65536. Related cruft for biosbasemem was
mostly cleaned up in rev.1.26.


18207 10-Sep-1996 bde

Updated #includes to 4.4Lite style.


18169 08-Sep-1996 dyson

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


18163 08-Sep-1996 dyson

Improve the scalability of certain pmap operations.


18095 07-Sep-1996 asami

Yet another merge. Remove support.s by deleting memcopy. Remove
autoconf.c by merging icu.h. Fix a couple of typos.

Submitted by: The FreeBSD(98) Development Team.


18084 06-Sep-1996 phk

Remove devconf, it never grew up to be of any use.


18023 03-Sep-1996 nate

Cleaned up version of my 'extended BIOS' patch. This one is commented
better and much simpler to understand, and works just as well (better)
as a bonus.

Submitted by: bde


17986 01-Sep-1996 dg

Change an splclock that needs to be an splhigh into an splhigh.

Reviewed by: bde


17983 01-Sep-1996 nate

If the basemem value supplied by the bootblocks, differs from the value
returned by the RTC, use the bootblock supplied value. Also, map the
'stolen by BIOS' memory in the same manner as the ISA-hole memory, since
it is really an extension of the BIOS. This is necessary for 32-bit
BIOS functions such as APM support on laptops, and the loss of memory
for non-necessary functions seems to be at most 4k.

Reviewed by: phk
Obtained from: email conversation with jtk@atria.com


17950 30-Aug-1996 pst

Improvements from Bruce Evans


17879 28-Aug-1996 bde

Cleaned up interrupt masking by declaring the state variable in a
machine-dependent macro and passing it to all machine-dependent
macros.

Eliminated the state variable for the GUPROF case.


17865 28-Aug-1996 pst

Clean up formatting and fix an & -> && bug pointed out by bde


17847 27-Aug-1996 pst

Support for GDB remote debug protocol.

Sponsored by: Juniper Networks, Inc. <pst@jnx.com>


17846 27-Aug-1996 wosch

Add hints to the file ./LINT and the handbook.


17677 19-Aug-1996 julian

Collect all the functions concerned with rebooting into one place.
Also add the at_shutdown callout list, and change the one user of
the present (broken) method (the vn driver) to use the new scheme.


17559 12-Aug-1996 wollman

Back out mistaken local change that sneaked in on the last commit.


17558 12-Aug-1996 wollman

Don't declare the user_ldt functions unless USER_LDT is defined.
Eliminates an obnoxious warning.


17521 11-Aug-1996 dg

Add support for i686 machine check trap.


17520 11-Aug-1996 dg

Defined T_MCHK exception for i686; renumbered T_RESERVED to 29.


17490 10-Aug-1996 peter

Add recognition for the AMD 5x86 CPU models.

Submitted by: A JOSEPH KOSHY <koshy@india.hp.com>


17488 10-Aug-1996 peter

Trivial cosmetic tweak to make the i[56]86 CPU MHz reporting round to the
nearest .01 MHz rather than simply truncating it downwards.
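
The difference between the two behaviours is easy to see with integer arithmetic. The sketch below is not the code from this commit; the function names are invented, and the round-half-up step is an assumption about how "nearest" is computed.

```c
#include <stdio.h>

/* Hundredths of a MHz, truncated toward zero (the old behaviour). */
unsigned long
sketch_mhz_x100_trunc(unsigned long hz)
{
	return (hz / 10000);
}

/* Hundredths of a MHz, rounded to nearest by adding half a step first. */
unsigned long
sketch_mhz_x100_round(unsigned long hz)
{
	return ((hz + 5000) / 10000);
}

/* Format a value in hundredths of a MHz as "NN.NN-MHz" into buf. */
int
sketch_format_mhz(char *buf, size_t len, unsigned long x100)
{
	return (snprintf(buf, len, "%lu.%02lu-MHz", x100 / 100, x100 % 100));
}
```

For a measured 89999928 Hz the truncating version yields 8999 hundredths and the rounding version 9000.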

This hack makes this 89.999928 MHz clock correctly round to the closer
90.00-MHz rather than 89.99-MHz:
> i586 clock: 89999928 Hz, i8254 clock: 1193152 Hz
> CPU: Pentium (90.00-MHz 586-class CPU)


17395 02-Aug-1996 bde

Eliminated i586_ctr_rate. Use i586_ctr_freq instead.


17394 02-Aug-1996 bde

Eliminated i586_ctr_rate. Use i586_ctr_freq instead.

Changed i586_ctr_bias from long long to u_int. Only the low 32 bits
are used now that microtime uses a multiplication to do the scaling.
Previously the high 32 bits had to match those of rdtsc() to prevent
overflow traps and invalid timeval adjustments.


17384 01-Aug-1996 wollman

Add an fls() inline function which does the opposite operation to
ffs(). (That is to say, it searches in the opposite direction.)
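
For reference, the semantics can be written as a portable loop. This is a sketch of what the operation computes, not the inline function added here (which is not shown in this log); the sketch_ names are chosen so as not to collide with any library fls() or ffs().

```c
/*
 * Find the last (most significant) set bit.  Bits are numbered from 1;
 * 0 means no bit is set.
 */
int
sketch_fls(unsigned int mask)
{
	int bit = 0;

	while (mask != 0) {
		mask >>= 1;
		bit++;
	}
	return (bit);
}

/* Find the first (least significant) set bit, for comparison. */
int
sketch_ffs(unsigned int mask)
{
	int bit = 1;

	if (mask == 0)
		return (0);
	while ((mask & 1) == 0) {
		mask >>= 1;
		bit++;
	}
	return (bit);
}
```

For the mask 0x50 the first set bit is 5 and the last is 7.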


17371 31-Jul-1996 bde

Eliminated pcb_inl. It was always 0 because context switches don't occur
in interrupt handlers.


17366 31-Jul-1996 dg

Converted timer/run queues to 4.4BSD queue style. Removed old and unused
sleep(). Implemented wakeup_one() which may be used in the future to combat
the "thundering herd" problem for some special cases.

Reviewed by: dyson


17355 30-Jul-1996 bde

Fixed longstanding bug of not checking `dumpdev' or setting `dumplo'
early enough when the dump device is specified in the config file.

Removed stale comment about configuration root and swap devices.

Don't bother clearing dumplo when dumpdev is set to NODEV. Everything
is controlled by dumpdev.

Fixed the kern.dumpdev sysctl. Writes were handled bogusly.


17353 30-Jul-1996 bde

Fixed the machdep.i8254_freq and machdep.i586_freq sysctls. Writes were
handled bogusly.

Centralized the setting of all the frequency variables. Set these
variables atomically. Some new ones aren't used yet.


17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


17329 29-Jul-1996 dyson

Fix a problem with a DEBUG section of code.


17325 29-Jul-1996 dyson

Fix an error in statement order in pmap_remove_pages, remove the pmap
pte hint (for now), and general code cleanup.


17321 28-Jul-1996 dyson

Fix a problem that pmap update was not being done for kernel_pmap. Also
remove some (currently) gratuitous tests for PG_V... This bug could
have caused various anomalous (temporary) behavior.


17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
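
Item 6 can be illustrated with a toy list. The structure and function below are hypothetical and are not the pmap code; they only show how a combined test-and-clear needs one traversal where separate test and clear calls need two.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping of a physical page. */
struct sketch_pv {
	unsigned int pte;		/* simulated page table entry */
	struct sketch_pv *next;
};

/*
 * Walk the mapping list once: report whether `bit' was set in any
 * entry, and clear it everywhere during the same traversal.
 */
bool
sketch_test_and_clear(struct sketch_pv *head, unsigned int bit)
{
	bool was_set = false;
	struct sketch_pv *pv;

	for (pv = head; pv != NULL; pv = pv->next) {
		if (pv->pte & bit) {
			was_set = true;
			pv->pte &= ~bit;
		}
	}
	return (was_set);
}
```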


17256 23-Jul-1996 asami

Update to current state of PC98 world.

Submitted by: The FreeBSD(98) development team


17236 21-Jul-1996 joerg

Post-commit review by Bruce. Mostly stylistic changes.

Submitted by: bde


17231 20-Jul-1996 joerg

Major cleanup of the timerX_{acquire,release} stuff. In particular,
make it more intelligible, improve the partially bogus locking, and
allow for a ``quick re-acquiration'' from a pending release of timer 0
that happened ``recently'', so it was not processed yet by clkintr().
This latter modification now finally allows to play XBoing over
pcaudio without losing sounds or getting complaints. ;-) (XBoing
opens/writes/closes the sound device all over the day.)

Correct locking for sysbeep().

Extensively (:-) reviewed by: bde


17194 17-Jul-1996 bde

Fixed adjustment of `time' when timer0 is released. 27465 was 27645 in
a comment and in code that was only used when pcaudio was closed. The
maximum error was 66 usec.


17178 15-Jul-1996 nate

Moved declaration of zbuf outside of #ifdef DEVFS code.


17174 15-Jul-1996 bde

Quick fix for previous commit: don't free zbuf on close since it may be
in use in another process that blocked in uiomove().


17166 14-Jul-1996 dyson

Almost gratuitous improvement of the performance of reading
/dev/zero.


17121 12-Jul-1996 bde

Removed "optimization" using gcc's builtin memcpy instead of bcopy.
There is little difference now since the amount copied is large,
and bcopy will become much faster on some machines.


17120 12-Jul-1996 bde

Renamed upa to p0upa to match p0upt.

Cleaned up some comments.


17118 12-Jul-1996 bde

Export `dumpmag' to utilities but not to the kernel.

Restored a truncated comment.


17117 12-Jul-1996 bde

Fixed cloned comments about npx traps to match context.


17109 12-Jul-1996 bde

Fixed operand order for shld and shrd.

Finished the constant poisoning that was begun in rev.1.14. Consts
aren't very poisonous (or useful) unless -Wcast-qual is in CFLAGS,
and it isn't in the default CFLAGS.


17108 12-Jul-1996 bde

Don't use NULL in non-pointer contexts.


17093 11-Jul-1996 jkh

Merge.


17053 10-Jul-1996 jkh

Clean out some historical cruft.


17014 08-Jul-1996 wollman

Fix something that's been bugging me for a long time: move the CPU
type identification code out of machdep.c and into a new file of its
own. Hopefully other grot can be moved out of machdep.c as well
(by other people) into more descriptively-named files.


16878 01-Jul-1996 bde

Fixed lots of warnings about unportable casts of pointers to volatile
variables: don't depend on the compiler generating atomic code to set
the variables - use inline asm to specify the atomic instruction(s)
explicitly.


16875 01-Jul-1996 bde

Moved declarations of non-cpu things from <machine/cpufunc.h> to better
places.


16874 01-Jul-1996 bde

Use the standard timer (interrupt) frequency while calibrating the clocks.
Testing with the high frequency of 20000 Hz (to find problems) only found
the problem that this frequency is too high for slow i386's.

Disable interrupts while setting the timer frequency. This was unnecessary
before rev.1.57 and forgotten in rev.1.57. The critical (i8254) interrupts
are disabled in another way at boot time but not in the sysctl to change
the frequency.


16843 30-Jun-1996 joerg

Enable ktrace by default, accompanied by a small reminder about the
implications (4 KB bloat, slight slowdown of syscalls).

Reviewed by: freebsd-hackers


16747 26-Jun-1996 dyson

When page table pages were removed from process address space, the
resident page stats were not being decremented. This mod corrects
that problem.


16733 25-Jun-1996 bde

Added #include of <machine/md_var.h>. This will be needed when
some declarations are moved from <machine/cpufunc.h> to better
places.


16725 25-Jun-1996 bde

trap.c:
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.

Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.

Removed vestigial support for PROFTIMER.

switch.s:
Removed addupc().

resourcevar.h:
Removed ADDUPC() and declarations of addupc().

cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.

Obtained from: mostly from NetBSD


16723 25-Jun-1996 bde

Save John Polstra's initial fix for profiling for reference. The
multiplication in addupc() overflowed for addresses >= 256K, assuming
the usual profil(2) scale parameter of 0x8000. addupc() will go away
soon.
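
The overflow is the usual 32-bit product wrap. The sketch below does not reproduce addupc(), whose exact arithmetic is not shown in this log; it assumes a plain offset * scale >> 16 scaling and uses invented names, just to show how a 64-bit intermediate avoids the wrap.

```c
#include <stdint.h>

/* Scale a text offset into a profile buffer offset using 32-bit math. */
uint32_t
sketch_scale32(uint32_t offset, uint32_t scale)
{
	return ((offset * scale) >> 16);	/* product wraps modulo 2^32 */
}

/* Same scaling with a 64-bit intermediate product. */
uint32_t
sketch_scale64(uint32_t offset, uint32_t scale)
{
	return ((uint32_t)(((uint64_t)offset * scale) >> 16));
}
```

With scale 0x8000 and an offset of 0x40000 (256K), the 32-bit version returns 0 while the 64-bit version returns 0x20000.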

Submitted by: John Polstra <jdp@polstra.com>


16680 25-Jun-1996 dyson

Limit the scan for preloading pte's to the end of an object.


16532 20-Jun-1996 dg

Properly account for non-page aligned buffers.


16530 20-Jun-1996 dg

Minor KNF formatting change to vmapbuf() and vunmapbuf().


16499 19-Jun-1996 dyson

Clean up vmapbuf and vunmapbuf significantly. The previous code was
very rough.


16471 18-Jun-1996 bde

Removed unused #includes of <i386/isa/icu.h> and <i386/isa/isa.h>. icu.h
is only used by the icu support modules and by a few drivers that know
too much about the icu (most only use it to convert `n' to `IRQn'). isa.h
is only used by ioconf.c and by a few drivers that know too much about
isa addresses (a few have to, because config is deficient).


16428 17-Jun-1996 bde

In getit(), use read_eflags()/write_eflags() to preserve the interrupt
enable flag instead of enable_intr() to restore it to its usual state.
getit() is only called from DELAY() so there is no point in optimising
its speed (this wasn't so clear when it was extern), and using
enable_intr() made it inconvenient to call DELAY() from probes that need
to run with interrupts disabled.
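
The difference between the two approaches can be modelled with a simulated flags word. Nothing below touches real hardware: the sketch_ functions and the variable are hypothetical stand-ins for the cpufunc.h primitives, and only the bit position of the interrupt enable flag (bit 9 of EFLAGS) is taken from the x86 architecture.

```c
/* Simulated flags register; bit 9 is the x86 interrupt enable flag. */
#define SKETCH_PSL_I	0x200u

static unsigned int sketch_eflags = SKETCH_PSL_I;

unsigned int sketch_read_eflags(void) { return (sketch_eflags); }
void sketch_write_eflags(unsigned int ef) { sketch_eflags = ef; }
void sketch_disable_intr(void) { sketch_eflags &= ~SKETCH_PSL_I; }
void sketch_enable_intr(void) { sketch_eflags |= SKETCH_PSL_I; }

/* Critical section that restores whatever state the caller had. */
void
sketch_critical_preserving(void)
{
	unsigned int ef = sketch_read_eflags();

	sketch_disable_intr();
	/* ... latch and read the timer counter here ... */
	sketch_write_eflags(ef);
}

/* Critical section that unconditionally re-enables interrupts on exit. */
void
sketch_critical_enabling(void)
{
	sketch_disable_intr();
	/* ... latch and read the timer counter here ... */
	sketch_enable_intr();
}
```

A caller that entered with interrupts disabled stays that way after the preserving variant, but not after the enabling one.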


16415 17-Jun-1996 dyson

Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
the need to scan the pv lists twice in many cases. Perhaps there
is a lot more to do here to work on minimizing pv list manipulation
3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
4) Major changes and code improvement to pmap. This code has had
several serious bugs in page table page manipulation. In order
to simplify the problem, and hopefully solve it for once and all,
page table pages are no longer "managed" with the pv list stuff.
Page table pages are only (mapped and held/wired) or
(free and unused) now. Page table pages are never inactive,
active or cached. These changes have probably fixed the
hold count problems, but if they haven't, then the code is
simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I
have taken a first (of probably many) steps. Please tell me
if you have any ideas.


16407 16-Jun-1996 joerg

Explain the options for the `od' driver.


16363 14-Jun-1996 asami

The Great PC98 Merge.

All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.

Ok'd by: core
Submitted by: FreeBSD(98) development team


16354 13-Jun-1996 se

Change CONF1_ENABLE_MSK to 0x7ff00000 in another attempt to decide
whether a system could possibly support PCI configuration mechanism 1
(or whether it rather is an EISA only system ...).


16344 13-Jun-1996 asami

A fast memory copy for Pentiums using floating point registers.
It is called from copyin and copyout.

The new routine is conditioned on I586_CPU and I586_FAST_BCOPY, so you
need

options "I586_FAST_BCOPY"

(quotes essential) in your kernel config file.

Also, if you have other kernel types configured in your kernel, an
additional check to make sure it is running on a Pentium is inserted.
(It is not clear why it doesn't help on P6s, it may be just that the
Orion chipset doesn't prefetch as efficiently as Tritons and friends.)

Bruce can now hack this away. :)


16324 12-Jun-1996 dyson

Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c. Bruce Evans made me aware of this problem.


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


16300 11-Jun-1996 pst

Move warning messages under bootverbose


16299 11-Jun-1996 pst

Put clock calibration #defines in opt_clock.h to ease reconfiguration


16216 08-Jun-1996 bde

Removed unnecessary forward declarations of incomplete structs.


16215 08-Jun-1996 bde

Stop using the alias `pcb_ptd' for `pcb_tcc.tss_cr3'. Use the (existing)
alias `pcb_cr3' instead. That is still one alias too many, but is convenient
for me since I've replaced the tss in the pcb by a few scalar variables in
the pcb.


16213 08-Jun-1996 bde

Removed bogus `altfmt' code. No alternative formats are supported, but
altfmt was abused to sometimes screw up the disassembly of the bytes
following unconditional jump instructions. Gas doesn't pad to a longword
boundary like the comment said - that is the programmer's responsibility.


16211 08-Jun-1996 bde

Removed recently introduced unnecessary #includes of <machine/cpu.h>
(bootverbose isn't there in -current) and nearby unnecessary #includes.


16197 08-Jun-1996 dyson

Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks! If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.


16166 07-Jun-1996 dyson

Fix a bug in the pmap_object_init_pt routine that pages aren't taken
from the cache queue before being mapped into the process.


16136 05-Jun-1996 dyson

I missed a case of the page table page dirty-bit fix.


16122 05-Jun-1996 dyson

Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.


16100 03-Jun-1996 sos

Added missing CR0_NW define for Cyrix 486DLC support. It's still not
stable on my hardware, but it's better... *sigh*

Obtained from: NetBSD


16079 02-Jun-1996 dyson

Don't carry the modified or referenced bits through to the child
process during pmap_copy. This minimizes unnecessary swapping or creation of
swap space. If there is a hold_count flaw for page-table
pages, clear the page before freeing it to lessen the chance of a system
crash -- this is a robustness thing only, NOT a fix.


16075 02-Jun-1996 joerg

Be slightly more verbose during configure() in the bootverbose case.
This breaks the long silence after the ``npx0'' message and allows to
track some of the problems regarding the root f/s decisions.


16057 01-Jun-1996 dyson

Fix the problem with pmap_copy that breaks X in small memory machines. Also
close some windows that are opened up by page table allocations. The
prefaulting code no longer uses hold counts, but now uses the busy
flag for synchronization.


16029 31-May-1996 peter

Jump through some hoops so that the *.s code can be run through both an
ansi and a traditional cpp.

The nesting rules of macros are different, which required some changes.
Use __CONCAT(x,y) instead of /**/.
Redo some comments to use /* */ rather than "# comment" because the ansi
cpp cares about those, and also cares about quote matching.


16026 31-May-1996 dyson

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


15977 29-May-1996 dyson

The wrong address (pindex) was being used for the page table directory. No
negative side effects right now, but just a clean-up.


15926 27-May-1996 phk

Cleanup the last of the assembly time "-KERNBASE" relocations.


15868 22-May-1996 peter

Fix harmless warning.. pmap_nw_modified was not having its arg
cast to pt_entry_t like the others inside the DIAGNOSTIC code.


15858 22-May-1996 dyson

A serious error in pmap.c(pmap_remove) is corrected by this. When
comparing the PTD pointers, they needed to be masked by PG_FRAME, and
they weren't. Also, the "improved" non-386 code wasn't really an
improvement, so I simplified and fixed the code. This might have
caused some of the panics caused by the VM megacommit.
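
The kind of comparison bug described can be sketched as follows. The constants assume i386 4K pages with flag bits in the low 12 bits of an entry; the names are prefixed SKETCH_ because this is an illustration, not the pmap_remove code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed i386 layout: the low 12 bits of an entry hold flags. */
#define SKETCH_PG_FRAME	0xfffff000u
#define SKETCH_PG_V	0x001u		/* valid */
#define SKETCH_PG_A	0x020u		/* accessed */

/*
 * Two entries refer to the same page if their frame addresses match,
 * regardless of which flag bits happen to be set in each.
 */
bool
sketch_same_frame(uint32_t a, uint32_t b)
{
	return ((a & SKETCH_PG_FRAME) == (b & SKETCH_PG_FRAME));
}
```

Comparing the raw values would report a mismatch as soon as one copy has, say, the accessed bit set and the other does not.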


15832 21-May-1996 dyson

To quote Stephen McKay: pmap_copy is a complex NOP at this moment :-).

With this fix from Stephen, we are getting the target fork performance
that I have been trying to attain: P5-166, before the mega-commit: 700-800usecs,
after: 600usecs, with Stephen's fix: 500usecs!!! Also, this could be the
solution of some strange panic problems...
Reviewed by: dyson@freebsd.org
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existent condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


15759 13-May-1996 nate

Added commented out PCCARD entries to GENERIC, also document and add
entries in LINT.


15722 10-May-1996 wollman

Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.


15694 09-May-1996 phk

Fix brino on my part. _etext doesn't include the padding to a page
boundary, which means that it doesn't mark the start of the data
section (which is then inaccessible to the programmer ??).
Hopefully fixes recent locore reboot problems.


15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


15565 02-May-1996 phk

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulo any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


15534 02-May-1996 phk

KGDB is dead. It may come back one day if somebody does it.


15508 01-May-1996 bde

Added calibration of the i8254 and the i586 clocks against the RTC at boot
time. The results are currently ignored unless certain temporary options
are used.

Added sysctls to support reading and writing the clock frequency variables
(not the frequencies themselves). Writing is supposed to atomically
adjust all related variables.

machdep.c:
Fixed spelling of a function name in a comment so that I can log this
message which should have been with the previous commit.

Initialize `cpu_class' earlier so that it can be used in startrtclock()
instead of in calibrate_cyclecounter() (which no longer exists).

Removed range checking of `cpu'. It is always initialized to CPU_XXX
so it is less likely to be out of bounds than most variables.

clock.h:
Removed I586_CYCLECTR(). Use rdtsc() instead.

clock.c:
TIMER_FREQ is now a variable timer_freq that defaults to the old value of
TIMER_FREQ. #define'ing TIMER_FREQ should still work and may be the best
way of setting the frequency.

Calibration involves counting cycles while watching the RTC for one second.
This gives values correct to within (a few ppm) + (the inaccuracy of the
RTC) on my systems.


15507 01-May-1996 bde

i386/machdep.c
include/clock.h
isa/clock.c


15501 01-May-1996 bde

Don't return unused values in cpu_switch() or savectx().

Don't preserve unused registers in the NPX case in savectx().


15497 01-May-1996 bde

Only disable sio3 by default.


15478 30-Apr-1996 se

Make pcibus_check() ignore Device/Vendor IDs of all 0.


15472 30-Apr-1996 phk

pte.h: Add the VADDR(pdi,pti) macro to construct virtual address from
page dir+table index.
pmap.h: remove NUPDE, it was wrong and not used. Sanitize KSTKPTEOFF.
vmparam.h: Calculate virtual addr from PDI+PTI from pmap.h rather than
using magic math. Remove UPDT, not used.
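
A macro of this kind might look as follows on i386. This is a sketch assuming 4K pages and 10-bit directory and table indices; the SKETCH_ names are illustrative and the actual definition in pte.h is not reproduced here.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* 4K pages */
#define SKETCH_PDR_SHIFT	22	/* each directory entry maps 4M */

/* Build a virtual address from page directory and page table indices. */
#define SKETCH_VADDR(pdi, pti) \
	(((uint32_t)(pdi) << SKETCH_PDR_SHIFT) | \
	 ((uint32_t)(pti) << SKETCH_PAGE_SHIFT))

/* Inverse helpers: recover the two indices from a virtual address. */
#define SKETCH_PDI(va)	((uint32_t)(va) >> SKETCH_PDR_SHIFT)
#define SKETCH_PTI(va)	(((uint32_t)(va) >> SKETCH_PAGE_SHIFT) & 0x3ffu)
```

Deriving addresses from the indices this way keeps constants in vmparam.h in step with the page table layout instead of relying on hand-computed numbers.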


15471 30-Apr-1996 phk

Remove a spurious mapping that was introduced earlier.


15453 29-Apr-1996 jkh

Add ATAPI_STATIC so that the ATAPI cdroms work correctly again
under -current.
Submitted-By: Serge Vakulenko <vak@cronyx.ru>


15428 28-Apr-1996 phk

Fix some bugs I introduced and some old ones as well.
Add BDE_DEBUGGER back.
Improve quality of comments.
Thanks Bruce!

Reviewed by: phk
Submitted by: bde


15403 26-Apr-1996 bde

Fixed a bug introduced in the previous change. ISA device memory was
mapped to semi-random place(s) depending on the content(s) of physical
address 0xA0000. This was fatal at least on my system with some
memory-mapped devices. Console syscons somehow wasn't affected. It
bogusly hardcodes the address. Sigh.


15392 26-Apr-1996 phk

A significant debogofication of locore.s. I haven't found any actual
bugs, but it is a lot easier to navigate this twisted code now.


15379 25-Apr-1996 phk

Fix cpu_fork for real.

Suggested by: bde


15345 22-Apr-1996 nate

- add apm to the GENERIC kernel (disabled by default), and add some comments
regarding apm to LINT
- Disabled the statistics clock on machines which have an APM BIOS and
have the options "APM_BROKEN_STATCLOCK" enabled (which is default
in GENERIC now)
- move around some of the code in clock.c dealing with the rtc to make
it more obvious the effects of disabling the statistics clock

Reviewed by: bde


15340 22-Apr-1996 dyson

This fixes a troubling oversight in some of the pmap code enhancements.
One of the manifestations of the problem includes the -4 RSS problem
in ps.

Reviewed by: dyson
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15330 20-Apr-1996 gibbs

Change the devconf description from "ISA or EISA bus" to "ISA bus" now
that we have eisaconf.


15304 19-Apr-1996 phk

savectx returns through cpu_switch in case of the child, so it must
return void just like cpu_switch. Fix prototype and usage from machdep.c


15301 18-Apr-1996 phk

Fix a bogon. cpu_fork & savectx expected cpu_switch to restore %eax,
they shouldn't.


15282 18-Apr-1996 nate

Added a disabled psm0 (PS/2) mouse device, using the new 'disable'
keyword.


15232 13-Apr-1996 bde

Use PCB_SAVEFPU_SIZE instead of a too-small size in savectx(). This
bug only affected FPU emulators. It might have caused bogus FPU states
in core dumps and in the child pcb after a fork. Emulated FPU states
in core dumps don't work for other reasons, and the child FPU state
is reinitialized by exec, so the problem might not have caused any
noticeable effects.

Cleaned up #includes.


15231 13-Apr-1996 bde

Generate #define of PCB_SAVEFPU_SIZE for use in savectx().


15215 12-Apr-1996 phk

Make alltraps a .globl so that DDB doesn't make people believe they have
an ALIGNFLT on their hands all the time.


15204 11-Apr-1996 bde

Moved AUTO_EOI_[12] and most sio and pcvt options out of the makefile.


15180 10-Apr-1996 jkh

Disable sio3 in GENERIC - it messes with ATI cards.


15174 10-Apr-1996 nate

hp300 -> i386


15155 09-Apr-1996 jkh

Gag! Somebody removed the bus mouse from GENERIC for reasons unknown.
That certainly explains why I noticed it suddenly missing from the
2.2 SNAPSHOT! :-)


15147 08-Apr-1996 smpatel

Add a lock for DMA Channels to prevent two devices from using the same DMA
channel at the same time. The functions isa_dma_acquire() and
isa_dma_release() should be used in all ISA drivers which call isa_dmastart().
This can be used more generally to register the usage of DMA channels in
any driver, but it is required for drivers using isa_dmastart() and friends.
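
The locking idea can be sketched as a bitmask of busy channels. The functions below are hypothetical models with invented names and return types; they are not the isa_dma_acquire() and isa_dma_release() implementations, whose signatures are not shown in this log.

```c
#include <stdbool.h>

#define SKETCH_NDMA	8	/* ISA provides DMA channels 0-7 */

static unsigned int sketch_dma_inuse;	/* one bit per busy channel */

/* Claim a channel; fails if it is out of range or already held. */
bool
sketch_dma_acquire(int chan)
{
	if (chan < 0 || chan >= SKETCH_NDMA)
		return (false);
	if (sketch_dma_inuse & (1u << chan))
		return (false);
	sketch_dma_inuse |= 1u << chan;
	return (true);
}

/* Give a channel back; fails if it was not held. */
bool
sketch_dma_release(int chan)
{
	if (chan < 0 || chan >= SKETCH_NDMA)
		return (false);
	if ((sketch_dma_inuse & (1u << chan)) == 0)
		return (false);
	sketch_dma_inuse &= ~(1u << chan);
	return (true);
}

/* A transfer may only be started on a channel that has been acquired. */
bool
sketch_dma_can_start(int chan)
{
	return (chan >= 0 && chan < SKETCH_NDMA &&
	    (sketch_dma_inuse & (1u << chan)) != 0);
}
```

A second driver asking for a channel that is already held gets a failure instead of silently sharing it.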

Clean up sanity checks, error messages, etc.
Remove isa_dmadone_nobounce(), it is no longer needed

Reviewed by: bde


15146 08-Apr-1996 wollman

Added a $Id$ keyword. Bruce still needs to put a copyright notice
on this file.


15123 07-Apr-1996 bde

Use breakpoint() function instead of inline assembler.


15122 07-Apr-1996 bde

Changed bdb() to breakpoint() and always enable it.

Made the style more consistent, especially for the new Pentium functions.


15117 07-Apr-1996 bde

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


15116 07-Apr-1996 bde

Removed now-unused #includes of <machine/cpu.h>. They were for bootverbose
being declared in the wrong place.


15111 07-Apr-1996 bde

Moved declaration of bootverbose to a better place. It isn't
machine-dependent.

Moved declaration of cpu_fork() to a better place. Only its
implementation is machine-dependent.


15109 07-Apr-1996 bde

Fixed the ownership and permissions of /dev/io. Rev.1.32 broke rev.1.29.


15088 07-Apr-1996 dyson

Major cleanups for the pmap code.


15065 06-Apr-1996 dg

Switch 586/686 back to generic_bzero and #if 0'd the "optimized" code. It
turns out that it actually reduces performance in real-world cases.

Noticed by: bde


15054 05-Apr-1996 ache

Fix adjkerntz expression priority


15045 05-Apr-1996 ache

Add wall_cmos_clock sysctl variable, needed to manage adjkerntz even for
UTC cmos clocks (needed for Local Timezone FSes)


15018 03-Apr-1996 dyson

Fixed a problem that the UPAGES of a process were being run down
in a suboptimal manner. I had also noticed some panics that appeared
to be at least superficially caused by this problem. Also, included
are some minor mods to support more general handling of page table page
faulting. More details in a future commit.


14988 01-Apr-1996 scrappy

Convert from using devfs_add_devsw() to devfs_add_devswf()

Fixed Permissions/Ownership in DEVFS to reflect /dev


14967 31-Mar-1996 dg

Change if/goto into a while loop.


14944 31-Mar-1996 bde

Finished removing NOP macros.


14943 31-Mar-1996 bde

Moved rtcin() to clock.c.

Always delay using one inb(0x84) after each i/o in rtcin() - don't
do this conditional on the bogus option DUMMY_NOPS not being defined.
If you want an optionally slightly faster rtcin() again, then inline
it and use a better named option or sysctl variable. It only needs
to be fast in rtcintr().


14942 31-Mar-1996 bde

Killed religious FASTER_NOP again.


14919 29-Mar-1996 bde

Count PCI irqs in up to 4 ISAish counters named `pci irqnn' instead of
in the clk0 counter.

Reviewed by: s


14916 29-Mar-1996 bde

Parenthesized macros.

Fixed munged tabs.


14889 28-Mar-1996 wollman

>Blush<. Use the correct opcode for the WRMSR instruction.


14887 28-Mar-1996 wollman

Teach the disassembler about the 0f,3x family of instructions
(RDMSR, RDTSC, WRMSR, and RDPMC).


14868 28-Mar-1996 dyson

Remove a now unnecessary prototype from pmap.c. Also remove now
unnecessary vm_fault's of page table pages in trap.c.


14867 28-Mar-1996 dyson

Significant code cleanup, and some performance improvement. Also,
mlock will now work properly without killing the system.


14846 27-Mar-1996 bde

Fixed permissions of /devfs/*random.

Fixed group and permissions of /devfs/perfmon.


14837 27-Mar-1996 bde

Print stack pointer and frame pointer in trap messages.

Fixed "trace/trap" message.

Reviewed by: davidg


14836 27-Mar-1996 bde

Eliminated dependency on opt_sysvipc.h.


14835 27-Mar-1996 bde

Removed vestiges of dummy frame at top of tmpstk.

Use alignment macros where appropriate.

Cleaned up #includes.


14834 27-Mar-1996 bde

Fixed traceback for the following cases:
- legitimate null frames from idle() (traceback was aborted after a null
pointer trap)
- second instruction of normal function prologue, and last instruction of
a function (caller wasn't reported).

Reviewed by: davidg


14825 26-Mar-1996 wollman

Add support for Pentium and Pentium Pro performance counters.
(This code is as yet untested; to come after man page is written.)
This also adds inlines to cpufunc.h for the RDTSC, RDMSR, WRMSR, and RDPMC
instructions. The user-mode interface is via a subdevice of mem.c;
there is also a kernel-side interface which might be used to aid
profiling.


14773 23-Mar-1996 nate

Whoops, back out the last commit, which was accidentally committed at
the same time as the if_zp cleanup patch.

The commit that occurred was an incomplete patch for APM on my laptop
and needs more work.


14772 23-Mar-1996 nate

Now that ac->ac_ipaddr and arpwhohas() no longer exist, remove the
ifdef'd out code that used it.


14724 20-Mar-1996 jkh

Add vx0 device to GENERIC. Yes, I know that this bloats GENERIC, but
what can we do?


14691 19-Mar-1996 nate

Always enable interrupts before calling the APM idle/busy routines.

Suggested by: phk@FreeBSD.org


14646 17-Mar-1996 jkh

Add fe0 to the LINT and GENERIC files (hmmm - looks like my rcvs setup
isn't supplying all the proper header info here! Last commit of fe0
entry should have had the following Submitted by line also).
Submitted-by: Masahiro SEKIGUCHI <seki@sysrap.cs.fujitsu.co.jp>


14607 13-Mar-1996 dyson

Make sure that we pmap_update AFTER modifying the page table entries.
The P6 can do a serious job of reordering code, and our stuff could
execute incorrectly.


14595 12-Mar-1996 dg

Killed some historical #define cruft that we've never used in FreeBSD:

UDOT_SZ
SYSPTSIZE
USRPTSIZE
MSGBUFPTECNT
DMMIN
DMMAX
DMTEXT
USRIOSIZE
VM_PHYS_SIZE


14577 12-Mar-1996 nate

Removed undocumented and unused APM_SLOWSTART code.


14550 11-Mar-1996 jkh

Add FAILSAFE option for selecting extra conservativeness when such
is more practical (like during installation). Correspondingly, set the
option by default in GENERIC now.


14527 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: david & bde


14503 11-Mar-1996 hsu

Change type of code argument to sendsig from unsigned to u_long to make it
consistent w/ signalvar.h and kern_sig.c.
Reviewed by: davidg & bde


14470 10-Mar-1996 dyson

Improved efficiency in pmap_remove, and also remove some of the pmap_update
optimizations that were probably incorrect.


14451 10-Mar-1996 gibbs

Cleanse the SCSI subsystem of its internally defined types
u_int32, u_int16, u_int8, int32, int16, int8.
Use the system defined *_t types instead.


14447 10-Mar-1996 jkh

Don't print DMA busy messages - the sound code apparently runs
afoul of this without actually providing useful information and
works nonetheless.
Submitted by: Jim Lowe <james@miller.cs.uwm.edu>


14433 09-Mar-1996 dyson

Correct some new and older lurking bugs. Hold count wasn't being
handled correctly. Fix some incorrect code that was included
to improve performance. Significantly simplify the pmap_use_pt and
pmap_unuse_pt subroutines. Add some more diagnostic code.


14348 03-Mar-1996 jkh

USER_LDT changes for the Willows TwinXPDK toolkit. Only tested with WINE
since that's the only other USER_LDT using code that I know of.
Submitted by: Gary Jennejohn <Gary.Jennejohn@munich.netsurf.de>
Obtained from: {Origin of diffs may be someone else - I only rec'd them from
Gary}


14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
interdependent to easily separate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.
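The "time remaining" semantics can be sketched as a subtraction clamped
at zero. This is an illustration only; remaining_timeout() is a
hypothetical helper and not the emulator's code:

```c
#include <sys/time.h>

/*
 * Hypothetical helper: given the caller's original timeout and the time
 * that elapsed while waiting, compute the value to write back.
 */
static struct timeval
remaining_timeout(struct timeval timeout, struct timeval elapsed)
{
    struct timeval left;

    left.tv_sec = timeout.tv_sec - elapsed.tv_sec;
    left.tv_usec = timeout.tv_usec - elapsed.tv_usec;
    if (left.tv_usec < 0) {        /* borrow one second */
        left.tv_sec--;
        left.tv_usec += 1000000;
    }
    if (left.tv_sec < 0) {         /* waited longer than requested */
        left.tv_sec = 0;
        left.tv_usec = 0;
    }
    return left;
}
```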

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a separate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


14328 02-Mar-1996 peter

Add more options into the conf/options and i386/conf/options.i386 files
and the #include hooks so that 'make depend' is more useful. This
covers most of the options I regularly use (but not all) and some other
easy ones.


14245 25-Feb-1996 dyson

Re-insert a missing pmap_remove operation.


14243 25-Feb-1996 dyson

Fix a problem with tracking the modified bit. Eliminate the
ugly inline-asm code, and speed up the page-table-page tracking.


14081 13-Feb-1996 phk

Correct & Update the printing of CPU features. We have printed rubbish
since version 1.117 when Garrett made the switch to %b. Updated to
reflect Intel AP-485 (241618-004).


13915 05-Feb-1996 dg

Unspam my changes in rev 1.54 that John spammed in rev 1.55.


13909 04-Feb-1996 dyson

Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.


13908 04-Feb-1996 dg

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.


13852 02-Feb-1996 dg

Killed last change - it was bogus. cpu_switch() already assumes that
return address is on the stack.


13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


13758 30-Jan-1996 wollman

No longer use the cyclecounter to attempt to correct for late or missed
clock interrupts.

Keep a 1-in-16 smoothed average of the length of each tick. If the
CPU speed is correctly diagnosed, this should give experienced users
enough information to figure out a more suitable value for `tick'.
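One common way to keep such a smoothed average, assuming "1-in-16" means
that each new sample contributes 1/16 of the result, is shown below. The
function name and the microsecond units are assumptions, not taken from
the committed code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold one measured tick length into a running
 * average, giving the new sample a weight of 1/16.
 */
static int32_t
smooth_tick(int32_t avg_usec, int32_t sample_usec)
{
    /* avg += (sample - avg) / 16 */
    return avg_usec + (sample_usec - avg_usec) / 16;
}
```

With this weighting a single late interrupt moves the average only
slightly, while a persistent drift shows up after a few dozen ticks.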


13740 30-Jan-1996 dg

savectx() strikes again: the saved stack pointer wasn't properly adjusted
to remove the return address. It's only the frame pointer and luck that
allowed the code to work at all.


13729 30-Jan-1996 dg

Increase tmpstk size to 8K and make certain it is longword aligned.


13646 27-Jan-1996 bde

Allocate DMA bounce buffers only when requested by drivers. Only the
fd and wt drivers need bounce buffers, so this normally saves 32K-1K
of kernel memory.

Keep track of which DMA channels are busy. isa_dmadone() must now be
called when DMA has finished or been aborted.

Panic for unallocated and too-small (required) bounce buffers.

fd.c:
There will be new warnings about isa_dmadone() not being called after
DMA has been aborted.

sound/dmabuf.c:
isa_dmadone() needs more parameters than are available, so temporarily
use a new interface isa_dmadone_nobounce() to avoid having to worry
about panics for fake parameters. Untested.


13644 27-Jan-1996 bde

Cleaned up unused #includes and some other historical cruft.
Sorted and KNFised declarations.


13611 24-Jan-1996 peter

Add commands for ptrace get/set registers.. (Same numbers as NetBSD)


13580 23-Jan-1996 dg

Simplified savectx() a little and fixed a bug that caused it to return
garbage in the child process rather than "1" like it is supposed to.

Reviewed by: bde


13543 21-Jan-1996 joerg

Initialize the cpu_class variable. This prevents i386 machines from
panicking with a privileged instruction fault early at boot time.
Submitted by: rock@wurzelausix.CS.Uni-SB.DE (D. Rock)


13509 20-Jan-1996 nate

Added a comment above the npx0 device line
# Mandatory, don't remove


13505 19-Jan-1996 phk

Reinstate AUTO_EOI_1. This did break suspend/resume on some portables.
In particular mine. We may want to make it a negative option to
keep GENERIC sane, ie NO_AUTO_EOI_1.


13495 19-Jan-1996 peter

Some trivial fixes to get it to compile again, plus some new lint:
- cpuclass should be cpu_class
- CPUCLASS_I386 should be CPUCLASS_386
(^^ those only show up if you compile for i386)
- two missing prototypes on new functions
- one missing static


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13454 16-Jan-1996 bde

Removed declarations of nonexistent functions.


13453 16-Jan-1996 ache

Since the new bcd* macros are not resistant to argument range overflow,
fix argument overflow for years >= 2000


13446 15-Jan-1996 phk

Get rid of two and a half printf in the kernel.
Add more features to the one remaining to handle the job:
+ signed quantity.
# alternate format
- left padding
* read width as next arg.
n numeric in (argument specified) default radix.

Fix the DDB debugger to use these.
Use vprintf in debug routine in pcvt.

The warnings from gcc may become more wrong and intolerable because
of this.

Warning: I have not checked the entire source for unsupported or
changed constructs, but generally believe that there are only a few.

Suggested by: bde


13445 15-Jan-1996 phk

My wife is busy making me a new conical hat, so you don't need to
send any to me this time. Committed an old copy of this file where
the tables were swapped. Duh!.


13444 15-Jan-1996 phk

Soren called and said that I screwed up badly, so I backup until
I find out how... Sorry.


13438 15-Jan-1996 phk

Make bin2bcd and bcd2bin global macros instead of having local
implementations all over the place.
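For illustration, arithmetic definitions of the two conversions might
look like the following. These are assumed forms, valid only for values
0..99, and the kernel's actual macros may be implemented differently
(for example with lookup tables). year_to_bcd() is a hypothetical helper
showing how a caller keeps the argument in range:

```c
/* Illustrative definitions only; arguments must be in the range 0..99. */
#define bin2bcd(b)  ((((b) / 10) << 4) | ((b) % 10))
#define bcd2bin(c)  ((((c) >> 4) * 10) + ((c) & 0x0f))

/* Hypothetical helper: reduce a full year to two digits before converting. */
static int
year_to_bcd(int year)
{
    return bin2bcd(year % 100);
}
```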


13402 12-Jan-1996 bde

Fixed handling of Feb 29 in resettodr().


13350 08-Jan-1996 ache

Replace ugly year/month calculations in resettodr to more clean
variants, idea taken from NetBSD clock.c.
At least year calculation was wrong, pointed by Bruce.
Use different strategy to store year for BIOS without RTC_CENTURY


13314 07-Jan-1996 gibbs

Add comment about only needing one of either ahc, ncr, or ahb type
controllers to handle any number of devices.
Remove unnecessary extra units for these controllers.


13290 06-Jan-1996 peter

Choose a different name to hold the option definition.. The original one
was overlapping with another file, and making some undesirable behavior a
little worse - it's triggering a bug in config that appears to have been
there for some time (before the options files, anyway.)


13265 05-Jan-1996 wollman

Convert BOUNCE_BUFFERS and BOUNCEPAGES to new option scheme.


13228 04-Jan-1996 wollman

Convert DDB to new-style option.


13226 04-Jan-1996 wollman

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


13225 04-Jan-1996 wollman

convert the math emulation to use the new-style options.


13203 03-Jan-1996 wollman

Converted two options over to the new scheme: USER_LDT and KTRACE.


13157 01-Jan-1996 bde

Fixed user-mode mcount which I broke in the previous revision.
Do it the old way for now.

Moved recent additions around a lot to minimise ifdefs.

Added prototypes.


13130 31-Dec-1995 joerg

Restrict /devfs/io perms to 0600.

Nobody in our regular source tree, or in the non-distfile part of the
ports tree does use /dev/io anyway, so this might be replaced by
another scenario some day.


13125 30-Dec-1995 dg

In memory test, cast pointer as "volatile int *", not "int *" to make sure
that gcc doesn't cache the value used in the test. Pointed out by Erich
Boleyn <erich@uruk.org>.
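A sketch of why the qualifier matters, assuming a simple write-then-read
probe. word_is_ram() and its test patterns are hypothetical and are not
the kernel's memory test:

```c
/*
 * Hypothetical probe: write two patterns and read each back. Because
 * addr points to a volatile int, the compiler must perform every store
 * and load instead of reusing the value it just wrote.
 */
static int
word_is_ram(volatile int *addr)
{
    const int saved = *addr;
    int ok;

    *addr = 0x55aa55aa;
    ok = (*addr == 0x55aa55aa);
    *addr = 0x0f0f0f0f;
    ok = ok && (*addr == 0x0f0f0f0f);
    *addr = saved;                 /* restore the original contents */
    return ok;
}
```

With a plain "int *", an optimizer may assume the comparison is always
true and drop the read, so the probe would report success even where no
memory is present.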


13107 29-Dec-1995 bde

Implemented non-statistical kernel profiling. This is based on
looking at a high resolution clock for each of the following events:
function call, function return, interrupt entry, interrupt exit,
and interesting branches. The differences between the times of
these events are added at appropriate places in an ordinary histogram
(as if very fast statistical profiling sampled the pc at those
places) so that ordinary gprof can be used to analyze the times.

gmon.h:
Histogram counters need to be 4 bytes for microsecond resolutions.
They will need to be larger for the 586 clock.
The comments were vax-centric and wrong even on vaxes. Does anyone
disagree?

gprof4.c:
The standard gprof should support counters of all integral sizes
and the size of the counter should be in the gmon header. This
hack will do until then. (Use gprof4 -u to examine the results
of non-statistical profiling.)

config/*:
Non-statistical profiling is configured with `config -pp'.
`config -p' still gives ordinary profiling.

kgmon/*:
Non-statistical profiling is enabled with `kgmon -B'. `kgmon -b'
still enables ordinary profiling (and disables non-statistical
profiling) if non-statistical profiling is configured.


13095 29-Dec-1995 jkh

Make a couple of options that hurt when they're removed more
carefully noted.


13086 28-Dec-1995 dg

Made bzero a function vector and added a 586/686 optimized version of
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.

The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.


13085 28-Dec-1995 dg

Made bzero a function vector and added a 586/686 optimized version of
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.

The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.


13081 28-Dec-1995 dg

Fix one more label that I overlooked with the P6 support. Sigh.


13065 27-Dec-1995 dg

Update bcopyb & bcopy to reflect changes I made in the libc version of
bcopy:

Be smarter about handling overlapped copies and only go backwards if it
is really necessary. Going backwards on a P6 is much slower than forwards
and it's a little slower on a P5. Also moved the count mask and 'std'
down a few lines - it's a couple percent faster this way on a P5.
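The overlap test can be sketched in C as follows. This is a byte-wise
illustration with a hypothetical name, not the assembler routine the
commit changed; it keeps bcopy's (src, dst, len) argument order:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration: copy forwards unless dst starts inside
 * [src, src + len), the only case where a forward copy would overwrite
 * source bytes before they are read.
 */
static void
copy_overlap_safe(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    /*
     * The unsigned difference wraps to a large value when dst < src,
     * so one comparison selects the forward path for both the
     * non-overlapping and the "dst below src" cases.
     */
    if ((uintptr_t)d - (uintptr_t)s >= len) {
        while (len-- > 0)
            *d++ = *s++;
    } else {
        s += len;
        d += len;
        while (len-- > 0)
            *--d = *--s;
    }
}
```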


13058 27-Dec-1995 markm

random_machdep.c: New version, also includes revectored interrupts, rather
than hooking permanently.
vector.s: Remove the interrupt hook. This is done dynamically, now.


13056 27-Dec-1995 markm

Modify the ioctl to handle revectored interrupts for the entropy gatherers.


13031 26-Dec-1995 bde

Removed almost all traces of libkern.a. The objects that were in
libkern.a are now specified by listing their source files in
files.${MACHINE}. The list is machine-dependent to save space.
All the necessary objects for each machine must be linked into the
kernel in case an lkm wants one.


13014 25-Dec-1995 dg

Fix a label goofup I made in the previous P6 support changes.


13004 25-Dec-1995 dg

Fix typo in CPUCLASS.


13002 24-Dec-1995 dg

Added device fxp0 (device driver for Intel EtherExpress Pro/100).


13001 24-Dec-1995 dg

Added I686_CPU.


13000 24-Dec-1995 dg

Add Pentium Pro CPU detection and special handling. For now, all the
optimizations we have for 586s also apply to 686s...this will be fine-
tuned in the future as appropriate.


12991 23-Dec-1995 dg

Made "AUTO_EOI_1" standard. auto-EOI on the master ICU is a documented
feature of the ICU. auto-EOI on the slave is not safe, however, so it
remains an option. Killed religious FASTER_NOP when writing the ICU.

Reviewed by: bde


12990 23-Dec-1995 dg

Use FASTER_NOP rather than NOP in rtcin() - only one inb delay was ever
needed.
Reviewed by: bde


12978 22-Dec-1995 bde

Staticized code that was hidden by `#ifdef DEBUG'.


12977 22-Dec-1995 bde

Increased the double fault stack size from 512 to PAGE_SIZE. This is
wasteful, but better than clobbering the variables below the stack.
About 300 bytes of variables were clobbered when I examined double
faults using ddb. Perhaps a page that is known not to be accessed by
the double fault handler could be used. Such pages are not easy to
find, since the double fault handler calls panic() which calls sync()
and possibly dumpsys().


12958 22-Dec-1995 dg

Fix a small logic bug that caused the arguments of the previous frame to
be used instead of the ones for the current frame if a breakpoint had been
set at the entry to a function.


12953 21-Dec-1995 julian

Reviewed by: peter (verbally :)
Move functions specific to mem.c to mem.c from conf.c


12952 21-Dec-1995 dg

Rewrote most of the ddb stack traceback code. These changes are smarter
about decoding trap/syscall/interrupt frames and generally works better
than the previous stuff.
Removed some special (incorrect) frobbing of the frame pointer that
was messing some things up with the new traceback code.


12941 20-Dec-1995 wollman

Increase Pentium cyclecounter calibration time to 131072 us. This
experimentally seems to give better results on my machine.


12930 19-Dec-1995 dg

Corrected a typo in a comment.


12929 19-Dec-1995 dg

Implemented a (sorely needed for years) double fault handler to catch stack
overflows.
It sure would be nice if there was an unmapped page between the PCB and
the stack (and that the size of the stack was configurable!). With the
way things are now, the PCB will get clobbered before the double fault
handler gets control, making somewhat of a mess of things. Despite this,
it is still fairly easy to poke around in the overflowed stack to figure
out the cause.


12905 17-Dec-1995 bde

Cleaned up prototypes in pmap headers: removed ones for nonexistent
functions; moved misplaced ones; restored most of KNFish formatting
from 4.4lite version; removed bogus __BEGIN/END_DECLS.


12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinguishable from vm_offset_t.


12889 16-Dec-1995 peter

Catch a couple more null devsw dereferences...


12879 16-Dec-1995 bde

Completed function declarations and/or added prototypes and/or added
#includes to get prototypes.

pci now uses a different interrupt handler type for interrupts that it
dispatches and the isa interrupt handler type for the interrupts that
it handles.


12850 14-Dec-1995 bde

Added a prototype. Merged prototype lists.


12849 14-Dec-1995 bde

Added a prototype.


12848 14-Dec-1995 bde

Moved some more prototypes outside of ifdefs and grouped them together.


12844 14-Dec-1995 bde

Fixed staticization of DDB functions.


12827 14-Dec-1995 peter

GENERIC/LINT: Remove redundant quoting on some option lines.
LINT: add a couple of new/missing/undocumented options
files.i386: add linux code so that you can compile a kernel with static
linux emulation ("options LINUX")
i386/*: use #if defined(COMPAT_LINUX) || defined(LINUX) to enable static
support of linux emulation (just like "IBCS2" makes ibcs2 static)

The main thing this is going to make obvious, is that the LINUX code
(when compiled from LINT) has a lot of warnings, some of which don't look
too pleasant..


12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


12817 14-Dec-1995 phk

Make math_emulators LKMable.


12813 13-Dec-1995 julian

devsw tables are now arrays of POINTERS to struct [cb]devsw
seems to work here just fine though I can't check every file
that changed due to limited h/w, however I've checked enough to be pretty
happy with the code..

WARNING... struct lkm[mumble] has changed
so it might be an idea to recompile any lkm related programs


12791 12-Dec-1995 gibbs

Have Eisa and PCI probes occur before ISA probes. Buslogic EISA and PCI cards
can be found in ISA compatibility mode by the ISA driver, but since the
EISA and PCI probes are non-invasive, we prefer them to find the card first.
Since both EISA and PCI probes can rely on interrupts, enable them before
probing of any type is performed. All ISA probes are still "protected" by
splhigh().


12789 12-Dec-1995 gibbs

Have bt0 entry specify "bt_isa_intr" for its vector. This one entry will
allow one EISA/ISA/PCI/VL Buslogic controller to be probed. The driver
is almost fully dynamic. It just needs some kdc work and for the SCSI code
to stop passing unit numbers up in the scsi_xfer struct.


12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
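The relationship between a byte offset and a page index can be sketched
as a shift by the page size. The sketch assumes 4 KB pages (as on i386)
and uses a hypothetical 32-bit pindex_t in place of the kernel's
vm_pindex_t; with those assumptions an index names pages far beyond the
4 GB reach of a 32-bit byte offset:

```c
#include <stdint.h>

#define PAGE_SHIFT 12              /* assumed: 4 KB pages */

typedef uint32_t pindex_t;         /* hypothetical stand-in for vm_pindex_t */

/* Byte offset within an object -> index of the page containing it. */
static pindex_t
offset_to_pindex(uint64_t offset)
{
    return (pindex_t)(offset >> PAGE_SHIFT);
}

/* Page index -> byte offset of the start of that page. */
static uint64_t
pindex_to_offset(pindex_t pindex)
{
    return (uint64_t)pindex << PAGE_SHIFT;
}
```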


12749 10-Dec-1995 bde

Added pcvt option FAT_CURSOR.

Fixed comment about PCVT_VERSION=210.

Fixed tabs and trailing blanks.


12731 10-Dec-1995 bde

Removed new alias d_size_t for d_psize_t.

Removed old aliases d_rdwr_t and d_ttycv_t for d_read_t/d_write_t and
d_devtotty_t.

Sorted declarations of switch functions into switch order.

Removed duplicated comments and declarations of nonexistent switch
functions.


12724 10-Dec-1995 phk

Staticize and cleanup.


12722 10-Dec-1995 phk

Staticize and cleanup.
remove a TON of #includes from machdep.


12713 10-Dec-1995 bde

Restored used function fusword() (used by GPL math emulator).


12702 09-Dec-1995 phk

Remove various unused symbols and procedures.


12701 09-Dec-1995 phk

Move sysctl machdep.consdev to cons.c


12678 08-Dec-1995 phk

Julian forgot to make the *devsw structures static.


12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them if you
can't see what's going on..
for a laugh compare conf.c conf.h before and after... :)
I have not done DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


12673 07-Dec-1995 peter

The static prototype for setroot() was apparently accidentally moved
into a block of code that was #ifdef CD9660, meaning you got a compile
failure if you didn't have the CD9660 filesystem configured.


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12649 06-Dec-1995 peter

Moving the kern.dumpdev sysctl handler from kern_sysctl.c to swapgeneric.c
is not real helpful since swapgeneric.c doesn't seem to be used, except
perhaps on a GENERIC kernel. (Sorry Paul.. :-)

I've moved it from swapgeneric.c to autoconf.c, since autoconf.c also deals
with dumpdev things. There may be a better place.....


12623 04-Dec-1995 phk

A major sweep over the sysctl stuff.

Move a lot of variables home to their own code (In good time before xmas :-)

Introduce the string description of format.

Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.

Next is adding vars on the fly, and sysctl looking at them too.

Removed a tiny bit of defunct and #ifdefed notused code in swapgeneric.


12608 03-Dec-1995 bde

__purified pmap_pte(). This seems to make no difference.


12607 03-Dec-1995 bde

Completed function declarations and/or added prototypes.


12604 03-Dec-1995 bde

Staticized.

Completed function declarations and added prototypes.

Cleaned up prototypes.

Cleaned up #includes.

Removed unused variable `dkn'.


12592 03-Dec-1995 bde

Moved inline functions for insque() and remque() to <sys/queue.h>.
Protected them with `#ifdef KERNEL' so that <sys/queue.h> is valid C++.
Added the necessary #includes of <sys/queue.h>.

These functions are bogus and should be replaced by the queue macros.


12589 03-Dec-1995 bde

Removed unused thread support (partly to get rid of its incomplete
function declarations).

Removed unused #includes (lots of vm ones).


12535 29-Nov-1995 nate

GENERIC - Add a commented out line for adding support for IBM ThinkPad
keyboards

LINT - Add SCANSET=2 support to the LINT kernel and comments reflecting its
purpose.


12533 29-Nov-1995 wollman

Fix Pentium CPU rate diagnosis:
- Don't print out meaningless iCOMP numbers, those are for droids.
- Use a shorter wait to determine clock rate to avoid deficiencies
in DELAY().
- Use a fixed-point representation with 8 bits of fraction to store
the rate and rationalize the variable name. It would be
possible to use even more fraction if it turns out to be
worthwhile (I rather doubt it).
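A fixed-point rate with 8 bits of fraction can be sketched as below. The
names, the cycles-per-microsecond unit and the helper functions are
assumptions made for illustration and do not come from the committed
code:

```c
#include <stdint.h>

#define RATE_FRAC_BITS 8           /* hypothetical name: 8 fraction bits */

/* Cycles per microsecond, stored as an integer scaled by 2^8. */
static uint32_t
rate_fixed(uint32_t cycles, uint32_t usec)
{
    return (uint32_t)(((uint64_t)cycles << RATE_FRAC_BITS) / usec);
}

/* Convert a cycle count back to microseconds using the scaled rate. */
static uint32_t
cycles_to_usec(uint32_t cycles, uint32_t rate)
{
    return (uint32_t)(((uint64_t)cycles << RATE_FRAC_BITS) / rate);
}
```

Shifting the stored rate right by RATE_FRAC_BITS recovers the whole
number of MHz, while the low 8 bits keep a resolution of 1/256 MHz.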

The question of source code arrangement remains unaddressed.


12521 29-Nov-1995 julian

If you're going to mechanically replicate something in 50 files
it's best to not have a (compiles cleanly) typo in it! (sigh)


12517 29-Nov-1995 julian

OK, that's it..
That's EVERY SINGLE driver that has an entry in conf.c..
my next trick will be to define cdevsw[] and bdevsw[]
as empty arrays and remove all those DAMNED defines as well..

Each of these drivers has a SYSINIT linker set entry
that comes in very early.. and asks the driver to add its own
entry to the two devsw[] tables.

some slight reworking of the commits from yesterday (added the SYSINIT
stuff and some usually wrong but token DEVFS entries to all these
devices).

BTW does anyone know where the 'ata' entries in conf.c actually reside?
seems we don't actually have a 'ataopen() etc...

If you want to add a new device in conf.c
please make sure I know
so I can keep it up to date too..

as before, this is all dependent on #if defined(JREMOD)
(and #ifdef DEVFS in parts)


12499 28-Nov-1995 peter

After having put on my Asbestos suit, complete the MFS_ROOT part of Terry's
mountroot changes. This means that the mfs_initminiroot functionality
into the root mfs_mount....


12471 24-Nov-1995 bde

Staticized. Moved some zero-initialized values to the bss.

Added prototypes.


12453 21-Nov-1995 bde

Completed function declarations and/or added prototypes.


12430 20-Nov-1995 bde

Quick fix for stat_imask and intr_mask[8] not having the RTC interrupt
bit set. I broke stat_imask in Dec 1994 and update_intr_masks() has
copied the breakage to intr_mask[8] since Mar 1995. This can cause
the RTC to stop interrupting in rare cases (under loads heavy enough
for a new RTC interrupt to occur at a critical time just before Xintr8
finishes handling the previous one) and may have caused worse problems.


12429 20-Nov-1995 phk

Mega commit for sysctl.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I was there.


12417 20-Nov-1995 phk

Remove unused vars.


12366 18-Nov-1995 bde

Updated comments. The comments about the unused addresses get broken
almost every time someone uses an address. This file is probably not
the right place to keep track of the unused addresses (or used
addresses :->).

Fixed comments on #endif's to match code.

Added defines for ASC and GSC sizes. This file is not the right place
to keep track of scanner addresses, but while they're here and we
pretend to keep track of unused addresses, the sizes need to be here
too.

Sorted IO_*SIZE defines.


12357 18-Nov-1995 bde

Fixed the type of vm_fault_quick() - don't convert types back and forth
through bogus intermediate types.

Added prototypes.


12356 18-Nov-1995 bde

Fixed handling of trace traps when cons_unavail is set. Added comments
about handling of other cases.


12290 14-Nov-1995 phk

Fix a couple of printfs.


12243 12-Nov-1995 phk

The entire sysctl callback to read/write version. I haven't tested this as
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.


12223 12-Nov-1995 bde

Oops, forgot the following log message in the previous commit:

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


12220 12-Nov-1995 bde

Reviewed by:
Submitted by:
Obtained from:


12186 10-Nov-1995 phk

convert more sysctl variables.


12176 09-Nov-1995 gibbs

Change ahb device line to eisaconf syntax.


12104 05-Nov-1995 gibbs

Add eisa0 and remove ISA configuration line for ahc0.


12092 05-Nov-1995 gibbs

Remove old eisaconf cruft from the eisa files. The old eisaconf kludged
in here to do some conflict detection. The new code doesn't do conflict
detection yet, but it will be implemented in another way.

aic7770.c moved to i386/eisa


12091 05-Nov-1995 gibbs

Modifications for the new eisaconf.


12080 04-Nov-1995 bde

Added `#include "ioconf.h"' to <machine/conf.h> and cleaned up the
misplaced extern declarations (mostly prototypes of interrupt handlers)
that this exposed. The prototypes should be moved back to the driver
sources when the functions are staticalized.

Added idempotency guards to <machine/conf.h>. "ioconf.h" can't be
included when building LKMs so define a wart in bsd.kmod.mk to help
guard against including it.


12078 04-Nov-1995 markm

Remove the #ifdef DEVRANDOM's, as promised.

/dev/random is now a part of the kernel! you will need to make
the device in /dev: sh MAKEDEV random
and take a look at some test code in src/tools/test/random.


12072 04-Nov-1995 bde

Finished(?) moving prototypes for devswitch functions to <machine/conf.h>.
One was hidden in an ifdef.

Continued cleaning up not so new init stuff.

Removed some more /*ARGSUSED*/ for devswitch functions.


12008 02-Nov-1995 peter

When the sync-on-shutdown fails to clear all buffers, this bit of code
can print them out.
I have seen that MFS can leave BUSY buffers, preventing a clean reboot...


11978 31-Oct-1995 peter

We no longer need the spltty() == splimp() hack if PPP is configured into
the kernel. ppp_tty.c goes to some lengths to minimise the inter-layer
calling (including a soft ISR). ppp_tty.c takes care of the soft masking
that was needed still.

(I've discovered that bugs in this area show up within an hour if the
masking was not correct.. :-} This combination has proven stable on
specialix serial ports, although there was some concern about the softtty
parts of sio/cy and netisr colliding - but Bruce has fixed that now)


11963 31-Oct-1995 peter

Add a simplistic netisr register routine - I need this now for ppp-2.2.


11955 31-Oct-1995 joerg

Include the "od" driver.


11946 30-Oct-1995 markm

Security fix - do not allow anyone but root to choose the interrupts used
in the randomising process.
(This is a change to the /dev/random ioctl().)


11940 30-Oct-1995 bde

Removed bogus statics in declarations that don't allocate storage.

Added prototypes.


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11919 29-Oct-1995 bde

Fix mmioctl() for !DEVRANDOM case. mmioctl() is a function, not a
pointer to a function.


11875 28-Oct-1995 markm

Theodore Ts'o's random number generator for Linux, ported by me.
This code will only be included in your kernel if you have
'options DEVRANDOM', but that will fall away in a couple of days.
Obtained from: Theodore Ts'o, Linux


11872 28-Oct-1995 phk

Remove unused functions and variables, make things static, and other cleanups.


11865 28-Oct-1995 phk

Sorry, the last commit screwed up for me, this is the right one (I hope!)
Please refer to the previous commit message about sysctl variables.


11783 25-Oct-1995 jkh

Stable matcd port to 0x230, as per request by Bruce and Frank.
Submitted by: Frank Durda IV <uhclem@fw.ast.com>


11703 23-Oct-1995 dg

Remove PG_W bit setting in some cases where it should not be set.

Submitted by: John Dyson <dyson>


11693 23-Oct-1995 dg

More improvements to the logic for modify-bit checking. Removed
pmap_prefault() code as we don't plan to use it at this point in time.

Submitted by: John Dyson <dyson>


11670 22-Oct-1995 bde

Only allow `sensitive' devices for displays in find_display(). This is
a quick fix for syscons deciding not to become the console because it
thinks another tty device has priority.


11642 22-Oct-1995 dg

Simplified some expressions.


11602 21-Oct-1995 phk

A mixed bag of changes, relating to getting the state in "lsdev" right,
and pccard support to work sensibly. Better by far, but still not good.


11552 17-Oct-1995 se

Make CONF1_ENABLE_MSK1 even less restrictive: Ignore slot ID ...


11544 17-Oct-1995 se

At least the ASUS Triton motherboards don't disable the PCI bus configuration
accesses after the BIOS bus scan. The previous revision made the assumption,
that every PCI motherboard did ...

Change the test on the initial value of the CONF1_ADDR_PORT register in a way
that makes the probe succeed on triton based motherboards, without breaking
the EISA motherboard that has some non-PCI register at the same address.


11524 15-Oct-1995 se

Go back to separate tests for configuration mechanism 1 and mechanism 2.
Require the state of the configuration enable bits to be OFF assuming
that the BIOS left them that way, as it should anyway to keep bad things
from happening.

The tests themselves are copied from the previous release, with the
exception of CONF1_ENABLE_MSK1 having the LSB set. This bit should be
read back as '0', since only DWORD addresses are legal.
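
A sketch of why the low bit must read back as 0. Under PCI configuration mechanism 1, the 32-bit address written to the address port combines an enable bit with bus, device, function and a register offset whose low two bits are always zero (DWORD alignment). The constant and function names below are illustrative and are not taken from the driver.

```python
CONF1_ENABLE = 0x80000000  # bit 31 enables configuration cycles


def conf1_address(bus: int, device: int, function: int, register: int) -> int:
    """Build a mechanism 1 configuration address (hypothetical helper)."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("field out of range")
    # Only DWORD addresses are legal, so bits 1..0 of the offset are dropped.
    return (CONF1_ENABLE | (bus << 16) | (device << 11)
            | (function << 8) | (register & 0xFC))


addr = conf1_address(0, 3, 0, 0x11)
print(hex(addr), addr & 1)
```

A probe that writes a value with the LSB set and reads back a value with the LSB clear has therefore found a register that behaves like the mechanism 1 address register.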


11523 15-Oct-1995 phk

Pull all of libkern.a in (though not mcount) so the LKM's don't come
out shorthanded. Makes the idea of libkern pretty void now...


11452 12-Oct-1995 wollman

Reduce jitter of Pentium microtime() implementation by letting the counter
free-run and doing a subtract in microtime() rather than resetting the
counter to zero at every clock tick. In combination with the changes to
kern_clock.c, this should eliminate all the immediately obvious sources
of systematic jitter in timekeeping on Pentium machines.
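
A sketch of the free-running approach, assuming a 64-bit cycle counter: instead of zeroing the counter at each tick, the value seen at the last tick is remembered and subtracted from the current reading. The names are hypothetical; the real code lives in the kernel's microtime() and clock interrupt paths.

```python
COUNTER_BITS = 64                      # assumed width of the cycle counter
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def elapsed_cycles(now: int, at_last_tick: int) -> int:
    """Cycles since the last clock tick, correct across counter wraparound."""
    # Modular subtraction: the counter is never written, only read.
    return (now - at_last_tick) & COUNTER_MASK


print(elapsed_cycles(1000, 400))
print(elapsed_cycles(5, COUNTER_MASK - 4))   # counter wrapped between readings
```

Because the counter is never reset, the variable latency of the reset itself no longer leaks into the timestamps.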


11390 10-Oct-1995 bde

Include <sys/sysproto.h> so that machdep.c compiles cleanly again
(the prototype for sync() moved).

KNFize and otherwise clean up printing of BIOS geometries.

Add prototypes.

Continue cleaning up new init stuff.


11378 09-Oct-1995 se

Fix bad typo: CONF1_ENABLE_RES1 was written CONF1_ENABLE_CHK1 ...


11343 09-Oct-1995 bde

Fix tracing of syscalls. The previous fix required the undocumented
option DDB_NO_LCALLS to stop ddb getting control and broke all ddb
tracing. Now there is no option and no way for ddb to trace at
address _Xsyscall or to _Xsyscall, but tracing everywhere else
works. The previous fix did unnecessary things for Linux syscalls.

Don't bother checking that syscall frames are for user mode.

Make debugger traps inside the kernel (except at addresses _Xsyscall
and _Xsyscall+1) fatal if ddb is not configured. They "can't happen".

Add prototypes.

Remove stupid comments, e.g., /*ARGSUSED*/ for args that are used.


11222 05-Oct-1995 phk

remove GCC divsi3 routines which are never used.


11163 04-Oct-1995 julian

Submitted by: Juergen Lock <nox@jelal.hb.north.de>
Obtained from: other people on the net ?

1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.

2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...


11116 01-Oct-1995 dg

Insert zeroed pages at the head of the zero queue rather than at the tail.
A measurable performance improvement results from the potential for the
page to be partially cached when it is eventually used.


10960 22-Sep-1995 se

New approach to the PCI bus configuration mechanism probe problem:
- try to make sure there is any kind of PCI device
- if there is anything at port 0x0cf8, then check for mech. 1 or 2


10924 20-Sep-1995 dg

Fix rounding bug in last commit that would have caused the problem to not
be completely fixed.


10910 19-Sep-1995 bde

Fix benign type mismatches in isa interrupt handlers. Many returned int
instead of void.


10887 18-Sep-1995 se

Revert most changes of previous commit.
Changes relative to 1.12:
- Put extra instruction between outl()/inl() sequence to prevent the
old value being read back because of the bus capacitance.
- Additional check for existence of register at CONF2_ENABLE_PORT.


10826 16-Sep-1995 pst

Our existing Cyrix cache-disable code was short-cutting the steps for
setting the control register. Make the read and write operations two
completely separate steps.

While we're at it, pull in the whole set of Cyrix cache control options
from NetBSD-current, since a few motherboards do the right thing with
the Cyrix chip.

There is no option to disable the internal cache completely (yet).

Reviewed by: pst
Obtained from: NetBSD


10812 15-Sep-1995 dg

Check for page being resident when doing I/O with /dev/kmem and return
EFAULT if it is not resident. This prevents the system from manufacturing
a zero-fill page for unused but allocated areas of the kernel's VM. This
should fix the "CMAP busy" panic that some people saw during system
startup.


10807 15-Sep-1995 se

Another try to determine the PCI bus configuration mode (and whether
there is a PCI bus at all) ...

- Do not expect the chip sets to follow even very clearly expressed
requirements of the PCI 2.0 spec.
- Do not read back the value just written to an I/O port without making
sure that some other data have crossed the bus in between ...


10782 15-Sep-1995 dg

1) Killed 'BSDVM_COMPAT'.
2) Killed i386pagesperpage as it is not used by anything.
3) Fixed benign miscalculations in pmap_bootstrap().
4) Moved allocation of ISA DMA memory to machdep.c.
5) Removed bogus vm_map_find()'s in pmap_init() - the entire range was
already allocated by kmem_init().
6) Added some comments.

virtual_avail is still miscalculated NKPT*NBPG too large, but fixing
this properly requires moving the variable initialization into locore.s.
Some other day.


10763 15-Sep-1995 dg

Killed isa_allocphysmem() and isa_freephysmem(). They are completely unused
functions. This file is disgusting; the isa DMA stuff is especially bad and
should be rewritten.


10762 15-Sep-1995 dg

1) Don't double map the kernel page tables. The double mapping was never
used and went a long way toward confusing the code.
2) Fix proc0's initial stack to not be 48 bytes smaller than it needs to
be.
3) Correct comment about 'first' arg to init386().


10735 14-Sep-1995 se

Improved verification of configuration space accesses working:
Scan for devices instead of assuming that device 0 is present on bus 0
of every PCI motherboard.


10710 13-Sep-1995 se

Make the PCI host bridge probe code more robust when dealing with chip sets
that use configuration mode 1, but still violate the PCI 2.0 specs ...
(Required for the Compaq Proliant, for example.)


10666 10-Sep-1995 bde

Make pcvt and syscons live in the same kernel. If both are enabled, then
the first one in the config has priority. They can be switched using
userconfig().

i386/i386/conf.c:
Initialize the shared syscons/pcvt cdevsw entry to `nx'.

Add cdevsw registration functions.

Use devsw functions of the correct type if they exist.

i386/i386/cons.c:
Add renamed syscons entry points to constab.

i386/i386/cons.h:
Declare the renamed syscons entry points.

i386/i386/machdep.c:
Repeat console initialization after userconfig() in case the current
console has become wrong. This depends on cn functions not wiring down
anything important.

sys/conf.h:
Declare new functions.

i386/isa/isa.[ch]:
Add a function to decide which display driver has priority. Should be
done better.

i386/isa/syscons.c:
Rename pccn* -> sccn*.

Initialize CRTC start address in case the previous driver has moved it.

i386/isa/syscons.c, i386/isa/pcvt/*
Initialize the bogusly shared variable Crtat dynamically in case the
stored value was changed by the previous driver.

Initialize cdevsw table from a template.

Don't grab the console if another display driver has priority.

i386/isa/syscons.h, i386/isa/pcvt/pcvt_hdr.h:
Don't externally declare now-static cdevsw functions.

i386/isa/pcvt/pcvt_hdr.h:
Set the sensitive hardware flag so that pcvt doesn't always have lower
priority than syscons. This also fixes the "stupid" detection of the
display after filling the display with text.

i386/isa/pcvt/pcvt_out.c:
Don't be confused by the off-screen cursor offset 0xffff set by syscons.

kern/subr_xxx.c:
Add enough nxio/nodev/null devsw functions of the correct type for syscons
and pcvt.


10665 10-Sep-1995 bde

cons.c:
Split off cdevsw initialization in cninit() into a new function
cninit_finish() that isn't called until all hardware device drivers
have been attached. The bdevsw entry of the driver for the physical
console needs to be hooked after the physical driver has been
attached in case the attachment modified the entry.

Rearrange cninit() to avoid changing cn_tab until the driver for the
physical console has been initialized, so that the previous driver
(if any) can be used for debugging.

Start removing half-baked lint support. bdevsw functions usually have
unused args but /*ARGSUSED*/ was used for only about 5% of them.

cons.h:
Declare cninit_finish().

autoconf.c:
Call cninit_finish().

Start adding prototypes. Functions with bogus linkage (extern where
they probably should be static) are explicitly declared as extern
so that they can be found easily (extern in a non-header is usually
wrong).

All:
Continue cleaning up init stuff: init functions shall be static;
INITs should be at the start of files...


10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


10624 08-Sep-1995 bde

Fix benign type mismatches in devsw functions. 82 out of 299 devsw
functions were wrong.


10616 08-Sep-1995 dg

1) Really print 'real' memory - use Maxmem, not physmem.
2) Output K bytes instead of pages as this means something to more people.
3) Moved printf of avail memory to after vm_bounce_init() call so that
bounce buffers are included in the figure.
4) Killed initcpu(); it's an unused vestige from the VAX.


10615 08-Sep-1995 julian

Submitted by: Luigi Rizzo (luigi@iet.unipi.it)
Obtained from: Luigi Rizzo and Gunther Schadow
Kernel support for the asc scanner driver


10609 07-Sep-1995 dg

Minor cleanup and (very) small micro optimization to Xsyscall (and the
linux one)..


10594 06-Sep-1995 wpaul

Put back the "real memory =" printf() that vanished when the code to
handle holes in memory was added.


10546 03-Sep-1995 dyson

Machine dependent routines to support pre-zeroed free pages. This
significantly improves demand zero performance.


10537 03-Sep-1995 julian

devfs changes..
changes to allow devices that don't probe (e.g. /dev/mem)
to create devfs entries
this required giving 'configure' its own SYSINIT entry
so we could duck in just before it with a DEVFS init
and some device inits..
my devfs now looks like:
./misc
./misc/speaker
./misc/mem
./misc/kmem
./misc/null
./misc/zero
./misc/io
./misc/console
./misc/pcaudio
./misc/pcaudioctl
./disks
./disks/rfloppy
./disks/rfloppy/fd0.1440
./disks/rfloppy/fd1.1200
./disks/floppy
./disks/floppy/fd0.1440
./disks/floppy/fd1.1200
also some slight cleanups.. DEVFS needs a lot of work
but I'm getting back to it..


10431 30-Aug-1995 bde

Declare vfs_mountroot() in the right place.


10425 29-Aug-1995 bde

Remove relocation of Crtat. Drivers already relocate it (somewhat
bogusly). We used to undo the driver relocation here before doing
a somewhat less bogus relocation. The result was a null relocation
here.


10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


10342 26-Aug-1995 bde

Remove "memory" clobber statement from enable_intr(). Enabling interrupts
doesn't invalidate loaded variables.

Fix formatting of recent changes.


10268 25-Aug-1995 bde

Remove extra args from the calls to getit(). The bug was benign with the
default function call convention.


10157 21-Aug-1995 dg

A couple of micro optimizations to improve NULL syscall performance by
about 2%.


10126 20-Aug-1995 dg

Fixed a few bugs and annoyances with boot():

1) deal with cold flag better
2) check for key input more often
3) get rid of unused variables
4) minor formatting improvements


10097 18-Aug-1995 jkh

Bring in Serge Vakulenko's IDE CDROM (ATAPI) driver. A number of
people have now indicated to me that it's working more than well
enough to bring into -current.
Submitted by: Serge Vakulenko <vak@cronyx.ru>


10092 17-Aug-1995 dg

Killed some unused stuff inherited from Bill Jolitz. Note that since
this changes the size of the pcb struct, gdb will need to be rebuilt
or debugging won't work correctly.

Reviewed by: Bruce Evans


10063 15-Aug-1995 bde

Fake a call frame for traps so that `gdb -k' can report where fatal
traps occurred. This also helps ddb backtrace through trap frames.
Backtracing through syscall and interrupt frames still doesn't work
but it is relatively unimportant and more expensive to fix.


10004 08-Aug-1995 dyson

Make the spl oriented inline functions less likely to allow
potentially volatile memory to be kept in registers during
the "call" (inline expansion.) Do the same for pmap_update.


9799 30-Jul-1995 dg

Fix a bug in my disabled version of trap_pfault()...curpcb may be NULL even
when curproc isn't. This condition occurs at system startup and perhaps
at other times.


9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuration.


9744 28-Jul-1995 dg

Fixed bug I introduced with the memory-size code rewrite that broke
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].

Submitted by: Bruce Evans, but fixed differently by me.


9714 25-Jul-1995 bde

Fix bogus constraint "i" that only worked with -O. The cases where it
didn't work are somewhat bogusly optimized away before the constraint
is checked. We still expect constants passed to inline functions to
remain constant, but if the compiler ever decides that they aren't
constant then it will just generate slightly slower code instead of
an error.


9578 19-Jul-1995 dg

Rewrote memory sizing code to generally deal with holes in extended memory.
This code change should allow certain Compaq machines with a 128K hole
at 16MB to work.


9550 16-Jul-1995 peter

This fixes a compiler warning, and a cosmetic problem with the linux
emul code when compiling with "options KTRACE".
ktrsyscall() was expecting an array of integers, this was passing the
address of a structure containing an array of integers..
The cosmetic problem was that it was calling the "enter syscall"
trace hook twice - this looks like a cut/paste error/typo.


9547 16-Jul-1995 phk

Reviewed by: phk
Submitted by: Andrew McRae <andrew@mega.com.au>

Some initial commits from the pcmcia stuff, to make life easier for the
testers.

We will use the name "pccard" since that is really the buzzword at present.


9546 16-Jul-1995 phk

Make the bootinfo structure visible from sysctl.
This can be used in libdisk to guess a better bios-geometry.


9545 16-Jul-1995 joerg

Include ``options POWERFAIL_NMI'' for owners of older (non-apm)
notebooks where a powerfail condition (external power drop; battery
state low) is signalled by an NMI. Makes it beep instead of panicking.

Reviewed by: davidg


9533 16-Jul-1995 dg

Truncate the fault address to a page boundary when calling vm_fault(). The
last change to fix the fault-twice bug with page tables wasn't quite
complete.
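
Truncating an address to a page boundary is a single mask operation. The sketch below assumes the 4096-byte page size of the i386; the function name mirrors the conventional trunc_page() macro but is written here only for illustration.

```python
PAGE_SIZE = 4096  # i386 page size in bytes


def trunc_page(addr: int) -> int:
    """Round an address down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


print(hex(trunc_page(0x1234)), hex(trunc_page(0xFFFFFFFF)))
```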


9524 14-Jul-1995 dg

Fixed bug that caused page tables to be faulted twice instead of once.

Submitted by: John Dyson


9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


9379 30-Jun-1995 se

The PCI config mechanism 1 test failed for the Intel Aries.
Make it less strict ...

Submitted by: NIIMI Satoshi <sa2c@and.or.jp>


9360 28-Jun-1995 se

PCI configuration mechanism now determined by a method, that doesn't
fail on new hardware (Compaq Prolinea and Compaq Prosignea), and that
doesn't erroneously identify old mech. 2 chip sets as using mech. 1.
(See section 3.6.4.1.1 of the PCI bus specs rev. 2.0)


9345 28-Jun-1995 dg

Killed redundant vnode_pager_umount() call. This is already done at
FS unmount time.


9344 28-Jun-1995 dg

Make path to kernel absolute if it is passed in relative. This fixes
a related bug in some of the new 'foo'boot bootstrap code that has been
added over the past months. This change makes it no longer necessary
for the bootstrap to fix up the path (i.e. it can be removed).


9343 28-Jun-1995 bde

Fix standards conformance bugs in <signal.h>:

include/signal.h:
There was massive namespace pollution from including <sys/types.h>.
POSIX functions were declared even when _ANSI_SOURCE is defined.

sys/sys/signal.h:
NSIG was declared even if _ANSI_SOURCE or _POSIX_SOURCE is defined.
sig_atomic_t wasn't declared if _POSIX_SOURCE is defined.
Declare a typedef for signal handling functions and use it to
unobfuscate declarations and to avoid half-baked function types
that cause unwanted compiler warnings at certain warning levels.
Fix confusing comment about SA_RESTART.

sys/i386/include/signal.h:
This has to be included to get the declaration of sig_atomic_t even
when _ANSI_SOURCE is defined, so be more careful about polluting
the ANSI namespace.

Uniformize idempotency ifdefs.


9326 26-Jun-1995 bde

Partially fix `sysctl machdep.console_device'. The fix will be complete
when syscons stops mapping the console to minor MAXCONS. There is
usually no corresponding device in /dev, and the correct device has
minor 0.

cons.c:
Initialize cn_tty properly, so that CPU_CONSDEV can work.
Comment about too many variants of the console tty pointer.

machdep.c:
Return device NODEV and not error EFAULT when there is no console device.


9223 14-Jun-1995 bde

Convert to ANSI C: change #endif THING to #endif /* THING */.
Fix one such THING in code to match comment.
Sort IO_GSC* into numeric order and update comments about the gaps.
Sort common SCSI addresses into alphabetical order.
Remove bogus comments about com ports having i/o size 4.
Uniformize whitespace.
Uniformize case in hex digits.

This file is very incomplete. In particular, it doesn't mention any
network cards. This doesn't matter much for the base addresses, but
it means that the comments about which addresses are free are mostly
bogus. The i/o sizes are unreliable because of split address ranges
for many devices (VGA, wd). The i/o sizes are incomplete. In
particular, there are no sizes for SCSI controllers. The bt driver
still returns a truth value instead of a size.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8833 29-May-1995 dg

Fix setdumpdev():
- the major number wasn't checked, so accesses beyond the end of bdevsw[]
were possible. Bogus major numbers are easy to get because `sysctl -w'
doesn't handle dev_t's reasonably - it doesn't convert names to dev_t's
and it converts the number 1025 to the dev_t 0x35323031.
- Driver d_psize() functions return -1 to indicate error ENXIO or ENODEV
(the interface is too braindamaged to say which). -1 was interpreted
as a size and resulted in the bogus error ENOSPC.
- it was possible to set the dumpdev for devices without a d_psize()
function. This is equivalent to setting the dumpdev to NODEV except
it confuses sysctl.
- change a 512 to DEV_BSIZE. There is an official macro dtoc() for
converting "pages" to disk blocks but it is never used in /usr/src/sys.
There is much confusion between PAGE_SIZE sized pages and NBPG sized
pages. Maxmem consists of both.
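
The value 0x35323031 quoted above is what results when the four ASCII characters "1025" are read as a little-endian 32-bit integer. That this is how `sysctl -w' arrived at the number is an inference from the value itself, not something the log states; the helper name is hypothetical.

```python
import struct


def ascii_as_le_u32(text: str) -> int:
    """Reinterpret a 4-character ASCII string as a little-endian 32-bit value."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("need exactly four characters")
    return struct.unpack("<I", raw)[0]


print(hex(ascii_as_le_u32("1025")))
```

The bytes are '1' = 0x31, '0' = 0x30, '2' = 0x32 and '5' = 0x35, stored lowest byte first, which is why a bogus major number can so easily reach setdumpdev().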

Not fixed:
- there is nothing to invalidate the dumpdev if the media goes away.
This reduces the benefits of the early calculation of dumplo. Bounds
checking in the dump routines is relied on to reduce the risk of
damage and little would be lost by relying on the dump routines to
calculate dumplo.
- no attempt is made to stay away from the start of the device to
avoid clobbering labels.

Fix wrong && anachronistic comment about the type of bootdev.

Reviewed by: davidg
Submitted by: Bruce Evans
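
The checks listed above can be sketched in userland C. This is only an illustration, not the committed code: the table, the `*_sketch` names, the stub size routines and the choice of ENXIO for the error cases are all hypothetical. The last function shows how the ASCII bytes of "1025", read as a little-endian 32-bit integer, give the value 0x35323031 quoted in the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bdevsw[] entry; only d_psize() matters here. */
struct bdevsw_sketch {
	int (*d_psize)(int dev);	/* size in blocks, or -1 on error */
};

static int psize_ok(int dev)   { (void)dev; return 4096; }
static int psize_fail(int dev) { (void)dev; return -1; }

static const struct bdevsw_sketch bdevsw_tab[] = {
	{ psize_ok },	/* major 0: working size routine */
	{ psize_fail },	/* major 1: size routine reports an error */
	{ NULL },	/* major 2: no size routine at all */
};
#define NBLKDEV_SKETCH (sizeof(bdevsw_tab) / sizeof(bdevsw_tab[0]))

/* Return 0 if (maj, dev) can hold `needed` blocks, else an errno value. */
int
check_dumpdev_sketch(unsigned maj, int dev, int needed)
{
	int psize;

	if (maj >= NBLKDEV_SKETCH)		/* never index past the table */
		return (ENXIO);
	if (bdevsw_tab[maj].d_psize == NULL)	/* cannot be a dump device */
		return (ENXIO);
	psize = bdevsw_tab[maj].d_psize(dev);
	if (psize == -1)			/* an error, not a size */
		return (ENXIO);
	if (psize < needed)
		return (ENOSPC);
	return (0);
}

/* Reinterpret four ASCII bytes as a little-endian 32-bit integer. */
uint32_t
ascii_as_le32(const char *s)
{
	return ((uint32_t)(unsigned char)s[0] |
	    (uint32_t)(unsigned char)s[1] << 8 |
	    (uint32_t)(unsigned char)s[2] << 16 |
	    (uint32_t)(unsigned char)s[3] << 24);
}
```

The point of the ordering is that the major number is validated before the table is indexed, and -1 is filtered out before any size comparison, so ENOSPC is only reported for a real size.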


8748 25-May-1995 dg

Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS"
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overridden in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network hang when the mbuf cluster pool runs out.

Reviewed by: John Dyson
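
A minimal sketch of the idea, assuming a base value plus a per-user increment. The constants 512 and 16 and both function names are illustrative assumptions, not values taken from this log; a config-file override is modelled as a positive argument.

```c
/* Illustrative constants only; the log does not state the real ones. */
#define NMBCLUSTERS_BASE_SKETCH		512
#define NMBCLUSTERS_PER_USER_SKETCH	16

/* Default cluster count scaled by the configured "maxusers". */
int
default_nmbclusters_sketch(int maxusers)
{
	return (NMBCLUSTERS_BASE_SKETCH +
	    maxusers * NMBCLUSTERS_PER_USER_SKETCH);
}

/* A positive override (as from the kernel config file) wins. */
int
nmbclusters_sketch(int config_override, int maxusers)
{
	if (config_override > 0)
		return (config_override);
	return (default_nmbclusters_sketch(maxusers));
}
```

Code elsewhere would then read the resulting global rather than the compile-time macro, which is the "bogus use" the message mentions.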


8590 18-May-1995 dg

Added "BROKEN_KEYBOARD_RESET" option to disable using the keyboard reset
in cpu_reset(). Some MBs don't deal with this properly.

Submitted by: Rod Grimes


8521 14-May-1995 dg

Added ampersand constraint to make sure that the source and destination
registers aren't combined.

Reviewed by: Bruce Evans and David Greenman
Submitted by: John Dyson


8504 14-May-1995 dg

Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.


8488 13-May-1995 jkh

"1 easy fix in 10 excruciating steps"

A phone call from Manfred quickly pointed up the fact that I got the conflict
check backwards. NOW we implement the conflict checking correctly! Wheesh!


8481 12-May-1995 wollman

The death of `options NODUMP'. Now the dump area can be dynamically
configured (and unconfigured) on the fly. A sysctl(3) MIB variable is
provided to inspect and modify the dump device setting.


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


8448 11-May-1995 bde

Add variable `idelayed' and macros setdelayed() and schedsofttty()
to access it. setdelayed() actually ORs the bits in `idelayed' into
`ipending' and clears `idelayed'.

Call setdelayed() every (normal) clock tick to convert delayed
interrupts into pending ones.

Drivers can set bits in `idelayed' at any time to schedule an interrupt
at the next clock tick. This is more efficient than calling timeout().
Currently only software interrupts can be scheduled.
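
The mechanism described above can be sketched as follows. This is a plain C model with hypothetical `*_sketch` names; it ignores the atomicity and interrupt masking that real kernel code would need.

```c
#include <stdint.h>

/* Models of the two bit masks named in the message. */
uint32_t ipending_sketch;	/* interrupts ready to be serviced */
uint32_t idelayed_sketch;	/* interrupts deferred to the next tick */

/* A driver schedules an interrupt for the next clock tick. */
void
sched_delayed_sketch(uint32_t bits)
{
	idelayed_sketch |= bits;
}

/* Called once per clock tick: delayed bits become pending bits. */
void
setdelayed_sketch(void)
{
	ipending_sketch |= idelayed_sketch;
	idelayed_sketch = 0;
}
```

Setting a bit costs one OR, which is why this is cheaper than queueing a timeout() for every deferred interrupt.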


8446 11-May-1995 bde

Add loadandclear(). It atomically loads a value from memory, clears the
value in memory and returns the original value.
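
In portable C11 the same semantics can be expressed with an atomic exchange. This is a sketch of the behaviour only; the name `loadandclear_sketch` is hypothetical, and the log does not say how the original is implemented.

```c
#include <stdatomic.h>

/*
 * Atomically replace *p with 0 and return the value it held.
 * No update made by another thread between the load and the clear
 * can be lost, because both happen in one atomic operation.
 */
unsigned
loadandclear_sketch(atomic_uint *p)
{
	return (atomic_exchange(p, 0u));
}
```

A second call with no intervening store returns 0, since the first call already cleared the value.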


8434 11-May-1995 jkh

Pass me the pointed chapeau - this typo somehow got through my testing.


8433 11-May-1995 wpaul

If you config a kernel with 'config kernel swap generic' and try to
boot diskless with it, you get a panic because setconf() is only
called for mountroot == ffs_mountroot. It really needs to be called
no matter what manner of rootfs we have. I can't really say if
swapgeneric will work with a CD-ROM though. (I get the feeling I'm
the only one who uses swapgeneric these days anyway.)


8431 11-May-1995 jkh

Remove all vestiges of the ALLOW_CONFLICT_FOO evil and replace it with
something slightly less evil - a per device conflict flag.


8427 11-May-1995 wollman

Delete two debugging printfs that mistakenly crept in.


8426 11-May-1995 wollman

Make networking domains drop-ins, through the magic of GNU ld. (Some day,
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.


8265 04-May-1995 dg

Correct the definition for the (unused) cpu_setstack().


8214 02-May-1995 dg

Added a memcpy() routine.


8213 02-May-1995 phk

A missing 'and', probably my fault.

Submitted by: Ed Hudson <elh@p5.spnet.com>


8211 01-May-1995 dyson

Fixed a problem that can cause left-over pv_entries and, as a
side-effect, removed some legacy code that was necessary
when we called vm_fault inside of vm_fault_quick instead of using
the kernel/user space byte move routines.


8074 26-Apr-1995 rgrimes

Add outb to keyboard controller to do a cpu_reset, this fixes 2 known
cases of motherboards that failed to reboot.


8055 25-Apr-1995 phk

Add support for MFS root filesystem.


8042 24-Apr-1995 phk

Added "bio" to matcd.


8015 23-Apr-1995 julian

hmm spotted a difference resulting from a merge I didn't examine closely enough


8014 23-Apr-1995 julian

include hooks for EISA configuration (possibly wrong :)


8007 23-Apr-1995 phk

Forgot this commit the other day. The receiving end of the "boot -C" option.


7994 22-Apr-1995 wpaul

Tiny printf formatting change: if we have no cpu_vendor or cpu_id info,
don't generate a newline. (Yeah, I'm picking nits, but that empty line
I get on my 386 just looks dumb, okay? :)


7950 20-Apr-1995 phk

Add wd2 and wd3 as swap-devices too.


7930 18-Apr-1995 rgrimes

Reapply my fix for this:
Output the CPU features line during the probe on a separate line; for
folks with lots of features the output used to wrap and look ugly.


7908 17-Apr-1995 phk

Print the BIOS geometries in a human-readable format.


7874 16-Apr-1995 dg

Remove gratuitous waste of 2K of memory for BIOS variables. We never load
the kernel at 0-640k; we haven't had the ability to do that since before
2.0R. Furthermore, I fail to see how putting an instruction at 0 and then
doing a .org 0x500 is going to prevent the stuff from getting clobbered
in the first place; a.out is just too stupid to know about sparse address
spaces.


7852 15-Apr-1995 bde

Don't waste time sending an EOI to ICU1 if option AUTO_EOI_1 is defined.
Previously, this worked right if both AUTO_EOI_1 and AUTO_EOI_2 are
defined, but not if AUTO_EOI_1 is defined and AUTO_EOI_2 is not defined.
The latter case should be the default. DUMMY_NOPS should be the default
too. Currently there are only two NOPs slowing down rtcin() (although
there are no delays in writertc()) and several FASTER_NOPs slowing down
interrupt handling in vector.s.

Fix stack offsets for the (previously) unused untested
FAST_INTR_HANDLER_USES_ES case.


7818 14-Apr-1995 dufault

Add scsi target. Add "after config" call to autoconf so that scsi
targets will be configured after all scsi busses have been configured.


7814 14-Apr-1995 wpaul

Hopefully I won't get flamed for this: insert a few more #if defined(I486_CPU)
and #if defined (I586_CPU) thingies into identifycpu() so that we only
compile in what's actually needed for a given CPU. So far as I can tell,
none of my 386 machines generate a cpu_vendor code, so I made the extra vendor
and feature line conditional on I486_CPU and I586_CPU. (Otherwise we
print out a blank line which looks silly.)


7792 13-Apr-1995 wpaul

This is a subtle reminder to people that not everybody compiles their
kernels with 'options I586_CPU.'

The declaration for pentium_mhz is hidden inside an #ifdef I586_CPU,
but machdep.c refers to it whether I586_CPU is defined or not. This
temporary hack puts the offending code inside an #ifdef I586_CPU as
well so that a kernel without it will successfully compile.

I must emphasize the word 'temporary:' somebody needs to seriously
beat on the identifycpu() function with an #ifdef stick so that
I386_CPU, I486_CPU and I586_CPU will do the right things.


7780 12-Apr-1995 wollman

Add a class field to devconf and most drivers.
For those where it was easy, drivers were also fixed to call
dev_attach() during probe rather than attach (in keeping with the
new design articulated in a mail message five months ago). For
a few that were really easy, correct state tracking was added as well.
The `fd' driver was fixed to correctly fill in the description.
The CPU identify code was fixed to attach a `cpu' device. The code
was also massively reordered to fill in cpu_model with something remotely
resembling what identifycpu() prints out. A few bytes saved by using
%b to format the features list rather than lots of ifs.


7746 10-Apr-1995 phk

I got that wrong,
lnc0 @ 0x280
lnc1 @ 0x300

moved le0 into sorted sequence.


7745 10-Apr-1995 phk

lnc0 is @ 0x300
lnc1 is @ 0x280


7731 10-Apr-1995 phk

Changes to make FreeBSD use a CDROM as rootdev, for installation purposes.
If "BOOTCDROM" is defined, you get this pretty special case stuff.


7693 09-Apr-1995 dg

Cosmetic changes.


7681 08-Apr-1995 phk

Move default address of lnc0 to 0x300. Luigi Rizzo said that his card
cannot even go below 0x300...


7680 08-Apr-1995 joerg

Implement a simple hook (or hack?) to allow graphics device console
drivers to protect DDB from being invoked while the console is in
process-controlled (i.e., graphics) mode.

Implement the logic to use this hook from within pcvt. (I'm sure
Søren will do the syscons part RSN).

I've still got one occasion where the system stalled, but my attempts
to trigger the situation artificially resulted in the expected
behaviour. It's hard to track bugs without the console and DDB
available. :-/


7660 08-Apr-1995 phk

Added the "eg0" interface driver for the 3Com "3c505" or "etherlink/+"
card. This is the braindamaged card with the 80186 CPU on it. It is
slow, probably not very good after all, but hey, if you have one lying
around doing nothing anyway...

Added the "zp0" driver to GENERIC.


7645 06-Apr-1995 ache

Print "on isa" for devices with port==0 per Bruce's suggestion


7644 06-Apr-1995 rgrimes

Output the CPU features line during the probe on a separate line; for
folks with lots of features the output used to wrap and look ugly.

Reviewed by: phk


7624 04-Apr-1995 ache

Print "on motherboard" for isa? devices with id_iobase == 0


7490 30-Mar-1995 dg

Made pmap_testbit a static function.


7480 30-Mar-1995 rgrimes

Submitted by: Mahesh Neelakanta <mahesh@gcomm.com>

Change I/O address of Intel EtherExpress driver (ix0) from 0x280 to
0x300.


7430 28-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


7403 26-Mar-1995 dg

Removed declaration of pmap_changebit()...it is no longer exported.

Submitted by: John Dyson


7402 26-Mar-1995 dg

Changed pmap_changebit() into a static function as it always should
have been.

Submitted by: John Dyson


7345 25-Mar-1995 swallace

Do a printf("\n") after all conditional printfs have been done so that
a newline is always done. Remove \n's from last conditional printfs.


7254 22-Mar-1995 se

Correct pcibus_setup() to return as soon as one test succeeds.


7251 22-Mar-1995 se

Delete PCI PCI bridge simulator code ...

Submitted by: Wolfgang Stanglmeier <wolf@kintaro.cologne.de>


7244 22-Mar-1995 se

Remove spurious declaration of printf().

Submitted by: Michael Reifenberger <root@rz-wb.fh-sw.de>


7234 21-Mar-1995 se

New ISA specific PCI code.
Supports shared PCI interrupts.

Submitted by: Wolfgang Stanglmeier <wolf@kintaro.cologne.de>


7214 21-Mar-1995 dg

Added a new version of trap_pfault() that disallows kernel page faults
to the user address space unless pcb_onfault is set. The code is currently
commented out because iBCS2 and process debugging parts of the kernel
need to be changed/fixed first.


7213 21-Mar-1995 dg

Changed some #ifdef DIAGNOSTIC code that I added to be #ifdef DEBUG.


7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


7135 18-Mar-1995 rgrimes

Add Intel EtherExpress16 (ix0) driver.


7103 17-Mar-1995 dg

Call dev_shutdownall() just after unmounting filesystems.


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


7087 16-Mar-1995 se

Prepare for shared interrupts (required by the new PCI code that adds
support for PCI-PCI bridges, e.g. found on 4ch. Ethernet cards).

Submitted by: Wolfgang Stanglmeier <wolf@kintaro.cologne.de>


7026 12-Mar-1995 amurai

Adding tunnel pseudo-device for Network Installation with User process PPP.
Reviewed by: amruai@spec.co.jp


6995 11-Mar-1995 phk

Moved bb stuff to support.s per Bruce's suggestion.


6981 10-Mar-1995 phk

Add a dummy ___bb_init_func for BB profiling of the kernel.
To use this: recompile src/gnu/usr.bin/cc, compile your kernel. The
files you want to profile should be compiled with '-a -g'. "strip -x"
the kernel and run. You don't need to profile all files in the kernel.
My next commit is the program to extract the data from the running kernel.


6978 10-Mar-1995 dg

kmem_alloc() returns zero-filled memory; it isn't necessary to bzero()
it.


6977 10-Mar-1995 dg

Removed unnecessary routines vm_get_pmap() and vm_put_pmap().
kmem_alloc() returns zero filled memory, so no need to explicitly
bzero() it.


6949 07-Mar-1995 dg

Increased number of buffers to 1/12 of (page_count - 1024). This makes the
cache minimum closer to 10% in the usual case.
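
The sizing rule above amounts to the arithmetic below. The function name and the clamp to zero for very small machines are assumptions made for the sketch; the message gives only the 1/12 and 1024 figures.

```c
/*
 * Number of buffers as 1/12 of the pages left after setting 1024 aside.
 * Clamping to 0 for page_count <= 1024 is an assumption of this sketch.
 */
int
nbuf_sketch(int page_count)
{
	if (page_count <= 1024)
		return (0);
	return ((page_count - 1024) / 12);
}
```

For example, a machine with 4096 pages gets (4096 - 1024) / 12 = 256 buffers under this rule.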


6912 05-Mar-1995 joerg

pcvt is still using the XSERVER option; document this.


6902 05-Mar-1995 wpaul

Changed the printf()s in npxattach a bit so you don't end up with
messages like this:

wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <ST506>
wd0: size unknown, using BIOS values: 615 cyl, 4 head, 17 sec, bytes/sec 512
npx0 at 0xf0-0xff irq 13 on motherboard
npx0: changing root device to wd0a
^^^^^^

The spurious 'npx0: ' pops up if you have a 386 with a 387 FPU.


6874 04-Mar-1995 dg

Removed obsolete vtrace() and reorganized a little.


6865 03-Mar-1995 dg

Preserve reverse link integrity while doing the queue insertion.


6846 03-Mar-1995 dg

Use copyout to install the sigframe rather than directly writing to the
user's stack.


6820 02-Mar-1995 jkh

Changes to incorporate the Matsushita CDROM driver (otherwise known as
the "Sound blaster CDROM").
Submitted by: Frank Durda IV <bsdmail@nemesis.lonestar.org>


6817 01-Mar-1995 dg

Use su/fubyte instead of directly touching the user's address space.


6807 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions created: vm_object_pip_wakeup and pagedaemon_wakeup, which
are used to reduce the actual number of wakeups.
New function vm_page_protect, which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


6806 01-Mar-1995 dg

Slight change to include file order to accommodate upcoming changes.


6734 26-Feb-1995 bde

Replace all remaining instances of `i386/include' by `machine' and fix
nearby #include inconsistencies.


6715 25-Feb-1995 phk

Change EISA size to 256 instead of 4096.
Neither are correct, but 256 does least damage.


6710 25-Feb-1995 phk

Read K&R and get the { } right :-)


6708 25-Feb-1995 phk

I believe I finally got the "on eisa" right.

| if (!(isdp->id_iobase & 0xf300)) {
| printf(" on motherboard\n");
| } else if (isdp->id_iobase >= 0x1000 &&
| !(isdp->id_iobase & 0x300)) {
| printf (" on eisa slot %d\n",
| isdp->id_iobase >> 12);
| } else {
| printf (" on isa\n");
| }
| }

Based on info in "The undocumented PC" p.165
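
The decision in the fragment above can be restated as a standalone function. This is a sketch, assuming both mask tests apply to the same I/O base value; the enum and function names are hypothetical.

```c
#include <stddef.h>

enum bus_sketch { BUS_MOTHERBOARD, BUS_EISA, BUS_ISA };

/*
 * Classify a device by its I/O base address.  For EISA the slot number
 * is the top nibble of the 16-bit address and is stored through `slot`
 * when that pointer is not NULL.
 */
enum bus_sketch
classify_iobase_sketch(unsigned iobase, unsigned *slot)
{
	if (!(iobase & 0xf300))
		return (BUS_MOTHERBOARD);
	if (iobase >= 0x1000 && !(iobase & 0x300)) {
		if (slot != NULL)
			*slot = iobase >> 12;
		return (BUS_EISA);
	}
	return (BUS_ISA);
}
```

With this rule 0x60 is reported as a motherboard device, 0x3f8 as ISA, and 0x1c00 as EISA slot 1.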


6706 25-Feb-1995 se

Keep PCI_CONF_MODE in a safe place for later reference, if #defined.

Reviewed by: se
Submitted by: seb@erix.ericsson.se (Sebastian Strollo)


6664 23-Feb-1995 bde

Submitted by: seb@erix.ericsson.se (Sebastian Strollo)

Remove over-cautious early fnop() synchronization. It caused the probe to
hang on systems without an FPU.


6579 20-Feb-1995 dg

Use of vm_allocate() and vm_deallocate() has been deprecated.


6547 18-Feb-1995 wpaul

Do away with 'options SWAP_GENERIC' once and for all: I get ill
just thinking about it.

Two changes need to be made to allow 'config kernel swap generic' to
work properly without requiring any compile-time flags:

/usr/src/usr.sbin/config/mkswapconf.c: we need to define a dummy stub
for the setconf() function to replace the one in swapgeneric.c that
isn't available in non-generic configurations.

/usr/src/sys/i386/i386/autoconf.c: the -a boot flag causes setroot()
to be skipped and lets setconf() prompt the user for a root device.
If you skip setroot() in a non-generic kernel, you could get severely
hosed. To avoid this, we silently ignore the -a flag if rootdev != NODEV.
(rootdev is always initialized to NODEV in swapgeneric.c, so if
we find that rootdev is something other than NODEV, we know we're
not using a generic configuration.)
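
The rule in the second change can be sketched as a predicate. RB_ASKNAME is the boot flag that -a sets; the `NODEV_SKETCH` value and the function name are stand-ins chosen for this illustration.

```c
#define RB_ASKNAME	0x001	/* boot flag set by -a: ask for root device */
#define NODEV_SKETCH	(-1)	/* stand-in for the kernel's NODEV */

/*
 * Prompt for a root device only when -a was given AND no root device
 * was compiled in.  A non-generic kernel has rootdev != NODEV, so the
 * flag is silently ignored there and setroot() is not skipped.
 */
int
should_ask_root_sketch(int howto, int rootdev)
{
	return ((howto & RB_ASKNAME) != 0 && rootdev == NODEV_SKETCH);
}
```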


6535 17-Feb-1995 bde

Undo the busy latch changes made in the previous revision. They broke
some 386/387 systems.

Don't print the IRQ number twice in the boot diagnostics.


6512 17-Feb-1995 phk

This is the latest version of the APM stuff from HOSOKAWA, I have looked
briefly over it, and see some serious architectural issues in this stuff.

On the other hand, I doubt that we will have any solution to these issues
before 2.1, so we might as well leave this in.

Most of the stuff is bracketed by #ifdef's so it shouldn't matter too much
in the normal case.

Reviewed by: phk
Submitted by: HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp>


6503 16-Feb-1995 bde

Fix syntax errors in #ifdefed out code.


6461 15-Feb-1995 joerg

Include three lines about the pcvt console driver, so we don't ever need
a different config file for it.


6439 15-Feb-1995 dg

Use proc0's proc struct rather than curproc's when calling sync.


6423 15-Feb-1995 dg

Killed the pmap_use_pt and pmap_unuse_pt prototypes as they are now in
machine/pmap.h.


6380 14-Feb-1995 sos

First attempt to run linux binaries. This is only the changes needed to
the generic kernel. The actual emulator is a separate LKM. (not finished
yet, sorry).
Submitted by: sos@freebsd.org & sef@kithrup.com


6377 14-Feb-1995 phk

Removed a YF comment.


6369 14-Feb-1995 phk

Whoops! back out last commit partly.


6368 14-Feb-1995 phk

YFfix.


6367 14-Feb-1995 phk

susword -> systm.h


6354 14-Feb-1995 phk

Yves has sent us a ~600 Kb patch, which shuts up gcc entirely for the
entire kernel.
Unfortunately we didn't send him a copy of the style guide before he did it.
I'm trying to find all the benign and downright sound bits and will commit
them without any other explanation than "YF fix" if they are merely cosmetic.

Reviewed by: phk
Submitted by: yves@dutncp8.tn.tudelft.nl (Yves Fonk)


6327 12-Feb-1995 dg

Carefully choose the low limit for number of buffers to achieve the best
performance on small memory machines.


6325 12-Feb-1995 dg

Fixed a bogus comment and made a stylistic change (testl instead of orl
to test for zero).


6308 11-Feb-1995 phk

Intel's App Note AP-485 applied.
We will now tell a good deal more about the CPU if Intel made it.

What is an i486DX2 Write-Back Enhanced CPU ?


6301 10-Feb-1995 dg

Changed extended memory test so that it's non-destructive and not a
complete test (it never was "complete", which is why it was bogus). Now
only a single longword is checked in each page.
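
A non-destructive probe of this kind can be sketched as below: save the first longword of each page, write and verify test patterns, then restore the saved value. The page size, the two patterns and the function name are assumptions of the sketch, not details from the commit.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH	4096u
#define WORDS_PER_PAGE_SKETCH	(PAGE_SIZE_SKETCH / sizeof(uint32_t))

/*
 * Probe one longword per page and return the number of consecutive
 * pages that held both patterns.  The original contents of every
 * probed word are put back, so the test does not destroy data.
 */
size_t
probe_pages_sketch(volatile uint32_t *base, size_t npages)
{
	size_t good = 0;

	for (size_t i = 0; i < npages; i++) {
		volatile uint32_t *p = base + i * WORDS_PER_PAGE_SKETCH;
		uint32_t saved = *p;
		int ok;

		*p = 0xaaaaaaaau;
		ok = (*p == 0xaaaaaaaau);
		*p = 0x55555555u;
		ok = ok && (*p == 0x55555555u);
		*p = saved;		/* restore before deciding */
		if (!ok)
			break;		/* memory ends (or is bad) here */
		good++;
	}
	return (good);
}
```

Checking a single word per page finds where memory ends without pretending to validate every cell, which matches the message's point that the old test was never complete anyway.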


6299 10-Feb-1995 dg

Removed obsolete and unused vmtime() function.


6297 10-Feb-1995 dg

Removed unnecessary check for pr_scale in the AST/OWEUPC case.


6296 10-Feb-1995 dg

Check P_PROFIL flag for profiling rather than pr_scale as it makes more
sense.


6281 09-Feb-1995 se

Initialisation of interrupt masks changed.

Reviewed by: se
Submitted by: wolf (Wolfgang Stanglmeier)


6270 09-Feb-1995 jkh

Add PPP to the generic kernel. Now that Poul has made us all this space,
maybe I can get us back into the slip/ppp game without having to tell users
to reconfigure their kernels all the time! :)


6126 02-Feb-1995 dg

Mostly cosmetic changes. Use KERNBASE instead of UPT_MAX_ADDRESS in
some comparisons as it is more correct (we want the kernel page tables
included).
Reorganized some of the expressions for efficiency.
Fixed the new pmap_prefault() routine - it would sometimes pick up the
wrong page if the page in the shadow was present but the page in object
was paged out. The routine remains unused and commented out, however.
Explicitly free zero reference count page tables (rather than waiting
for the pagedaemon to do it).

Submitted by: John Dyson


6105 01-Feb-1995 se

Reviewed by: se
Submitted by: wolf (Wolfgang Stanglmeier)
PCI specific code moved to /sys/pci.


6104 01-Feb-1995 se

Reviewed by: se
Submitted by: wolf (Wolfgang Stanglmeier)
New ISA dependent file for PCI bus support.
Replaces sys/i386/pci/pcibios.c.


6008 29-Jan-1995 bde

Fix disassembly of `bt[crs] $Ib,E'.


5999 29-Jan-1995 ats

Correct a name of one structure member in the sigaltstack structure.
Now it matches the man page and also the only other commercial implementation
I have found so far (Solaris 2.x).
Changed the name from ss_base to ss_sp.


5981 28-Jan-1995 jkh

Add soundblaster CD to generic kernel. Hope this doesn't run us out
of space!


5952 27-Jan-1995 phk

New and far better NCR5380/NCR53400 scsi-driver.

Handles at least Trantor T130 and ProAudioSpectrum adapters.
The pas driver has consequently been removed.
This driver can be configured without interrupts.

Manpage to follow when PAS16 has been edited in.

Reviewed by: phk
Submitted by: Serge Vakulenko, <vak@cronyx.ru>


5943 26-Jan-1995 dg

Fix from Doug Rabson for a panic related to not initializing the kernel's
PTD.

Submitted by: John Dyson


5921 26-Jan-1995 ache

Remove FAT_CURSOR; it has not existed for the last several
syscons versions


5916 26-Jan-1995 dg

Comment out pmap_prefault for the time being (perhaps until after 2.1).
The object_init_pt routine is still enabled and used, however, and this
is where most of the 'pre-faulting' performance improvement comes from.


5914 26-Jan-1995 dg

Make sure that the pages being 'pre-faulted' are currently on a queue.


5910 25-Jan-1995 dg

Be a bit less fast and loose about setting non-cacheability of pages.


5908 25-Jan-1995 bde

Load the kernel symbol table in the boot loader and not at compile time.
(Boot with the -D flag if you want symbols.)

Make it easier to extend `struct bootinfo' without losing either forwards
or backwards compatibility.

ddb_aout.c:
Get the symbol table from wherever the loader put it.
Nuke db_symtab[SYMTAB_SPACE].

boot.c:
Enable loading of symbols. Align them on a page boundary. Add printfs
about the symbol table sizes.
Pass the memory sizes to the kernel.
Fix initialization of `unit' (it got moved out of the loop).
Fix adding the bss size (it got moved inside an ifdef).
Initialize serial port when RB_SERIAL is toggled on.
Fix comments.
Clean up formatting of recently added code.

io.c:
Clean up formatting of recently added code.

netboot/main.c, machdep.c, wd.c:
Change names of bootinfo fields.

LINT:
Nuke SYMTAB_SPACE.
Fix comment about DODUMP.

Makefile.i386:
Nuke use of dbsym.
Exclude gcc symbols from kernel unless compiling with -g.
Remove unused macro.
Fix comments and formatting.

genassym.c:
Generate defines for some new bootinfo fields. Change names of old ones.

locore.s:
Copy only the valid part of the `struct bootinfo' passed by the loader.
Reserve space for symbol table, if any.

machdep.c:
Check the memory sizes passed by the loader, if any. Don't use them yet.

bootinfo.h:
Add a size field so that we can resolve some mismatches between the loader
bootinfo and the kernel boot info. The version number is not so good for
this because of historical botches and because it's harder to maintain.
Add memory size and symbol table fields. Change the names of everything.

Hacks to save a few bytes:

asm.S, boot.c, boot2.S:
Replace `ouraddr' by `(BOOTSEG << 4)'.

boot.c:
Don't statically initialize `loadflags' to 0. Disable the "REDUNDANT"
code that skips the BIOS variables. Eliminate `total'. Combine some
more printfs.

boot.h, disk.c, io.c, table.c:
Move all statically initialized data to table.c.

io.c:
Don't put the A20 gate bits in a variable.


5897 25-Jan-1995 jmz

Changed address of the game controller to 0x201 (was 0x200)
joy.c: joystick driver


5862 24-Jan-1995 paul

is to lnc changes


5838 24-Jan-1995 dg

Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it.

Submitted by: John Dyson


5837 24-Jan-1995 dg

Changed buffer allocation policy (machdep.c)
Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it. (pmap.c)
Fixed a deadlock problem with pv entry allocations (pmap.c)
Added a new, optional function 'pmap_prefault' that does clustered page
table preloading (pmap.c)
Changed the way that page tables are held onto (trap.c).

Submitted by: John Dyson


5771 21-Jan-1995 bde

Don't use mi_switch() to terminate cpu_exit(). Calling it just happened to
work (mi_switch() counted the last timeslice again but this didn't affect
the exiting process' rusage because the rusage has already been finalized).

Remove stale comment.


5770 21-Jan-1995 bde

Remove unused definitions of vm statistics counters. Most of the
counting is now done in C. There are still about 100 unused
definitions for other things.


5769 21-Jan-1995 bde

Don't count context switches here, they are already counted in mi_switch().


5722 19-Jan-1995 ats

Submitted by: Bruce Evans
Put in the much shorter and cleaner version for the calibrate_cycle_counter
for the Pentium that Bruce suggested. Tested here on my Pentium and
it works okay.


5675 17-Jan-1995 bde

The %eflags checking introduced in the previous commit was too zealous.
sigreturn() sometimes failed for ordinary returns from signal handlers.
Failures of ordinary returns "can't happen" and are badly handled.
"Temporary" fix: allow users to corrupt PSL_RF. This is fairly
harmless. A correct fix would involve saving the old %eflags (and
perhaps the old segment registers) where the user can't get at them.
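
The kind of %eflags check discussed above can be sketched as a mask comparison: a new flags value is accepted only if it differs from the old one in bits the user is allowed to change. The bit positions are the standard x86 EFLAGS ones; the exact contents of `USERCHANGE_SKETCH` and the function name are assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Standard x86 EFLAGS bit positions. */
#define PSL_C		0x00000001u	/* carry */
#define PSL_PF		0x00000004u	/* parity */
#define PSL_AF		0x00000010u	/* auxiliary carry */
#define PSL_Z		0x00000040u	/* zero */
#define PSL_N		0x00000080u	/* sign */
#define PSL_T		0x00000100u	/* trap (single step) */
#define PSL_I		0x00000200u	/* interrupt enable */
#define PSL_D		0x00000400u	/* direction */
#define PSL_V		0x00000800u	/* overflow */
#define PSL_IOPL	0x00003000u	/* I/O privilege level */
#define PSL_NT		0x00004000u	/* nested task */
#define PSL_RF		0x00010000u	/* resume */

/* Assumed set of user-changeable bits for this sketch. */
#define USERCHANGE_SKETCH \
	(PSL_C | PSL_PF | PSL_AF | PSL_Z | PSL_N | PSL_T | PSL_D | PSL_V | PSL_NT)

/*
 * Nonzero if going from `oef` to `nef` touches only permitted bits.
 * PSL_RF is tolerated as well, mirroring the "temporary" fix above.
 */
int
eflags_secure_sketch(uint32_t nef, uint32_t oef)
{
	return (((nef ^ oef) & ~(USERCHANGE_SKETCH | PSL_RF)) == 0);
}
```

Comparing against the old value, rather than testing for fixed bit patterns, avoids assuming anything about reserved bits.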


5642 15-Jan-1995 dg

Fixed some page table reference count problems; these changes may not be
complete, but should be closer to correct than before.


5603 14-Jan-1995 bde

Fix security holes in sigreturn(), ptrace() and procfs. sigreturn()
attempted to check for insecure and fatal eflags and segment
selectors, but missed many cases and got the IOPL check back to
front. The other syscalls didn't check at all.

sys_process.c, machdep.c:
Only allow PT_WRITE_U to write to the registers (ordinary and FP).

psl.h, locore.s, machdep.c:
Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed
to assume anything about the reserved bits. Use PSL_USERCHANGE
and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER.

exception.s:
Define a private label for use by doreti when returning to user
mode fails.

machdep.c:
In syscalls, allow changing only the eflags that can be changed on
486's in user mode (no longer attempt to allow benign IOPL changes;
allow changing the nasty PSL_NT; don't allow changing the i586
bits).

Don't attempt to check all the cases involving invalid selectors
and %eip's. Just check for privilege violations and let the invalid
things cause a trap.

procfs_machdep.c:
Call the ptrace register functions to do all the work for reading
and writing ordinary registers and for single stepping.

trap.c:
Ignore traps caused by PSL_NT being set. Previously, users could
cause a fatal trap in user mode by setting PSL_NT and executing an
iret, and a fatal trap in kernel mode by setting PSL_NT and making
a syscall. PSL_NT was cleared too late and not in enough modes to
fix the problem.

Make all traps in user mode (except T_NMI) nonfatal.

Recover from traps caused by attempting to load invalid user
registers in doreti by restarting the traps so that they appear to
occur in user mode.
---

Fix bogons that I noticed while fixing the above:

psl.h:
Fix some comments.

Uniformize idempotency ifdef.

exception.s, machdep.c:
Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came
out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic
non-unique) trap numbers. Replace rsvd[1-14] by rsvd.

locore.s:
Enable alignment check flag on 486's and 586's.

machdep.c:
Use a better type for kstack[].

Use TFREGP() to find the registers.

Reformat ptrace functions from SEF to something closer to KNF.

procfs_machdep.c:
The wrong pointer to the registers got fixed as a side effect.

Implement reading and writing of FP registers.

/proc/*/*regs now work (only) for processes that are in memory.

Clean up comments.

trap.c, trap.h:
Remove unused trap types.


5595 14-Jan-1995 jkh

Remove bogus scd0 driver - I should have looked at LINT first, anyway.


5594 14-Jan-1995 bde

Enable define of CR0_AM to prepare for implementing alignment checking.

Uniformize idempotency ifdef.


5593 14-Jan-1995 bde

Declare a real `struct fpreg' to prepare for implementing reading and
writing of FP regs for procfs.

Uniformize idempotency ifdef.


5592 14-Jan-1995 bde

Remove reference to impossible trap type T_KDBTRAP. We don't support
watchpoints.

Uniformize idempotency ifdef.


5588 14-Jan-1995 bde

Eliminate T_KDBTRAP, which will soon go away. It was only used for an
unreachable case label in kdb_trap().

Use the correct case labels in kdb_trap() so that normal ddb entry doesn't
print a message.

Change all printf's to db_printf's. Now you can put a breakpoint at printf,
and ddb entry messages don't spam the syslog output.

Cosmetic:

Use ISPL() instead of magic numbers.

Don't compile the unused function kdb_kbd_trap().

Improve some asms.

Print the arg to Debugger().


5580 14-Jan-1995 dg

Add missing object_lock/unlock.


5577 14-Jan-1995 jkh

Put UCONSOLE back - I was wrong, it's still used in one last place.
Submitted by: ollivier


5563 13-Jan-1995 gibbs

Add in aic7770.c (EISA/VL Adaptors) and aic7870.c (PCI adaptor) dependencies
for the ahc driver.


5546 12-Jan-1995 jkh

1. Remove UCONSOLE. This appears to be well and truly dead (unless it's
hiding someplace in /sys I can't find).
2. Remove NCONS. Soren's latest changes make it a no-op.


5516 11-Jan-1995 jkh

Change GENERIC to SWAP_GENERIC for now. I need the GENERIC kernel to
build by default again! When the furor subsides, maybe something better
can be done, but..


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5431 07-Jan-1995 ats

Work around a compiler bug in gcc2.6.3 in handling (long long) variables and
shifting. Also correct the original code as Garrett noticed it in mail.
Leave the mishandled code in to use it later if future versions of gcc
are correct. The code was part of the calibrate_cyclecounter routine to
get the speed of the pentium chip.


5428 07-Jan-1995 jkh

Gunther Schadow <gusw@fub46.zedat.fu-berlin.de>'s
driver for the Genius GS-4500 hand scanner.
Submitted by: gusw@fub46.zedat.fu-berlin.de


5413 05-Jan-1995 se

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


5378 04-Jan-1995 dg

Corrected the list of volatile registers for outsb, outsw, and outsl.
This bug caused my ethernet driver to break, among other things no doubt.


5351 03-Jan-1995 bde

Use sufficient parentheses in macros.

Remove bogus input operands for fnsave(), fnstcw() and fnstsw().

Change all fwait's to fnop's. This might help avoid hardware bugs.
Wait after fninit with an fnop. This should be safer now.

Fix some spelling and formatting errors.

Use natural sizes for control and status words (u_short, promotes to int).

Don't clobber the SWI_CLOCK_MASK bits in npx0_imask when using IRQ13.

Set the devconf state correctly (always busy, if configured). Improve
code for npx_registerdev() a little (gcc can't keep id->id_unit in a
register for some reason). Don't register a nonexistent npx device.

Print a useful message in npxattach() again (delete references to errors
and not the whole message). Don't print "387 emulator" if there is no
emulator in the kernel.

Use %p for pointers in error messages.

Don't clobber the FPU state when there is an FPU exception. Just clear
the exception flags (after saving the flags as before). This allows
debuggers and SIGFPE handlers to look at the full exception state.
SIGFPE handlers should normally return via longjmp(), which restores a
good FPU state (as before). Returning from a SIGFPE handler may leave
the FPU in the wrong state (as before).

Clear the busy latch _after_ clearing the exception flags so that there
is less chance of getting a bogus h/w interrupt for a control operation.

Clear the saved exception status word when the next FPU instruction is
executed so that it doesn't stick around until the next exception.

Clear the busy latch after fnsave() in npxsave() in case it was set when
npxsave() was called.


5350 03-Jan-1995 bde

Replace sv_ex_tw by padding (it is no longer used; the tag word in sv_env
is valid).

Expand comment about bogus padding for emulators.

Update prototype for npxinit().


5321 31-Dec-1994 jkh

From Bill Paul:

- /sys/i386/i386/swapgeneric.c is just plain broke. But fear not, for I
have unbroken it. One thing that swapgeneric.c does is walk through the
list of configured devices searching for a boot device. The only easy
way to accomplish this in 2.0 is to use Garrett Wollman's kern_devconf
stuff. *BUT*, the head of the kern_devconf linked list (dc_list) is declared
static in /sys/kern/kern_devconf.c. This means that swapgeneric.c can't
see it at link time. I had to remove the 'static' keyword to get around
this little problem. I hope this doesn't break anything anywhere.

*Furthermore,* there's a small matter of making the call to setconf()
in swapgeneric.c disappear when 'config kernel swap generic' isn't used.
You could change /sbin/config to create a dummy setconf() function in
swapkernel.c, but that seems messy somehow. (It's also something of an
'it isn't broken, why are you fixing it' situation.) My solution was to
do what the NetBSD people did and put an #ifdef GENERIC around the call
to setconf(). If your kernel is called GENERIC or you define 'options
GENERIC,' then you can use 'config kernel swap generic' and it'll work.

That aside, the upshot is that: a) swapgeneric.c actually works, and
b) the -a boot flag now works as well. If you boot with -a, as in
"Boot: wd(0,a)/kernel -a" you will be presented with a 'root device?'
prompt after the autoconfig phase, at which point you can specify what
device you want mounted as root. Regrettably, you can't specify an NFS
filesystem. Yet. Three files are affected: /sys/i386/i386/swapgeneric.c,
/sys/i386/i386/autoconf.c and /sys/kern/kern_devconf.c.

Submitted by: wpaul


5291 30-Dec-1994 bde

icu.s:
Move definition of `stat_imask' to clock.c.

clock.c:
Rename `rtcmask' to `stat_imask' and export it. Rename `clkmask' to
`clk_imask' for consistency.

Only calculate TIMER_DIV(hz) once.

Merge debugging and "garbage" code to produce debugging code and format the
output better.

Make writertc() static inline and use it everywhere. Now all accesses to
the clock registers go through rtcin() and writertc().

Move rtc initialization to cpu_initclocks().

Merge enablertclock() with cpu_initclocks() and remove enablertclock().
The extra entry point was just a leftover from 1.1.5.


5220 24-Dec-1994 bde

Obtained from: 1.1.5

Fix single-stepping of emulated FPU instructions.

Don't panic if an FPU instruction is attempted but there is no FPU
and no FPU emulator is configured.


5153 18-Dec-1994 dg

Move page_unhold's in pmap_object_init_pt down one line to guard against
a potential race condition.


5148 18-Dec-1994 jkh

Add a 'vn' to GENERIC


5144 18-Dec-1994 dg

Check for PG_FAKE too in pmap_object_init_pt.


5143 18-Dec-1994 dg

Add two more page table pages to keep 64MB machines happy.


5119 16-Dec-1994 phk

Remove sd1-sd3 & st1, now that we can autoallocate them.

fix the vn driver in LINT. It autoallocates too.

Reviewed by: phk
Submitted by: rgrimes


5037 11-Dec-1994 dg

Removed inappropriate comment.


5036 11-Dec-1994 dg

Add additional comment.


5035 11-Dec-1994 dg

Fix bogus comment.


4931 03-Dec-1994 bde

Disable CLKF_BASEPRI() again. I forgot to edit an unwanted change out of
the diffs for the previous commit.


4929 03-Dec-1994 bde

i386/exception.s:
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.

i386/genassym.c:
Remove support for the obsolete pcb_iml and pcb_cmap2.

Add support for pcb_inl.

i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.

Remove obsolete support (commented out and otherwise) for pcb_iml.

Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.

Remove obsolete function swtch_to_inactive().

include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLKF_INTR() so that accounting for the interrupt state
works.

You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.

include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.

Use part of the new padding for pcb_inl.

isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.


4829 27-Nov-1994 phk

I made a syntax error yesterday.
Submitted by: John Capo


4819 26-Nov-1994 phk

Set the bootverbose if so desired.
if (bootverbose)
Print the geometries the bios passes to us (through the bootblocks).


4818 26-Nov-1994 phk

Declare "extern int bootverbose", so that device-drivers and others
easily can find it.


4682 19-Nov-1994 phk

I just learned that isa.h is included in assembler files too...


4681 19-Nov-1994 phk

add
extern u_int atdevbase; /* offset in virtual memory of ISA io mem */
here for a moment, to get it into BETA


4650 18-Nov-1994 jkh

Put ie0 above ep0. Otherwise, the ie0 probe clobbers it.
Submitted by: gibbs


4648 18-Nov-1994 gibbs

IO_EISASIZE should be 1 slot, not 2.


4611 18-Nov-1994 jkh

Get IO_EISASIZE properly defined now.


4600 18-Nov-1994 phk

Grab the bootinfo structure the bootblock passes us.


4569 17-Nov-1994 gibbs

New device-driver entries for the aic7770 driver. These use new features
of config so YOU MUST RECOMPILE CONFIG. Modifying config was the cleanest
solution to integrating this driver into the tree which will become more
obvious in the next commit.


4517 16-Nov-1994 dg

Allow MAXMEM to be larger than the detected physical memory. This change
was supposed to have already been made, but got botched somewhere.
Don't clobber the last page of memory (where the message buffer is). Some
BIOS don't gratuitously wipe it out on reboot.


4514 15-Nov-1994 bde

Add prototype for Debugger().


4501 15-Nov-1994 bde

Make gdt_segs[] public again for APM.

Make ldt[] public again and restore currentldt and _default_ldt for
USER_LDT.


4479 14-Nov-1994 bde

Rewrite almost everything.

Alphabetize.

Write all i/o functions in sleep so that we don't use anything from
NetBSD.

Restore the correct type of u_int for ports. This saves a whole cycle
per i/o on 486's.

Change `inline' back to __inline to avoid compiler warnings with
-Wreally-all.

Don't implement bdb() unless BDE_DEBUGGER is defined. Declare bdb_exists
outside the function to avoid hundreds of compiler warnings.

Let the compiler pick the register in asms if possible.

Implement ffs() using inline asm(). gcc provides a slightly different
one. It was broken in gcc-2.4.5 but works now. Declaring a correct
version inline ensures getting a correct version. FreeBSD-1.1.5 has
a slow inline version but FreeBSD-2.0 has a library version (which
probably never gets used).

Do inb() and outb() without using %edx for constant ports below 0x100.

Remove casts to the same type in queue functions.

Declare prototypes for everything implemented i386/*.s and also for
everything that is normally implemented as an inline here (I don't
like the current complete dependency on gcc). Ifdef out the prototypes
that are declared elsewhere. There should be a separate header to
declare things implemented in i386/*.s, but then it would be harder
to override declarations with inlines.

${UII}
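
For readers unfamiliar with ffs(), the semantics that any inline version has
to preserve can be sketched in portable C. This is not the inline asm from
the commit; the name ffs_portable is hypothetical and the loop is only a
reference for the expected results (1-based index of the lowest set bit, 0
when no bit is set).

```c
/*
 * Portable reference for ffs() semantics (illustrative only; the commit
 * itself uses inline asm, which is not reproduced here).
 * Returns the 1-based index of the least significant set bit, or 0 if
 * no bit is set.
 */
static int ffs_portable(int mask)
{
    unsigned int bits = (unsigned int)mask;
    int index;

    if (bits == 0)
        return 0;
    /* Shift right until the lowest set bit reaches position 0. */
    for (index = 1; (bits & 1u) == 0; index++)
        bits >>= 1;
    return index;
}
```

A correct inline version should agree with this reference for every input,
which is a convenient way to check a hand-written asm replacement.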


4478 14-Nov-1994 bde

Log processes that exit with a masked npx exception that would trap
with the current default exception (un)mask. There should be no such
processes unless you change the mask. Someday the mask should be
changed to the IEEE default of everything masked. The npx state
gets saved so that it can be checked and this may have the side effect
of fixing a bug that was reported for 1.1.5. (npx exceptions may
sometimes leak across exits and clobber another process. I can't see
how this can happen.)

Get some missing/wrong declarations from headers now that the headers
have them.


4476 14-Nov-1994 bde

Oops, the previous commit got the diff for the log message instead of
the following.

Move declarations to and from <machine/segments.h>. Make segment stuff
static if possible.

Remove unused (although initialized) global variables _default_ldt,
currentldt, _gsel_tss (rename the latter to the auto variable
gtel_tss).

Use "correct" and consistent types for interrupt handlers.

Remove a mailing address from the code.

Fix type mismatches found by adding prototypes.


4475 14-Nov-1994 bde

(Bogus several hundred line diff for a log message deleted. See rev 1.91
for the intended log message. -DG)


4474 14-Nov-1994 bde

Move declarations of atdevbase and rtcin() to cpufunc.h (a less wrong
place).

Fix spelling error.

Uniformize idempotency ifdef.


4473 14-Nov-1994 bde

Remove 1.5+K of bloat for unused idt entries.

Partly support BDE_DEBUGGER. Still broken by conflict with APM. Does
nothing if BDE_DEBUGGER is not defined.

Clean up prototypes and data declarations. Declare most of the segment
functions that are implemented in support.s. Make data private in
machdep.c if possible.

Parenthesize expressions in macros properly!

${Uniformize idempotency ifdef}.
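
As a reminder of why macro expressions need parentheses, here is a minimal
sketch. The macros SCALE_BAD and SCALE_GOOD are hypothetical and are not
taken from the headers touched by this commit.

```c
/* Hypothetical macros for illustration; not from the FreeBSD tree. */
#define SCALE_BAD(x)   x * 4        /* argument and result unparenthesized */
#define SCALE_GOOD(x)  ((x) * 4)    /* argument and whole body parenthesized */

/* Expands to 1 + 2 * 4, so operator precedence changes the result. */
static int scale_bad_demo(void)
{
    return SCALE_BAD(1 + 2);
}

/* Expands to ((1 + 2) * 4), which is what the caller meant. */
static int scale_good_demo(void)
{
    return SCALE_GOOD(1 + 2);
}
```

The unparenthesized form silently computes a different value whenever the
argument or the surrounding expression contains a lower-precedence operator.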


4471 14-Nov-1994 bde

Declare inline functions as __inline and with new-style parameter lists
to avoid compiler warnings.

Clean up prototypes: alphabetize; don't use redundant `extern' or
meaningless `extern inline'.

Uniformize idempotency ifdef.


4463 14-Nov-1994 bde

Undo a previous change. <sys/disklabel.h> was broken, not these files.


4436 13-Nov-1994 gibbs

Add ep0 line to kernel config files.


4434 13-Nov-1994 nate

Add Matt Thomas' le0 DEPCA driver to the GENERIC kernel. This works
but I can't test to see if it walks on other ethernet drivers. Can the
install folks add this driver to the install script?


4430 13-Nov-1994 dg

Nuked ed2 - it was added for the common 16bit card case where the
irq is 10. This is auto-sensed/configured now in the 'ed' driver.


4402 12-Nov-1994 jkh

Add back ed2. Harrumph..


4396 12-Nov-1994 ache

Revision 1.6 fix was lost: don't write 0 to RTC_DIAG


4385 12-Nov-1994 jkh

ed2 was actually an impossible entry to reach!


4370 12-Nov-1994 phk

Make a kernel sans FFS possible.


4353 11-Nov-1994 dg

Added 'de' ethernet driver.


4350 10-Nov-1994 jkh

Enable floppy-tape support.


4341 10-Nov-1994 ache

Use adjkerntz into inittodr too (for APM stuff)


4319 09-Nov-1994 bde

Don't declare DELAY() here. Callers should include <machine/clock.h>.


4263 08-Nov-1994 jkh

Add back ze0 driver; somebody took it out of _both_ LINT and GENERIC,
kinda hosing the laptop folks.


4221 07-Nov-1994 phk

Added a kernel variable, "dodump" defaulting to zero, which disables dumps.
Somebody should make a mib variable for it.
Just now it is pointless to dump the kernel, since we have nothing which
can read the dump.
Furthermore it should never be the default to dump.
options DODUMP
will enable dumps.


4217 06-Nov-1994 phk

Initialize %fs and %gs from %ds.
This seems to stabilize the APM-bios on my Gateway Handbook, and it makes
sense in general too.


4201 06-Nov-1994 dg

Do a better job at preparing registers for the new process in setregs()
by setting them all to a known state.


4193 06-Nov-1994 bde

Nuke the losing version of microtime. The assembler version now works
for all reasonable HZ's. HZ > 1000 doesn't work because of sloppy
conversions in hzto() (division by (tick / 1000) == 0). This was
fixed in 1.1.5.

Eliminate some extern declarations by including the appropriate header
files that now contain appropriate declarations.
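
The HZ > 1000 problem mentioned above comes down to integer division. The
sketch below assumes the usual BSD convention that tick is the tick length
in microseconds (1000000 / hz); the helper names are hypothetical and the
real hzto() is not reproduced.

```c
/* Tick length in microseconds, assuming the BSD definition tick = 1000000 / hz. */
static int tick_usec(int hz)
{
    return 1000000 / hz;
}

/*
 * The divisor named in the log message, tick / 1000, in integer
 * arithmetic. It becomes 0 as soon as tick drops below 1000, that is,
 * for any hz above 1000.
 */
static int hzto_divisor(int hz)
{
    return tick_usec(hz) / 1000;
}
```

With hz = 100 the divisor is 10, with hz = 1000 it is 1, and with hz = 1024
it truncates to 0, which is the division-by-zero hazard the message refers to.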


4188 06-Nov-1994 bde

Public function declarations moved to <machine/npx.h>.


4180 05-Nov-1994 bde

Maintain a new variable `timer0_overflow_threshold' so that microtime()
doesn't have to calculate it every call.

Rename `timer0_prescale' to `timer0_prescaler_count' and maintain it
correctly. Previously we lost a few 8253 cycles for every "prescaled"
clock interrupt, and the lossage grows rapidly at 16 KHz. Now we
only lose a few cycles for every standard clock interrupt.

Rename `*_divisor' to `*_max_count'.

Do the calculation of TIMER_DIV(rate) only once instead of 3 times each
time the rate is changed.

Don't allow preposterously large interrupt rates. Bug fixes elsewhere
should allow the system to survive rates that saturate the system, however.

Clean up declarations.

Include <machine/clock.h> to check our own declarations.


4175 05-Nov-1994 bde

Declare all functions exported by the npx driver.

Uniformize idempotency ifdefs.


4174 05-Nov-1994 bde

Declare the full uglyness of the interfaces to the clock driver (except
things declared in machine-independent files).


4173 05-Nov-1994 bde

Disable the direct call from hardclock() to softclock(). Support
for it is incomplete and buggy. There is no problem unless Xintr0()
is reentered or should be reentered, but high clock interrupt
frequencies for pcaudio cause Xintr0() to be reentered (or clock
ticks to be lost when Xintr0() should have been reentered but
wasn't), and we lose little by delaying the call to softclock().

Move declarations related to the clock driver to clock.h.

Move declarations related to the npx driver to npx.h.

Clean up the remaining declarations.


4157 05-Nov-1994 jkh

Argh! Missing quotes.


4156 05-Nov-1994 jkh

We need CD9660 and MSDOS filesystems built-in if the floppy is to have
a hope of getting at these types of filesystems without dragging all
the LKM stuff in.


4131 04-Nov-1994 jkh

__386BSD__ -> __FreeBSD__

I know that many of these entries are bogus and need to be revisited,
but let's get the tree working again for now and then do a pass through
looking at all the __FreeBSD__ entries, shall we?


4118 03-Nov-1994 jkh

Eliminate USERCONFIG. This option is now standard.


4116 03-Nov-1994 jkh

Unconditionalize USERCONFIG. Uh, thanks, David.


4109 03-Nov-1994 jkh

Add extra id_enabled flag for userconfig to manipulate. If id_enabled
is FALSE, the device will not be probed. id_enabled is TRUE by default.
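
A minimal sketch of the behaviour described, assuming a cut-down device
table. The struct dev_sketch and count_probed names are hypothetical; the
real isa_device structure has many more fields and the real probe loop does
more work.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for an ISA device table entry. */
struct dev_sketch {
    const char *name;
    int id_enabled;     /* TRUE by default; userconfig may clear it */
};

/* Count how many devices would be probed, skipping disabled entries. */
static int count_probed(const struct dev_sketch *devs, size_t ndevs)
{
    int probed = 0;
    size_t i;

    for (i = 0; i < ndevs; i++) {
        if (!devs[i].id_enabled)
            continue;   /* disabled: the probe routine is never called */
        probed++;
    }
    return probed;
}
```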


4106 03-Nov-1994 gpalmer

Cosmetic changes in comment at start (it's no longer a GENERICAH config
file!)


4064 01-Nov-1994 bde

Fix a very old, very stupid race clearing the mask bit for the current
interrupt. Other bits in imen and icu+1 are volatile.

INTREN() and INTRDIS() in icu.h need to be changed similarly.

Change #include's to 2.0 style.


4051 01-Nov-1994 ache

DMA automode patch, fix SB16 clicks
Submitted by: tim@cs.city.ac.uk


4038 01-Nov-1994 ache

Implement CPU_ADJKERNTZ in different way: call resettodr()
on writing this variable. adjkerntz pgm changes will follow.


4031 31-Oct-1994 joerg

Added hooks for an easy drop-in of the pcvt console driver.
Don't panic:-), this is simple stuff just doing exactly the same as for syscons.
(files.i386 did already contain the necessary stuff.)


4014 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for sdtossd().
Cosmetic.

Return from trap() if trap_fatal() returns. trap_fatal() isn't
fatal if you have ddb. Returning from trap() is usually the right
thing to do and much better than falling through.


4013 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for ssdtosd().
Cosmetic.


4012 30-Oct-1994 bde

locore.s:
Build a dummy frame at the top of tmpstk to help debuggers trace the stack
when the system is idle.

swtch.s: idle():
Initialize the frame pointer so that debuggers don't try to trace a bogus
stack.

Load the frame pointer, load the stack pointer and switch out the old
stack in the unique order that never leaves one of the pointers
invalid so that debuggers can trace idle(). Disabling interrupts
provides sufficient validity for normal operation, but debuggers use
(trace) traps.


3962 28-Oct-1994 jkh

From: fredriks@mcs.com (Lars Fredriksen)
...
It turns out that these files do not include <sys/dkbad.h> before
<sys/disklabel.h>.
Submitted by: fredriks


3940 27-Oct-1994 jkh

Julian Elischer's disklabel fixes.


3919 26-Oct-1994 bde

Fix compiler warnings.


3918 26-Oct-1994 bde

Move definition and initialization of video_mode_pointer to syscons.c.


3912 26-Oct-1994 jkh

Enable USERCONFIG and document it in LINT.


3907 26-Oct-1994 jkh

Invoke userconfig() if kernel compiled with options USERCONFIG and
-c flag used.


3889 26-Oct-1994 jkh

Fix two very minor nits, one of which caused a warning (no return type for
main).


3871 26-Oct-1994 phk

Fixed a couple of wrong printfs (too few arguments supplied). Also zapped
a couple of unused vars at the same time. Added a #include <sys/proc.h>
to isa.c while here anyway.


3869 25-Oct-1994 se

BEWARE: Interface change of register_intr() !

Changed the fifth parameter to register_intr() from u_int mask into
u_int *maskptr in preparation for new features (shared interrupts and
removable devices, eg. for PCMCIA).
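
One plausible reading of why a pointer helps is that the mask can change
after registration and every holder of the pointer sees the update. The
sketch below illustrates only that idea; struct intr_slot, slot_attach and
slot_mask are hypothetical and do not reflect the real register_intr()
signature beyond the mask argument becoming a pointer.

```c
/* Hypothetical stand-in for a registered interrupt handler slot. */
struct intr_slot {
    unsigned int *maskptr;  /* points at a mask owned elsewhere */
};

/* Record a pointer to the mask instead of copying its current value. */
static void slot_attach(struct intr_slot *slot, unsigned int *maskptr)
{
    slot->maskptr = maskptr;
}

/* Read the mask through the pointer, so later updates are visible. */
static unsigned int slot_mask(const struct intr_slot *slot)
{
    return *slot->maskptr;
}
```

Two slots attached to the same mask variable both observe a later change to
it, which a by-value u_int mask could not provide.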


3867 25-Oct-1994 se

BEWARE: Interface change of register_intr() !

Changed the fifth parameter to register_intr() from u_int mask into
u_int *maskptr in preparation for new features (shared interrupts and
removable devices, eg. for PCMCIA).


3861 25-Oct-1994 bde

Use the correct macro for deciding whether syscons' variables should
be accessed.

Remove some unused declarations and document a bogus one.


3846 25-Oct-1994 dg

Allow MAXMEM kernel option to indicate more memory than is detected; it
previously could only be used to limit the amount of memory.


3844 25-Oct-1994 dg

Restricted maximum bufpages to 1500; this is required for machines with >64MB
of memory to work without running out of kernel VM (and increasing it to
even more than it is now (96MB) is out of the question). Changed bufpages
calculation to allocate a little less buffer cache (16% of mem-2MB instead
of 20%); this is simply a better figure for most systems.
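
The arithmetic can be sketched as follows. This is a guess at the shape of
the calculation, not the code from machdep.c: it reads "16% of mem-2MB" as
16% of (memory minus 2 MB), assumes bufpages is counted in 4096-byte pages,
and uses the hypothetical name bufpages_for.

```c
#define PAGE_SIZE_BYTES 4096L   /* assumed i386 page size */
#define MAX_BUFPAGES    1500L   /* cap stated in the log message */

/*
 * Illustrative only: 16% of (memory - 2 MB), expressed in pages and
 * clamped to MAX_BUFPAGES.
 */
static long bufpages_for(long mem_bytes)
{
    long usable = mem_bytes - 2L * 1024 * 1024;
    long pages;

    if (usable < 0)
        usable = 0;
    pages = (usable / PAGE_SIZE_BYTES) * 16 / 100;
    if (pages > MAX_BUFPAGES)
        pages = MAX_BUFPAGES;
    return pages;
}
```

Under these assumptions a 16 MB machine gets 573 pages, while a 64 MB
machine would compute 2539 pages and is clamped to 1500.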


3842 25-Oct-1994 dg

Moved initialization of tmpstk so that it immediately follows the kernel
text. Fixed rounding bug that caused the last page of kernel text to be
read/write instead of read-only. This is important now that tmpstk can
crash into it. Removed +4 bias of tmpstk because it screws up ddb's
ability to traceback correctly.


3836 24-Oct-1994 sos

Added sea0 - Seagate driver lines to config


3816 23-Oct-1994 wollman

Finished device configuration database work for all ISA devices (except `ze')
and all SCSI devices (except that it's not done quite the way I want). New
information added includes:

- A text description of the device
- A ``state''---unknown, unconfigured, idle, or busy
- A generic parent device (with support in the m.i. code)
- An interrupt mask type field (which will hopefully go away) so that
``doconfig'' can be written

This requires a new version of the `lsdev' program as well (next commit).


3795 22-Oct-1994 phk

Autoconf is the one to realize that we are booted disk-less and start the
ball rolling. locore is just moving some data from the boot-program.


3794 22-Oct-1994 phk

NFS-diskless works. Look in sys/i386/boot/netboot for some of the
explanation. More doc needed, but not hard to do, if you want to.

A big hand to Martin Renters for the netboot program !

Anybody want to compete on who can "make world" in the shortest
amount of time ? I have 127 i486DX2/66 and 5 P60's I can use
now. And 3 times 66 Gb file servers to support it... :->

Anyway, NFS will be standard in the GENERIC kernel now, so that
people can use the bin-tarball to set up shop.


3744 21-Oct-1994 wollman

Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).


3732 20-Oct-1994 phk

According to a quick reading of sources, one experiment and Bruce's word:
aha, ahb and bt all on "irq ?" now.


3729 20-Oct-1994 phk

Bruce told me to: Make uha0 use irq ?


3728 20-Oct-1994 phk

Peter Dufault's comconsole changes.

Submitted by: Peter Dufault


3726 19-Oct-1994 bde

Don't check for IRQ conflicts before probing the device, so that
drivers have a chance to change their IRQ before it is checked.
This was implemented in revision 1.21 and broken in revision 1.26.
Drivers that can change their IRQ should probably be configured
with "irq ?".


3722 19-Oct-1994 bde

Fix the test for the code segment being the usual one. Unusual code
segments can still cause panics. Their pc is converted to 0 and 0
is only checked for in one place before use.


3713 19-Oct-1994 wollman

Add support for devconf to a large number of device drivers, and do
the right thing in dev_goawayall() when kdc_goaway is null.


3705 19-Oct-1994 wollman

isa.c isa_device.h: declare & define {e,}isa_{in,ex}ternalize().
fd.c: register devices and implement disk stats.
wd.c: fix disk stats and call isa_externalize() as appropriate.


3703 19-Oct-1994 wollman

Implement disk_externalize().


3682 18-Oct-1994 ache

Remove CPU_COLORDISP, GIO_COLOR now exists


3670 17-Oct-1994 phk

isa_device.h: Added flag for sensitive HW. ed# seems to break if anything
else has been probed. This feature could go away again, if we can curb the
problem another way.

if_ed.c, syscons.c: Set the above flag. ed# because it needs it, syscons
because it looks stupid to "detect" the display you have already filled up
with text :-)

bt742a.c: Check bt_cmd() return-val during probe, thus failing on adaptec's.
Also silenced various printf's during the probe.

isa.c: Probe devices with the above flag set before the rest. Reduce the
number of "conflict" messages per device to one.

***
Please test the GENERIC-kernel now, if nobody can make it fail, GENERICAH
and GENERICBT have a finite and short life-expectancy...
***


3668 17-Oct-1994 phk

GENERIC is our new all singing and dancing kernel. Please report ASAP if
there is anything GENERICAH or GENERICBT can do which this one cannot.

MINI changed to reflect the SCSI-pecking-order.


3661 17-Oct-1994 ache

Ifdef color_display by NSC, pointed out by Rod


3627 15-Oct-1994 ache

Add CPU_COLORDISP sysctl to handle console display type


3625 15-Oct-1994 ache

CPU_COLORDISP sysctl added for console display type


3612 15-Oct-1994 dg

1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. Page ins/outs
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted separately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.


3513 11-Oct-1994 sos

Ouch, fixed bug in errno translation (ibcs2 support).


3502 10-Oct-1994 phk

minaddr #ifdef lost in previous commit. Sorry.


3495 10-Oct-1994 sos

Hmm, only translate errno when doing an actual return.

Reviewed by: sef@freefall.cdrom.com


3489 10-Oct-1994 phk

locore.s: Made the APM stuff depend on NAPM > 0 rather than a separate
"APM" macro.
machdep.c: Made the APM-descriptors unconditional.
Bruce: if these still conflict with your debugger, please put in a reservation
for your debugger. These three desc. can be anywhere, as long as they are
contiguous, so just move them as needed.


3488 10-Oct-1994 phk

Cosmetics. Added a prototype.


3476 09-Oct-1994 sos

Updated to convert errno return in syscall if conversion table present.


3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


3440 08-Oct-1994 phk

A couple of prototypes moved out from here.


3437 08-Oct-1994 phk

Added prototypes.


3436 08-Oct-1994 phk

db_disasm.c: Unused var zapped.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.


3426 08-Oct-1994 rgrimes

Correct #ifdef for nfs_diskless support is #ifdef NFS; there will be no
option DISKLESS for the 2.0 nfs diskless support. A 2.0 diskless kernel
simply needs NFS linked in statically.


3406 07-Oct-1994 dg

#ifdef DISKLESS the copying of the nfs_diskless structure. Not the best
solution, but the only one I have time for at the moment.


3384 06-Oct-1994 rgrimes

1. Eliminate unused esym global from locore, our boot code never supported
that and when it does it will be done differently.

2. The kernel now does a frame setup on entry so it ``looks'' like a
real function call. This will be needed by future boot code and
debuggers.

3. Clean up stack offsets to all be in decimal and use %ebp when copying
parameters in from the boot code.

4. Implement version 1 of the uniform boot code passing mechanism with
support for kernelname passing and nfs_diskless structure passing.

5. Document the 3 different ways the kernel is called depending on what code
is calling it.


3367 04-Oct-1994 ache

Add code to handle CPU_DISRTCSET


3366 04-Oct-1994 ache

Add disable_rtc_set variable to block resettodr() call, needed for
adjkerntz -i, per Bruce's suggestion


3365 04-Oct-1994 ache

CPU_DISRTCSET added to disable resettodr(), needed in adjkerntz -i,
per Bruce's suggestion


3355 04-Oct-1994 ache

RTC_CENTURY usage ifdefed out by USE_RTC_CENTURY compile option,
pointed out by Bruce


3315 02-Oct-1994 phk

Avoid ddb getting a panic if the code-segment isn't the usual one...


3314 02-Oct-1994 dg

Patch from HOSOKAWA Tatsumi to fix bug in the size of apm_current_gdt_pdesc

Submitted by: HOSOKAWA Tatsumi


3307 02-Oct-1994 phk

apm_bios.h: removed the equiv-stuff. Not needed now that the kernel module
works correctly.

clock.h & reg.h: prototypes.


3306 02-Oct-1994 phk

Unused variables, except one with an ominous comment.


3294 02-Oct-1994 rgrimes

If you are building a kernel without NFS statically linked in you
must #define NFS before including <sys/mount.h> to pick up some of
the definitions needed for struct diskless. Be sure to undef it after this
so you do not affect other code.

This is kinda sick, but it does the job. Problem found by davidg.


3291 02-Oct-1994 dg

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


3284 02-Oct-1994 rgrimes

1. Remove all references to cyloffset, it has been unused for some time.

2. New detection code so we know what boot code called us.

3. Remove old DISKLESS support code and halt if we are called by that boot
code as it will NOT work with the new nfs_diskless structure.

This is really in preparation for new boot code and new diskless support.

Reviewed by: davidg


3283 02-Oct-1994 rgrimes

Add code to generate NFSDISKLESS_SIZE for use in locore for copying the
nfs_diskless structure in from the boot code.
Reviewed by: davidg


3258 01-Oct-1994 dg

Laptop Advanced Power Management support by HOSOKAWA Tatsumi.

Submitted by: HOSOKAWA Tatsumi


3224 30-Sep-1994 swallace

Add #ifndef ALLOW_CONFLICT_IRQ
Reviewed by: jkh


3185 29-Sep-1994 sos

Updated pcaudio.c to latest from 1.1.5.1
Enabled timer reprogramming in clock.c (this could use more work).

Obtained from: FreeBSD-1.1.5.1


3156 28-Sep-1994 bde

Ensure normal selection and alignment of the text and data sections before
including files. vector.s sometimes left the data section misaligned
(depending on the configuration) so all the time-critical globals in icu.s
were sometimes misaligned.


3117 26-Sep-1994 pst

Make Cyrix CPU flush internal cache any time it goes into hold state.
(Meant to commit this a long time ago... oh well).


3102 25-Sep-1994 dg

Inlined ins/outs functions.

Obtained from: NetBSD


3099 25-Sep-1994 dg

Undo last change: the ins/outs functions DO NOT return a pointer!


3098 25-Sep-1994 phk

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if your kernel breaks.


3058 24-Sep-1994 dg

Shuffled macros and definitions around to facilitate architecture
independence.


3047 24-Sep-1994 dg

Nuked splnet before sync. Not only is this unnecessary, but it appears
to cause problems by making it impossible to sync NFS related buffers
when rebooting.


3021 23-Sep-1994 dg

Increased SHMMAXPGS from 512 to 1024 now that there is plenty of kernel
virtual memory.


2977 22-Sep-1994 dg

From 1.1.5:

>revision 1.8
>date: 1994/06/03 06:42:30; author: davidg; state: Exp; lines: +2 -2
>Patch from Bruce Evans: npxintr() needs to mask softclock().


2941 20-Sep-1994 bde

Don't provide bogus source operands in some asms. This probably shouldn't
matter, but similar bogusness in npx.c causes compiling without -O to fail.

Use __volatile in all asms.

Parenthesize macro args.

Change the names of the macros to avoid namespace pollution.

Remove unnecessary "#ifdef __i386__".

Sort #defines.

Add comments.
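
As an illustration of why "Parenthesize macro args" matters, here is a minimal sketch. The macro and function names are hypothetical and are not the ones touched by this commit; they only show the precedence problem that unparenthesized arguments cause.

```c
/* Hypothetical macros for illustration; not taken from the committed header. */
#define DOUBLE_BAD(x)   (x * 2)     /* argument not parenthesized */
#define DOUBLE_GOOD(x)  ((x) * 2)   /* argument parenthesized */

/* With the argument 1 + 2, DOUBLE_BAD expands to (1 + 2 * 2). */
static int double_bad(void)
{
    return DOUBLE_BAD(1 + 2);
}

/* With the same argument, DOUBLE_GOOD expands to ((1 + 2) * 2). */
static int double_good(void)
{
    return DOUBLE_GOOD(1 + 2);
}
```

The two functions return different values for the same argument, which is the kind of silent error that parenthesizing every macro argument avoids.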


2933 20-Sep-1994 bde

Don't supply the `usermode' arg to softclock(). The 2.0 softclock() doesn't
take an arg.


2932 20-Sep-1994 bde

Don't lose the RTC interrupt in resettodr().


2918 20-Sep-1994 bde

Remove the alias splnone() for spl0(). It was used only once.


2914 20-Sep-1994 ache

resettodr() now exists, enable it


2913 20-Sep-1994 ache

resettodr() implemented, inittodr() fixed
Submitted by: me & chris@gnome.co.uk


2874 18-Sep-1994 bde

The previous revision got the wrong log message (for clock.c). It should
have got the following:

Back out the changes in the previous revision. Function-like macros
were replaced by compound statements that work in fewer contexts.

Uniformize idempotency #ifdef.


2873 18-Sep-1994 bde

Remove some unnecessary #includes.

Restore the simple leap year calculation as a macro and document it so
that it doesn't become complicated again. The simple version works
for all leap years covered by 32-bit time_t's. The complicated version
doesn't work for all leap years covered by 64-bit time_t's since among
other reasons, the solar system is not stable for long enough.

Fix declarations.

Nuke spinwait().
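
The leap year remark above can be checked with a short sketch. The log does not show the macro itself, so LEAPYEAR_SIMPLE below is an assumption (a plain divisible-by-four test), and the helper names are hypothetical. A signed 32-bit time_t spans roughly 1901 to 2038, a range that contains no century year other than 2000, which is itself a leap year.

```c
#include <stdbool.h>

/* Assumed form of the "simple" rule; the committed macro is not shown in the log. */
#define LEAPYEAR_SIMPLE(y) (((y) % 4) == 0)

/* Full Gregorian rule, for comparison. */
static bool leapyear_gregorian(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Returns true if both rules agree for every year in [first, last]. */
static bool rules_agree(int first, int last)
{
    for (int y = first; y <= last; y++)
        if (LEAPYEAR_SIMPLE(y) != leapyear_gregorian(y))
            return false;
    return true;
}
```

Calling rules_agree(1901, 2038) compares the two rules over the years a 32-bit time_t can represent; the rules first disagree at century years such as 1900 and 2100, which lie outside that range.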


2866 18-Sep-1994 bde

Clean up #includes. <machine/spl.h> has to be included by almost everything
in case an spl inline is used, so this is not the place to include it.

Uniformize idempotency #ifdef.


2858 18-Sep-1994 wollman

Redo Kernel NTP PLL support, kernel side.

This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:

1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.

2) mono_time does not participate in the PLL adjustments.

3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.


2826 16-Sep-1994 dg

Removed inclusion of pio.h and cpufunc.h (cpufunc.h is included from
systm.h). Merged functionality of pio.h into cpufunc.h. Cleaned up some
related code.


2824 16-Sep-1994 jkh

Deal with outw being defined - the declaration clashes.


2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


2819 16-Sep-1994 ache

CPU_ADJKERNTZ added for resettodr()


2818 16-Sep-1994 ache

CPU_ADJKERNTZ added to cpu_sysctl


2804 15-Sep-1994 paul

Include pio.h so that all those drivers that only include cpufunc.h
get the faster io macros/inline code rather than call the routines
in support.s

This whole area needs some going over.....


2802 15-Sep-1994 paul

Removed some macros that are now in cpufunc.h
Reviewed by: Bruce


2801 15-Sep-1994 paul

Added MCOUNT_ENTER and MCOUNT_EXIT macros to profile.h

Removed inb function since it's more correctly in pio.h

Copied write_eflags and read_eflags over from npx.c

(Some changes to the macros suggested by Bruce were not made at this
time since his suggestions probably apply to all the macros and
these inlined/macro definitions need a lot of cleaning up at some
point in the future.)

Reviewed by: Bruce


2792 15-Sep-1994 dg

Brought over from 1.1.5:

From Bruce Evans:
Protect against reentering Debugger().


2789 15-Sep-1994 dg

Brought over from 1.1.5:

Fix from Bruce Evans. There were missing sets of parentheses:

1. The checks for the standard data selectors were botched, so %ss == 0
and probably %cs == 0 were allowed. A fix is enclosed. The checks
for the standard selectors could be omitted without losing anything
since the standard selectors pass the valid_ldt_sel() tests.


2783 15-Sep-1994 sos

Added support for many more video modes, including graphics modes up to
320x200 256-color VGA. This is necessary for the iBCS stuff to work right.
(And we get the benefit of more video modes). Uses the video card BIOS
to obtain mode tables.
Added a "green" saver, which switches off the syncs for "green" monitors.



2772 14-Sep-1994 wollman

Beginnings of support for loadable protocol domains. In particular,
don't hard-code netisr values in icu.s, but rather, use an array of
function pointers and set them all up in machdep.c for statically-linked
protocol families. (This will eventually be done differently.)
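
A minimal sketch of the idea described above: dispatching software network interrupts through an array of function pointers instead of hard-coded calls. All names ending in _sketch or _demo, the table size, and the pending-bit mask are assumptions for illustration; the real code lives in icu.s and machdep.c and is not shown in this log.

```c
#include <stddef.h>

#define NETISR_MAX 32                   /* assumed table size */

typedef void (*netisr_t)(void);

static netisr_t netisrs[NETISR_MAX];    /* NULL means no handler registered */
static unsigned int netisr_pending;     /* bit n set: soft interrupt n requested */
static int ip_calls_demo;               /* counts calls to the demo handler */

/* Demo handler standing in for a protocol's input routine. */
static void ipintr_demo(void)
{
    ip_calls_demo++;
}

/* Register a handler for slot num; returns 0 on success, -1 if out of range. */
static int register_netisr_sketch(int num, netisr_t handler)
{
    if (num < 0 || num >= NETISR_MAX)
        return -1;
    netisrs[num] = handler;
    return 0;
}

/* Mark slot num as pending. */
static void schednetisr_sketch(int num)
{
    netisr_pending |= 1u << num;
}

/* Run every pending handler once; returns how many handlers ran. */
static int run_netisrs_sketch(void)
{
    int ran = 0;

    for (int n = 0; n < NETISR_MAX; n++) {
        unsigned int bit = 1u << n;

        if ((netisr_pending & bit) == 0)
            continue;
        netisr_pending &= ~bit;
        if (netisrs[n] != NULL) {
            netisrs[n]();
            ran++;
        }
    }
    return ran;
}
```

Because the dispatcher only consults the table, a protocol family loaded later could install its handler without the dispatcher being modified, which appears to be the motivation stated in the log message.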


2770 14-Sep-1994 ache

1. adjkerntz variable added for preparation to resettodr() implementation
2. Leap year calculations fixed


2739 13-Sep-1994 phk

Reversed my patch from yesterday. "eisa" if >= 0x1000.
pas0 will be in "eisa", even though it isn't.


2718 13-Sep-1994 phk

Only say eisa if ((ioaddr & 0xfff) >= 0x400)


2689 12-Sep-1994 dg

Eliminated a whole pile of ancient (we're talking 4.3BSD) VM system
related #define constants. Corrected incorrect VM_MAX_KERNEL_ADDRESS.

Reviewed by: John Dyson


2660 11-Sep-1994 dg

Be more careful about dereferencing curproc, p_vmspace, and curpcb,
otherwise the machine will overflow the stack in a recursive fault loop
(causing the machine to spontaneously reboot because of the stack fault
that ultimately happens).

Submitted by: Inspired by Bruce Evans, but this change is different
than what he suggested.


2631 09-Sep-1994 wollman

Define new MIB variable, hw.floatingpoint, which is true if FP hardware
is present, and false if an emulator is being used.


2579 08-Sep-1994 bde

Get all the definitions from DEFS.h and not directly from asmacros.h
if KERNEL is not defined. lib/msun/i387/*.S include asmacros.h to
get the definitions of ENTRY(), etc. This is bogus since asmacros.h
is only supposed to give definitions suitable for the kernel. The
current definitions for the kernel almost worked but are missing
the ".type" declarations. This caused the linker to print warnings
about doubtful relocations for almost anything linked to libm[sun].

Uniformize name and use of idempotence identifier.


2578 08-Sep-1994 bde

Remove <machine/eflags.h> and all dependencies on it. eflags.h is just
the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach
version has slightly better names for many macros but is now out of
date and little used. It was originally used even less (for spelling
PSL_T as EFL_TF in <machine/db_machdep.h>).


2512 05-Sep-1994 bde

Fix comments.


2500 05-Sep-1994 dg

Don't allow I/O register access in process 1 (oops).


2497 04-Sep-1994 dg

Improved some comments.


2495 04-Sep-1994 pst

Detect if we're running on a Cyrix 486DLC and enable automatic cache
negation whenever we access memory between 640k and 1M.

Original code from NetBSD 1.0-BETA. The exact origins are unclear but
Theo de Raadt, Charles, and Michael V. may have contributed to it.

Submitted by: pst


2493 04-Sep-1994 dg

Rewrote last vestige of code that used gs (copyinstr). The use of gs in
this routine caused problems for machines that don't set it up properly
before boot (such was the case on an EVEREX machine sitting next to me).


2492 04-Sep-1994 dg

Added pmap_mapdev() function to map device memory.


2486 04-Sep-1994 dg

Initialize eflags register - brought over from 1.1.5.


2466 02-Sep-1994 ats

1) if_ie.c:
Changed a printf to put a space in it. Formerly the "<3C507>"
confused syslog, which tried to interpret it as the priority of
the message.

2) isa_device.h:
Changed the iobase variable from short to u_short. EISA
addresses can go up to 0xf000, and the sign extension doesn't
look good in the probe output. Example:
"ep1 at 0xffff8000-0xffff8000f" is not good :-); I prefer
"ep1 at 0x8000-0x8000f".

3) isa.c:
Changed a string constant from "probe" to "prob", since it
already gets an "ed" tagged on the end later.
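
The sign-extension problem described for isa_device.h can be reproduced with a small sketch. The function names are hypothetical, and the code assumes a 16-bit short, a 32-bit int, and two's complement conversion, as on the i386 targets this log is about.

```c
/* Widen an I/O base stored in a signed short, as the old declaration did. */
static unsigned int widen_signed(short iobase)
{
    /* A negative short is sign-extended when converted to a wider type. */
    return (unsigned int)iobase;
}

/* Widen an I/O base stored in an unsigned short, as after the change. */
static unsigned int widen_unsigned(unsigned short iobase)
{
    /* An unsigned short is zero-extended, so the high bits stay clear. */
    return (unsigned int)iobase;
}
```

Passing the bit pattern 0x8000 through the signed variant yields a value with the upper sixteen bits set, which matches the 0xffff8000 seen in the probe output quoted above; the unsigned variant leaves it at 0x8000.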


2457 02-Sep-1994 dg

Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.


2455 02-Sep-1994 dg

Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update().
Made pmap_update an inline assembly function.


2452 02-Sep-1994 dg

It's not necessary to make page tables write-through, so get rid of this
(this was an experimental change which probably shouldn't have been
committed). I/O pages are still marked non-cacheable, however.


2441 01-Sep-1994 dg

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


2440 01-Sep-1994 dg

Got rid of some old, unused junk.


2430 31-Aug-1994 se

Reviewed by: Stefan Esser <se>
Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Added PCI support (call of pci_config(), if NPCI > 0).


2426 31-Aug-1994 dg

Fixed bug that surfaced with last commit for NOBOUNCE -> BOUNCE_BUFFERS by
adding appropriate #ifdefs and changing some variables to externs (as they
should have always been).


2422 31-Aug-1994 dg

Rather than exclude bounce buffers support with NOBOUNCE, include it
with BOUNCE_BUFFERS. This is more intuitive, and is better for future
multiplatform support. Added BOUNCE_BUFFERS option to the GENERIC and
LINT kernel config files.


2410 30-Aug-1994 bde

Don't define LOCORE (as nothing) in sources. It is now defined
consistently (as 1) in Makefile.i386 for all assembler sources.


2400 29-Aug-1994 ache

Fake floppy partition RAW_PART=2 now


2357 28-Aug-1994 bde

Don't test if a u_int is < 0. The remaining test is sufficient and the
extra one caused a warning.
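
A short sketch of the point above, assuming a hypothetical table bound (TABLE_SIZE_SKETCH) and function name that do not come from the commit: for an unsigned index a single upper-bound comparison is sufficient, because a value that was negative before conversion becomes very large and fails it.

```c
/* Hypothetical bound for illustration; not from the committed code. */
#define TABLE_SIZE_SKETCH 16u

/* Returns nonzero if i is a valid index into a table of TABLE_SIZE_SKETCH entries. */
static int index_ok(unsigned int i)
{
    /*
     * A test such as (i < 0) can never be true for an unsigned type,
     * so compilers warn about it. The upper-bound test alone also
     * rejects values that were negative before conversion.
     */
    return i < TABLE_SIZE_SKETCH;
}
```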


2320 27-Aug-1994 dg

1) Changed ddb into an option rather than a pseudo-device (use options DDB
in your kernel config now).
2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its
own file.
3) Added \r handling in db_printf.
4) Added missing memory usage stats to statclock().
5) Added dummy function to pseudo_set so it will be emitted if there
are no other pseudo declarations.


2257 24-Aug-1994 sos

Changes preparing for iBCS support


2254 24-Aug-1994 sos

Changes preparing for iBCS2 support



2246 23-Aug-1994 dg

Corrected some comments regarding ptes/pdes.


2245 23-Aug-1994 paul

Re-enabled inlining of inb.
Changed u_int_inb to just inb and deleted define.

The code generated is identical to that generated with the cast so
the problem was obviously fixed at some point after gcc 1.4



2244 23-Aug-1994 paul

I've disabled this piece of code since it's what's
hosing syscons. Does anyone know anything about this
or can we just delete it now?

/*
* This roundabout method of returning a u_char helps stop gcc-1.40 from
* generating unnecessary movzbl's.
*/
#ifdef disable_for_gcc-2_6_0
#define inb(port) ((u_char) u_int_inb(port))
#endif

static inline u_int
u_int_inb(u_int port)
{
u_char data;
/*
* We use %%dx and not %1 here because i/o is done at %dx and not at
* %edx, while gcc-2.2.2 generates inferior code (movw instead of movl)
* if we tell it to load (u_short) port.
*/
__asm __volatile("inb %%dx,%0" : "=a" (data) : "d" (port));
return data;
}



2216 22-Aug-1994 bde

Pad `_cpu_vendor' to finish on a 32-bit boundary so that most of the
locore globals aren't misaligned.


2166 21-Aug-1994 paul

Made idempotent.


2152 20-Aug-1994 dg

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


2138 19-Aug-1994 dg

Removed bogus save of CMAP2.


2124 19-Aug-1994 dg

Terry Lambert's loadable kernel module support w/improvements from the
NetBSD group.


2123 19-Aug-1994 jkh

1. Make this idempotent.
2. Hack.

Hack is to define RCSID() to null macro so that new msun stuff
will compile. This does NOT belong here, and I DON'T want it to
stay, I just need to put this here for now to enable msun and we need
to talk about what our RCSID story is supposed to be. We talked about
supporting RCSID() one day, and everyone seemed to like the idea
reasonably well of making it a macro you could just no-op this way,
but we never did anything. Now I see that JTCs code has it and I'm
loath to remove it or do anything until we've discussed it some more.

Well, so how about it? What's our story vis-a-vis RCSID() going to
be?

Submitted by: jkh


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


2103 18-Aug-1994 dg

Bruce Evans' dynamic interrupt support.

/usr/src/sys/i386/isa/clock.c:
o Garrett's statclock changes.
o Wire xxxintr, not Vclk.
o Wire using register_intr(), not setidt().

/usr/src/sys/i386/isa/icu.s:
o Garrett's statclock changes.
o Removed unused variable high_imask.
o Fake int 8 for rtc as well as int 0 for clk. Required for kernel
profiling with statclock, harmless otherwise.

/usr/src/sys/i386/isa/isa.c:
o Allow isdp->id_irq and other things in *isdp to be changed by
probes. Changing interrupts later requires direct calls to
register_intr() and unregister_intr() and more care.
ALLOW_CONFLICT_* is brought over from 1.1.5, except
ALLOW_CONFLICT_IRQ is not supported. IRQ conflict checking is
delayed until after probing so that drivers can change the IRQ
to a free one; real conflicts require more cooperation between
drivers to handle.
o Too many details to list.
o This file requires splitting and a lot more work.

/usr/src/sys/i386/isa/isa_device.h:
o Declare more things more completely.

/usr/src/sys/i386/isa/sio.c:
o Prepare to register interrupt handlers as fast.

/usr/src/sys/i386/isa/vector.s:
o Generate entry code for 16 fast interrupt handlers and 16 normal
interrupt handlers. Changed some constants to variables:
# $unit is now intr_unit[intr]. Type is int. Someday it should
be a cookie suitable for the handler (e.g., a struct com_s for
sio).
# $handler is now intr_handler[intr].
# intrcnt_actv[id_num] is now *intr_countp[intr]. The indirection
is required to get a contiguous range of counters for vmstat
and so that the counts depend more on the driver than on the
interrupt number (drivers could take turns using an interrupt
and the counts would remain correct). There is a separate
counter for each device and for each stray interrupt. In
1.1.5, stray interrupt 7 clobbers the count for device 7 or
something worse if there is no device 7 :-(.
# mask is now intr_mask[intr] (was already indirect).
o Entry points are now _XintrI and _XfastintrI (I = intr = 0-15),
not _VdevU (U = unit).
o Removed BUILD_VECTORS stuff. There's a trace of it left for
the string table for vmstat but config now generates the
string in one piece because nothing more is required.
o Removed old handling of stray interrupts and older comments
about it.
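
The counter indirection described for vector.s can be sketched in C as follows. The real code is assembler and is not shown in this log; the array names other than the idea of a per-interrupt counter pointer, the device count, and the helper functions are assumptions for illustration.

```c
#define NINTR 16                        /* interrupt lines 0-15, as in the log */
#define NDEV_SKETCH 4                   /* hypothetical number of devices */

static unsigned long device_counts[NDEV_SKETCH];  /* contiguous per-device counters */
static unsigned long stray_counts[NINTR];         /* one counter per stray interrupt */
static unsigned long *intr_countp[NINTR];         /* where each interrupt is counted */

/* Initially every interrupt is counted as a stray. */
static void intr_count_init(void)
{
    for (int i = 0; i < NINTR; i++)
        intr_countp[i] = &stray_counts[i];
}

/* Point an interrupt line at the counter of the device that owns it. */
static void intr_count_attach(int intr, int dev)
{
    intr_countp[intr] = &device_counts[dev];
}

/* What the interrupt entry code does: bump whatever counter is wired up. */
static void intr_count_bump(int intr)
{
    (*intr_countp[intr])++;
}
```

With this layout a stray interrupt on line 7 increments its own stray counter instead of clobbering the count for device 7, and a device keeps its count even if it moves to a different interrupt line.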

Submitted by: Bruce Evans


2074 15-Aug-1994 wollman

Enable use of the RTC chip for the statistical clock. While this does
not provide the full accuracy of a randomized statistical clock, it does
provide greater accuracy than the previous method, while not significantly
increasing overhead. It also provides profiling support at 1024 Hz.

You must re-compile config before making a new kernel, or you will end
up with unresolved symbols.

Reviewed by: Bruce Evans said it worked for him.


2071 14-Aug-1994 ats

Submitted by: Bruce Evans
Deleted the ifdef GPL_EMULATE case here and made the padding work for
both types of emulators so that there is no longer a need to recompile
ps and friends if you are using the GPL math emulator instead of the
normal one.


2059 13-Aug-1994 dg

Made the kernel compile cleanly with gcc 2.6.0. Thanks go to Bruce
Evans for suggesting a method to detect various versions of gcc.


2056 13-Aug-1994 wollman

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


2028 11-Aug-1994 jkh

Change outb() as per Bruce's instructions so that it doesn't explicitly
try to pass its argument in the ax register.


2017 11-Aug-1994 wollman

For Pentium machines, use a faster version of microtime with 8 usec
resolution (can probably be improved somewhat). Other machines take
a three-instruction hit if I586_CPU is defined, none otherwise.


2014 10-Aug-1994 wollman

Tell Pentium users their CPU speed. (More changes to make use of this
to come later.)


2001 10-Aug-1994 wollman

Handle NMI's in accordance with data in van Gilluwe book.


1999 10-Aug-1994 wollman

Some programs (like GNU configure programs) depend on the output of
`uname -s' to be something reasonable (traditionally, `i386') rather
than `PC-Class'. Make it so.


1998 10-Aug-1994 wollman

Add back in CPU detection code from 1.1.5. As an added bonus, the
hw.model MIB variable is now declared correctly.


1977 09-Aug-1994 jkh

Merge in the necessary bits from 1.1.5.1 to make exec.h and reloc.h
happy campers again (e.g. match our own exec format). This should
make ld happy.
Submitted by: jkh


1975 09-Aug-1994 dg

Removed ntohl and ntohs functions. These were already inlined assembly in
endian.h.


1896 07-Aug-1994 dg

Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that
are no longer needed because of this.


1895 07-Aug-1994 dg

Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by: John Dyson


1894 07-Aug-1994 dg

Don't kremove process VM pages (oops!). This was the cause of the instability
that was introduced last night.

Submitted by: John Dyson


1890 06-Aug-1994 dg

Fixed various prototype problems with the pmap functions and the subsequent
problems that fixing them caused.


1889 06-Aug-1994 dg

Incorporated 1.1.5 improvements to the bounce buffer code (i.e. make it
actually work), and additionally improved its performance via new pmap
routines and "pbuf" allocation policy.

Submitted by: John Dyson


1888 06-Aug-1994 dg

Made the tmpstk start at tmpstk. Not doing so causes problems for the
debugger.

Submitted by: John Dyson


1887 06-Aug-1994 dg

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routines allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


1862 05-Aug-1994 wollman

Delete redundant #ifdef __i386__, be consistent about idempotency
protection.

Submitted by: Bruce Evans


1838 04-Aug-1994 dg

Added assembly versions of ffs() and bcmp().


1837 04-Aug-1994 dg

Inlined insque and remque.


1834 04-Aug-1994 wollman

Move ieeefp.h over, and put it in the correct subdirectory this time.

Submitted by: Andrew Moore


1829 04-Aug-1994 dg

Nuked #if 0'd _insque and _remque routines - they are now inlined in
cpufunc.h.


1825 03-Aug-1994 dg

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.
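
The TLB flush optimization in item 1 of the first list can be illustrated with a small simulation. This is only a sketch under stated assumptions: PG_V is the i386 "present" bit, but tlbflush_sim(), the flush counter, and pte_store_sketch() are hypothetical stand-ins and not the pmap code from this commit.

```c
#include <stdint.h>

#define PG_V 0x001u                 /* i386 page table entry "present" bit */

static unsigned int tlb_flushes;    /* counts simulated TLB flushes */

/* Stand-in for the real TLB flush, which would reload %cr3. */
static void tlbflush_sim(void)
{
    tlb_flushes++;
}

/*
 * Store a new page table entry. The TLB only caches translations for
 * present pages, so a flush is needed only if the old entry was valid.
 */
static void pte_store_sketch(uint32_t *pte, uint32_t newpte)
{
    uint32_t old = *pte;

    *pte = newpte;
    if (old & PG_V)
        tlbflush_sim();
}
```

Filling in a previously invalid entry costs no flush in this model, while replacing a valid one does, which is how skipping the not-present case can remove a large share of the flushes.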

Reviewed by: John Dyson


1817 02-Aug-1994 dg

Added $Id$


1810 01-Aug-1994 dg

Removed all code related to the pagescan daemon, and changed 'act_count'
adjustments to compensate for a world without the pagescan daemon.


1705 11-Jun-1994 dg

Fix from Bruce Evans:
Set npx_exists = 0 in the case of broken error reporting.


1704 11-Jun-1994 dg

Fixed minor spelling error.


1703 11-Jun-1994 dg

Bruce found a bug in my changes to stop using the gs selector.

From Bruce Evans:

fu[i]byte() checked the wrong register. This caused interesting behaviour
in the GPL math emulator. The emulator does not check the values returned
by fu*() or su*() (:-() and it interpreted the address of -12(%ebp) as
-1(%ebp). The same probably occurs for all signed 8-bit offsets from
registers.

I cleaned up the new bzero() a bit.


1691 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1690 06-Jun-1994 dg

trap.c:
Vastly improved trap.c from me. This rewritten version has a variety of
features, among them: higher performance and much higher code quality.

support.s, cpufunc.h:
No longer use gs override to enforce range limits - compare directly
against VM_MAXUSER_ADDRESS instead. The old way caused problems in
preserving the gs selector...and this method is just as fast or faster.
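
A minimal sketch of the direct comparison mentioned for support.s. The committed code is assembler; the C function below and the limit value are assumptions for illustration only (the real VM_MAXUSER_ADDRESS is defined by the kernel headers and is not shown in this log).

```c
#include <stdint.h>

/* Placeholder limit for illustration; not the kernel's actual value. */
#define VM_MAXUSER_ADDRESS_SKETCH 0xC0000000u

/* Returns 1 if [addr, addr + len) lies entirely below the user limit. */
static int user_range_ok(uint32_t addr, uint32_t len)
{
    if (addr >= VM_MAXUSER_ADDRESS_SKETCH)
        return 0;
    /* Compare against the remaining room so addr + len cannot wrap. */
    return len <= VM_MAXUSER_ADDRESS_SKETCH - addr;
}
```

The check needs no segment register at all, so the user's %gs value is left untouched.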


1689 06-Jun-1994 dg

Back out previous change for the moment - I need to commit some other
changes first.


1688 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1678 04-Jun-1994 dg

Removed extra (bogus) declaration of Xrsvd14 that was confusing me.


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1543 25-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources


1445 03-May-1994 ats

Add two routines insl and outsl, that should do 32bit string ins and outs.
Both are completely untested at the moment. They are used from the
if_ep.c driver for the EISA card.


1442 02-May-1994 sos

Update the reprogram timer stuff, now the frequency of timer 0
can only be changed at the "right" times. Accuracy should be
assured.


1440 02-May-1994 dg

Removed some tlbflush optimizations as some of them were bogus and led
to some strange behavior.


1432 29-Apr-1994 gclarkii

Deleted on ifdef dontdef
Added ifdef for GPL_MATH_EMULATE so we get the extra padding that is needed
in the save87 struct.


1431 29-Apr-1994 gclarkii

Added ifdef for GPL_MATH_EMULATE to keep the system from panicking when
using it.


1415 25-Apr-1994 dg

From John Dyson:

Fixed physio in the 386 case - write faults weren't properly implemented.


1407 23-Apr-1994 wollman

Define new option, INACCURATE_MICROTIME_IS_OK. When this is defined,
the NTP kernel PLL is disabled, and acquire_timer0() is enabled, thus
opening the door for microtime() (and hence gettimeofday()) to return
bogus timestamps. This option is necessary for the `pca' driver to
work, but is implemented to underscore the fact that accurate timekeeping
and the `pca' driver are incompatible at present. If someone writes a version
of microtime() that works when the `pca' driver is being used, this can get
junked.


1392 21-Apr-1994 sos

Added IO_PPI define, pulled timer related stuff


1391 21-Apr-1994 sos

Pulled out timer related functions -> now in clock.c


1390 21-Apr-1994 sos

New support for sharing the timers
acquire_timer / release_timer

Pulled in timer related functions from isa.c


1379 20-Apr-1994 dg

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


1362 14-Apr-1994 dg

Changes from John Dyson and myself:

1) Removed all instances of disable_intr()/enable_intr() and changed
them back to splimp/splx. The previous method was done to improve
the performance, but Bruce's recent changes to inline spl* have
made this unnecessary.
2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too.
3) Added a new mechanism for collecting page statistics - now done by
a new system process "pagescan". Previously this was done by the
pageout daemon, but this proved to be impractical.
4) Improved the page usage statistics gathering mechanism - performance is
much improved in small memory machines.
5) Modified mbuf.h to enable the support for an external free routine when
using mbuf clusters. Added appropriate glue in various places to
allow this to work.
6) Adapted a suggested change to the NFS code from Yuval Yurom to take
advantage of #5.
7) Added fault/swap statistics support.


1342 07-Apr-1994 dg

Make Bruce happy: silently enter ddb on a BPT or trace trap if ddb is
configured in the kernel.


1335 05-Apr-1994 dg

from John Dyson:

1) fixed some bugs related to the bounce buffer code
2) vnode pager now supports clustered pageouts
3) experimental code for clustering all I/O via a new "cldisksort"
4) added >16MB check to Bustek driver
5) made some experimental algorithmic changes to the pageout daemon
6) fixed bugs in truncating mapped files (esp when mapped via NFS)
7) reorganized vnode pager I/O code


1323 02-Apr-1994 ache

Change from Bruce:
isa_dmarangecheck() has an off-by-one error.
"> ISARAM_END" should be ">= ISARAM_END". Only the first page above 16M
was mishandled.
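
The off-by-one can be shown with a small sketch. It assumes that ISARAM_END is the first physical address beyond what ISA DMA can reach (16 MB); the constant name with the _SKETCH suffix and both helper functions are hypothetical, and the real isa_dmarangecheck() is not shown in this log.

```c
/* Assumed: first physical address that ISA DMA cannot reach (16 MB). */
#define ISARAM_END_SKETCH 0x1000000u

/* Buggy comparison: an address exactly at the limit slips through. */
static int above_isa_buggy(unsigned int phys)
{
    return phys > ISARAM_END_SKETCH;
}

/* Fixed comparison: the limit itself is already out of range. */
static int above_isa_fixed(unsigned int phys)
{
    return phys >= ISARAM_END_SKETCH;
}
```

For page-aligned addresses the two versions differ only for the page that starts exactly at 16 MB, consistent with the note that only the first page above 16M was mishandled.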


1321 02-Apr-1994 dg

New interrupt code from Bruce Evans. In addition to Bruce's attached
list of changes, I've made the following additional changes:

1) i386/include/ipl.h renamed to spl.h as the name conflicts with the
file of the same name in i386/isa/ipl.h.
2) changed all use of *mask (i.e. netmask, biomask, ttymask, etc) to
*_imask (net_imask, etc).
3) changed vestige of splnet use in if_is to splimp.
4) got rid of "impmask" completely (Bruce had gotten rid of netmask),
and are now using net_imask instead.
5) dozens of minor cruft to glue in Bruce's changes.

These require changes I made to config(8) as well, and thus it must
be rebuilt.

-DG

from Bruce Evans:

sio:
o No diff is supplied. Remove the define of setsofttty(). I hope
that is enough.

*.s:
o i386/isa/debug.h no longer exists. The event counters became too
much trouble to maintain. All function call entry and exception
entry counters can be recovered by using a profiling kernel (the new
profiling supports all entry points; however, it is too slow to
leave enabled all the time; it also). Only BDBTRAP() from debug.h
is now used. That is moved to exception.s. It might be worth
preserving SHOW_BITS() and calling it from _mcount() (if enabled).
o T_ASTFLT is now only set just before calling trap().
o All exception handlers set SWI_AST_MASK in cpl as soon as possible
after entry and arrange for _doreti to restore it atomically with
exiting. It is not possible to set it atomically with entering
the kernel, so it must be checked against the user mode bits in
the trap frame before committing to using it. There is no place
to store the old value of cpl for syscalls or traps, so there are
some complications restoring it.

Profiling stuff (mostly in *.s):
o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet.
o All interesting labels `foo' are renamed `_foo' and all
uninteresting labels `_bar' are renamed `bar'. A small change
to gprof allows ignoring labels not starting with underscores.
o MCOUNT_LABEL() is to provide names for counters for times spent
in exception handlers.
o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception
handlers. Its arg is the pc where the exception occurred. The
new mcount() pretends that this was a call from that pc to a
suitable MCOUNT_LABEL().
o MEXITCOUNT is to turn off any timer started by MCOUNT().

/usr/src/sys/i386/i386/exception.s:
o The non-BDB BPTTRAP() macros were doing a sti even when interrupts
were disabled when the trap occurred. The (fixed) sti is
actually a no-op unless you have my changes to machdep.c that make
the debugger trap gates interrupt gates, but fixing that would
make the ifdefs messier. ddb seems to be unharmed by both
interrupts always disabled and always enabled (I had the branch in
the fix back to front for some time :-().
o There is no known pushal bug.
o tf_err can be left as garbage for syscalls.

/usr/src/sys/i386/i386/locore.s:
o Fix and update BDE_DEBUGGER support.
o ENTRY(btext) before initialization was dangerous.
o Warm boot shot was longer than intended.

/usr/src/sys/i386/i386/machdep.c:
o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require
other changes.
Use the following:
o Remove aston() and setsoftclock().
Maybe use the following:
o No netisr.h.
o Spelling fix.
o Delay to read the Rebooting message.
o Fix for vm system unmapping a reduced area of memory
after bounds_check_with_label() reduces the size of
a physical i/o for a partition boundary. A similar
fix is required in kern_physio.c.
o Correct use of __CONCAT. It never worked here for non-
ANSI cpp's. Is it time to drop support for non-ANSI?
o gdt_segs init. 0xffffffffUL is bogus because ssd_limit
is not 32 bits. The replacement may have the same
value :-), but is more natural.
o physmem was one page too low. Confusing variable names.
Don't use the following:
o Better numbers of buffers. Each 8K page requires up to
16 buffer headers. On my system, this results in 5576
buffers containing [up to] 2854912 bytes of memory.
The usual allocation of about 384 buffers only holds
192K of disk if you use it on an fs with a block size
of 512.
o gdt changes for bdb.
o *TGT -> *IDT changes for bdb.
o #ifdefed changes for bdb.

/usr/src/sys/i386/i386/microtime.s:
o Use the correct asm macros. I think asm.h was copied from Mach
just for microtime and isn't used now. It certainly doesn't
belong in <sys>. Various macros are also duplicated in
sys/i386/boot.h and libc/i386/*.h.
o Don't switch to and from the IRR; it is guaranteed to be selected
(default after ICU init and explicitly selected in isa.c too, and
never changed until the old microtime clobbered it).

/usr/src/sys/i386/i386/support.s:
o Non-essential changes (none related to spls or profiling).
o Removed slow loads of %gs again. The LDT support may require
not relying on %gs, but loading it is not the way to fix it!
Some places (copyin ...) forgot to load it. Loading it clobbers
the user %gs. trap() still loads it after certain types of
faults so that fuword() etc can rely on it without loading it
explicitly. Exception handlers don't restore it. If we want
to preserve the user %gs, then the fastest method is to not
touch it except for context switches. Comparing with
VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on
a 486, while loading %gs takes 9 cycles and using it takes
another.
o Fixed a signed branch to unsigned.

/usr/src/sys/i386/i386/swtch.s:
o Move spl0() outside of idle loop.
o Remove cli/sti from idle loop. sw1 does a cli, and in the
unlikely event of an interrupt occurring and whichqs becoming
zero, sw1 will just jump back to _idle.
o There's no spl0() function in asm any more, so use splz().
o swtch() doesn't need to be superaligned, at least with the
new mcounting.
o Fixed a signed branch to unsigned.
o Removed astoff().

/usr/src/sys/i386/i386/trap.c:
o The decentralized extern decls were inconsistent, of course.
o Fixed typo MATH_EMULTATE in comments. */
o Removed unused variables.
o Old netmask is now impmask; print it instead. Perhaps we
should print some of the new masks.
o BTW, trap() should not print anything for normal debugger
traps.

/usr/src/sys/i386/include/asmacros.h:
o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros
as necessary.

/usr/src/sys/i386/include/cpu.h:
o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal
while the kernel is running.
o Don't use var++ to set boolean variables. It fails after a mere
4G times :-) and is slower than storing a constant on [3-4]86s.
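
The wraparound hazard with var++ can be shown at small scale. A minimal
sketch, assuming an 8-bit flag so that the wrap happens after 256
increments instead of 4G; the variable and function names are hypothetical
and are not taken from cpu.h.

```c
#include <stdint.h>

/*
 * Hypothetical flags. uint8_t stands in for a 32-bit variable so that
 * the counter wraps after 2^8 increments rather than 2^32.
 */
static uint8_t flag_inc;    /* "set" with ++ */
static uint8_t flag_const;  /* set by storing a constant */

static void set_by_increment(void) { flag_inc++; }
static void set_by_store(void)     { flag_const = 1; }

/* Request the flag n times using both styles, without clearing it. */
static void request_n_times(unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        set_by_increment();
        set_by_store();
    }
}
```

After exactly 256 requests the incremented flag reads as false again,
while the stored constant is still true.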

/usr/src/sys/i386/include/cpufunc.h:
o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of
<machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by
almost everything for the inlines.

/usr/src/sys/i386/include/ipl.h:
o New file. Defines spl inlines and SWI macros and declares most
variables related to hard and soft interrupt masks.

/usr/src/sys/i386/isa/icu.h:
o Moved definitions to <machine/ipl.h>

/usr/src/sys/i386/isa/icu.s:
o Software interrupts (SWIs) and delayed hardware interrupts (HWIs)
are now handled uniformly, and dispatching them from splx() is
more like dispatching them from _doreti. The dispatcher is
essentially (*handler[ffs(ipending & ~cpl)])().
o More care (not quite enough) is taken to avoid unbounded nesting
of interrupts.
o The interface to softclock() is changed so that a trap frame is
not required.
o Fast interrupt handlers are now handled more uniformly.
Configuration is still too early (new handlers would require
bits in <machine/ipl.h> and functions to vector.s).
o splnnn() and splx() are no longer here; they are inline functions
(could be macros for other compilers). splz() is the nontrivial
part of the old splx().
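
The dispatcher idea described for icu.s can be sketched in C. This is an
illustration only, not the assembly in icu.s: the table size, the example
handlers and dispatch_pending() are hypothetical. POSIX ffs() returns a
1-based bit number, so the sketch subtracts one before indexing. It also
leaves out raising cpl while a handler runs.

```c
#include <stddef.h>   /* NULL */
#include <strings.h>  /* ffs() */

#define NSOURCES 32

typedef void (*ihandler_t)(void);

static unsigned cpl;                  /* bits currently masked */
static unsigned ipending;             /* bits delayed while masked */
static ihandler_t handler[NSOURCES];  /* one handler per bit */

/* Example handlers that record the order in which they ran. */
static int ran[NSOURCES];
static int nran;
static void handle_bit1(void) { ran[nran++] = 1; }
static void handle_bit3(void) { ran[nran++] = 3; }

/* Run every pending, unmasked handler, lowest bit first. */
static void dispatch_pending(void)
{
    unsigned ready;

    while ((ready = ipending & ~cpl) != 0) {
        int bit = ffs((int)ready) - 1;  /* ffs() is 1-based */

        ipending &= ~(1u << bit);       /* claim it before running */
        if (handler[bit] != NULL)
            handler[bit]();
    }
}
```

With bits 1, 3 and 5 pending and bit 5 masked in cpl, the handlers for
bits 1 and 3 run in that order and bit 5 stays pending until cpl drops.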

/usr/src/sys/i386/isa/ipl.h
o New file. Supposed to have only bus-dependent stuff. Perhaps
the h/w masks should be declared here.

/usr/src/sys/i386/isa/isa.c:
o DON'T APPLY ALL OF THIS DIFF. You need only things involving
*mask and *MASK and comments about them. netmask is now a pure
software mask. It works like the softclock mask.

/usr/src/sys/i386/isa/vector.s:
o Reorganize AUTO_EOI* macros.
o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust
fastintr handlers.
o fastintr handlers need to metamorphose into ordinary interrupt
handlers if their SWI bit has become set. Previously, sio had
unintended latency for handling output completions and input
of SLIP framing characters because this was not done.

/usr/src/sys/net/netisr.h:
o The machine-dependent stuff is now imported from <machine/ipl.h>.

/usr/src/sys/sys/systm.h
o DON'T APPLY ALL OF THIS DIFF. You need mainly the different
splx() prototype. The spl*() prototypes are duplicated as
inlines in <machine/ipl.h> but they need to be duplicated here
in case there are no inlines. I sent systm.h and cpufunc.h
to Garrett. We agree that spl0 should be replaced by splnone
and not the other way around like I've done.

/usr/src/sys/kern/kern_clock.c
o splsoftclock() now lowers cpl so the direct call to softclock()
works as intended.
o softclock() interface changed to avoid passing the whole frame
(some machines may need another change for profile_tick()).
o profiling renamed _profiling to avoid ANSI namespace pollution.
(I had to improve the mcount() interface and may as well fix it.)
The GUPROF variant doesn't actually reference profiling here,
but the 'U' in GUPROF should mean to select the microtimer
mcount() and not change the interface.


1314 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1313 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1312 30-Mar-1994 dg

New routine "pmap_kenter", designed to take advantage of the special
case of the kernel pmap.


1310 25-Mar-1994 dg

ifdef KERNEL the pmap_kextract inline function; ps is unhappy otherwise.
Pointed out by Frank Terhaar-Yonkers <fty@vislab.epa.gov>.


1307 24-Mar-1994 dg

From John Dyson: performance improvements to the new bounce buffer
code.


1298 23-Mar-1994 dg

Bounce buffers. From John Dyson with help from me.


1291 21-Mar-1994 ache

The printf("changing root...") message now indicates the raw partition
for a floppy, e.g. fd1d


1290 21-Mar-1994 ache

Fix printf for root system mounted on second floppy


1289 21-Mar-1994 ache

Fix for root system mounted on second floppy


1288 21-Mar-1994 dg

Changed dynamic stack grow code to grow by "SGROWSIZ" amount. Initially
allocate SGROWSIZ amount of stack. Also set vm_ssize to the initial
stack VM size. Increased DFLSSIZ stack rlimit default to 8MB.


1281 19-Mar-1994 wollman

Added cpu_model and machine variables.


1262 14-Mar-1994 dg

Performance improvements from John Dyson.

1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.


1247 07-Mar-1994 dg

1) enhanced in_cksum from Bruce Evans.
2) minor comment change in machdep.c
3) enhanced bzero from John Dyson (twice as fast on a 486DX/33)


1246 07-Mar-1994 dg

1) "Pre-faulting" in of pages into process address space
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.

(process startup time using in-memory segments *much* faster)

2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.

(generally more efficient pte management)

3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)

(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)

4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.

(less likely to pageout a recently used page)

5) Slight improvement to the page table page trap efficiency.

(generally faster system VM fault performance)

6) Defer creation of unnamed anonymous regions pager until needed.

(speeds up shared memory bss creation)

7) Remove possible deadlock from swap_pager initialization.

8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.

9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.

John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com


1209 24-Feb-1994 hsu

Correct definitions of flags used by sigreturn to validate sigcontext.


1208 24-Feb-1994 hsu

validate sigcontext before restoring it


1151 13-Feb-1994 dg

Fixed bug in handling of COW - the original code was bogus and it was
only accidental that it worked. Also, don't cache non-managed pages.


1139 10-Feb-1994 dg

Patch from John Dyson:

a pv chain was being traversed while interrupts were
fully enabled in pmap_remove_all ... this is bogus, and
has been fixed in pmap.c. (sorry for adding the splimp)


1129 08-Feb-1994 dg

From: Dave Matthews <dave@prlng.co.uk>

Description:
The integer overflow instruction (into) and the interrupt instruction with
value 4 (int #4) both give rise to SIGBUS signals rather than SIGFPE. The
problem is that overflow is a trap not a fault (unlike the BOUND instruction).


1127 08-Feb-1994 dg

Fixed bugs in stack grow code, and moved it back into a separate function
like it was originally. Also added back call to "grow" in sendsig now
that this routine actually works.


1124 08-Feb-1994 dg

Fixes from John Dyson to fix out-of-memory hangs and other problems (such
as increased swap space usage) related to (incorrectly) paging out the
page tables.


1116 07-Feb-1994 dg

Fixed calculation of physmem when the special MAXMEM kernel config override
is used. This bug caused the buffer cache to be WAY too big when memory
was being restricted - resulting in hangs and other out of memory problems.


1104 06-Feb-1994 dg

At the suggestion of Bruce Evans, don't zero RTC diag register. Doing so
was causing problems for some machines.


1072 01-Feb-1994 dg

Minor cleanup. Decode state information better in the case of a fatal
trap.


1066 01-Feb-1994 dg

Bug fix from previous WINE commit. From Jeffrey Hsu.


1058 01-Feb-1994 dg

Removed all uses of "USE_486_WRITE_PROTECT" and made this automatic.
Reordered and removed some NOP's.


1056 31-Jan-1994 dg

Added four pattern memory test routine that is done at startup.
...added filli - "fill integer" support routine.


1055 31-Jan-1994 dg

Added four pattern memory test routine that is done at startup.


1051 31-Jan-1994 dg

WINE/user LDT support from John Brezak, ported to FreeBSD by Jeffrey Hsu
<hsu@soda.berkeley.edu>.


1046 31-Jan-1994 dg

Make I/O memory explicitly non-cacheable. This is purely an aesthetic
change.


1045 31-Jan-1994 dg

VM system performance improvements from John Dyson and myself. The
following is a summary:

1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...


1029 27-Jan-1994 dg

Removed no longer used "wire" element in pv struct.


1028 27-Jan-1994 dg

Made pmap_is_managed a static inline function.


1002 22-Jan-1994 rgrimes

Now prints ``on eisa'' if id_iobase >= 0x1000, and made a slight code
cleanup for the other 2 cases of ``on motherboard'' and ``on isa''.
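
The three cases can be sketched as a small classifier. This is a sketch
under assumptions, not the code in isa.c: bus_name() is a hypothetical
helper, and treating an iobase of 0 or less as "no I/O address" is an
assumption. The 0x100 and 0x1000 thresholds come from the log entries
themselves.

```c
/*
 * Return the bus label printed for a probed device.
 * Assumption: iobase <= 0 means the device has no I/O address
 * (for example a purely memory-mapped card).
 */
static const char *bus_name(int iobase)
{
    if (iobase > 0 && iobase < 0x100)
        return "motherboard";   /* below the ISA I/O range */
    if (iobase >= 0x1000)
        return "eisa";
    return "isa";
}
```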


991 21-Jan-1994 dg

Remove some old, unused, major UGLY code.


990 21-Jan-1994 dg

System V IPC code from Danny Boulet, chewed on a bit by the NetBSD group
and then some more by Jeffrey Hsu (who provided this port for FreeBSD).


989 20-Jan-1994 dg

Pointed out by Wolfgang Solfrank:
Correct parameters of sync


988 20-Jan-1994 dg

Removed some more old unused code/comments. Added hack to "fix" the
problem with some chipsets (UMC) remapping the 'hole' memory even when
you've got 16MB. People were led to believe that since there was only
16MB of memory in the machine, that they were okay wrt the ISA DMA
limit. This hack simply causes the extra memory to be ignored if it
appears around the 16MB limit.


987 20-Jan-1994 dg

Improved algorithm that calculates the pages in the base memory - If the
BIOS says that the amount is *between* 0-640K, believe it. Cleaned up
the comments a bit, removed some old cruft, etc.


981 17-Jan-1994 dg

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


980 17-Jan-1994 rgrimes

Add missing paren so that it now compiles.


976 16-Jan-1994 ats

Updated the TODO file with missing things.
Changed the output of the isa probe routine so that only devices that
have an IO address smaller than 0x100 are reported to be on the motherboard.
The seagate SCSI adapter is an example of a card that doesn't have
an IO address and works only memory mapped.


975 16-Jan-1994 martin

NFS Diskless booting support added.


974 14-Jan-1994 dg

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any affected file in the sys/vm
directory (swap_pager.c for instance).


965 10-Jan-1994 ache

Correct Vresume size, we have now 32 bits for it.


948 05-Jan-1994 rgrimes

Fixed comment that referred to 8252 (we really have 8253's).
Per someone on the mailing list.


926 03-Jan-1994 dg

Increased maximum and default 'size' limits to more reasonable values.


924 03-Jan-1994 dg

Convert syscall to trapframe. Based on work done by John Brezak.


911 22-Dec-1993 dg

Raised minimum buffer cache from 128k to 256k.


907 21-Dec-1993 dg

Changed pointer type from caddr_t to void * for fillw, insw, outsw, and
outsb.


884 20-Dec-1993 wollman

Document use of counters 29 and 30 for CCITT netisrs.


879 19-Dec-1993 wollman

Make everything compile with -Wtraditional. Make it easier to distribute
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.

NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.


856 13-Dec-1993 dg

added some panics to catch the condition where pmap_pte returns null
- indicating that the page table page is non-resident.


849 12-Dec-1993 dg

1) Added proc file system from Paul Kranenburg with changes from
John Dyson to make it reliably work under FreeBSD.
2) Added and enabled PROCFS in the GENERICxx and LINT kernels.
3) New execve() from me. Still work to be done here, but this version
works well and is needed before other changes can be made. For
a description of the design behind this, see freebsd-arch or
ask me.
4) Rewrote stack fault code; made user stack VM grow as needed rather
than all up front; improves performance a little and reduces
process memory requirements.
5) Incorporated fix from Gene Stark to fault/wire a user page table
page to fix a problem in copyout. This is a temporary fix and
is not appropriate for pageable page tables. For a description
of the problem, see Gene's post to the freebsd-hackers mailing
list.
6) Tighten up vm_page struct to reduce memory requirements for it. ifdef
pager page lock code as it's not being used currently.
7) Introduced new element to vmspace struct - vm_minsaddr; initial
(minimum) stack address. Complement to vm_maxsaddr.
8) Added a panic if the allocation for process u-pages fails.
9) Improve performance and accuracy of kernel profiling by putting in
a little inline assembly instead of spl().
10) Made serial console with sio driver work. Still has problems with
serial input, but is almost useable.
11) Added -Bstatic to SYSTEM_LD in Makefile.i386 so that kernels will
build properly with the new ld.


827 03-Dec-1993 alm

From: Jeffrey Hsu <hsu@soda.berkeley.edu>

The following patch adds the addr argument to signal handlers.

The kernel with the patch is no more and no less in compliance or in
violation of POSIX and ANSI C than the kernel before the patch.

The added functionality this addr argument provides is quite useful. It
enables an entire class of algorithms which use mprotect to trace memory
references. Besides garbage collectors, I have heard of this technique being
applied to debuggers and profilers. The only benchmarking I've performed is
using akcl to compile maxima: without the kernel patch, it takes 7 hours to
compile maxima, while with stratified garbage collection, it only takes 50
minutes.

Basically, I can't think of a reason not to add the addr argument and there
is a compelling need for it.

If you find the patch acceptable, please let me know so I can send my
FreeBSD akcl config files to wfs for inclusion in the core akcl release.
The old 386BSD config files there won't work on either NetBSD or FreeBSD.


806 28-Nov-1993 dg

Patch from Gene Stark:

Subject: Page fault in PTE area fails in copyout
Index: sys/i386/i386/trap.c FreeBSD-1.0.2

Description:
Reading files of several megabytes into Emacs, or many small
files all at once, would fail with "IO error - bad address".

Repeat-By:
The bug can be exercised by a test program that malloc()'s
a 5MB chunk of memory, and then, without accessing the memory
first, filling it with data from a file using read().
(I read 64k chunks from /dev/wd0d into successive 64k regions
of the 5MB chunk.) The read() will fail with EFAULT at the first
virtual address boundary that is a multiple of 0x400000.

Fix:
The problem was code in sys/i386/i386/trap.c that tries to
figure out what kind of trap occurred and to handle it appropriately.
It was interpreting any page fault with virtual address
>= vm->vm_maxsaddr as being a user stack segment fault.
In fact, addresses >= USRSTACK are in the user structure/PTE area,
and if they are handled as stack faults, the proper PTE will
not be paged in when it is supposed to be. This situation comes
up in copyout() and copyoutstr(), if PTE's are accessed for the
first time ever. The page fault on accessing the nonexistent PTE
is mishandled as a stack fault, and then the fault that occurs on
the subsequent access to the page itself causes copyout to fail
with EFAULT.
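
The distinction the fix draws can be sketched as a three-way
classification of the faulting address. This is a simplified illustration,
not the code in trap.c: classify_fault() and the enum are hypothetical,
and the two boundaries are passed in as parameters standing in for
vm->vm_maxsaddr and USRSTACK.

```c
#include <stdint.h>

typedef enum {
    FAULT_USER_DATA,  /* ordinary user address below the stack region */
    FAULT_STACK,      /* inside the user stack region */
    FAULT_UPTE_AREA   /* user structure/PTE area at or above USRSTACK */
} fault_kind_t;

/*
 * Classify a faulting virtual address. The old behaviour treated
 * everything >= maxsaddr as a stack fault; the upper bound on the
 * stack region is what the fix adds.
 */
static fault_kind_t
classify_fault(uint32_t va, uint32_t maxsaddr, uint32_t usrstack)
{
    if (va >= usrstack)
        return FAULT_UPTE_AREA;
    if (va >= maxsaddr)
        return FAULT_STACK;
    return FAULT_USER_DATA;
}
```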


798 25-Nov-1993 wollman

Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and
add same (sans -Werror) to Makefile for future compilations.


790 22-Nov-1993 dg

patches from Julian Elischer -
Added support for mmapping /dev/mem


778 17-Nov-1993 wollman

Fixed comments that start within a comment, so code compiles cleanly with
-Wcomment.


775 17-Nov-1993 ache

If netmask == 0, new value changed from 0x8000 to 0x10000
(don't mess with IRQ15)


774 16-Nov-1993 dg

new process tracing code from Sean Eric Fagen (sef@kithrup.com).
...also, fixed up the syscall args to make GCC happy.


765 14-Nov-1993 ache

if netmask == 0, then the loopback code can do some really
bad things.
workaround for this: if netmask == 0, set it to 0x8000,
which is the value used by splsoftclock


760 14-Nov-1993 rgrimes

Add _bde_exists: label so that the global is really defined. Fix spelling
error (mount -> amount)


757 13-Nov-1993 dg

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a separate file 'support.s'
15) split swtch and related routines into a separate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a separate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait
forever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
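
The wait-time rules in item 1 can be sketched as a pure function. This is
a hypothetical helper, not the code in boot(): reboot_delay() and
WAIT_FOREVER are made-up names, and treating every negative value like -1
is an assumption, since the log only describes -1.

```c
#define WAIT_FOREVER (-1)

/*
 * Return how many seconds to pause before rebooting, or WAIT_FOREVER.
 * wait_time plays the role of PANIC_REBOOT_WAIT_TIME; key_typed is
 * nonzero if a console key arrived during the countdown.
 */
static int reboot_delay(int wait_time, int key_typed)
{
    if (wait_time == 0)
        return 0;               /* don't wait at all */
    if (wait_time < 0 || key_typed)
        return WAIT_FOREVER;    /* hold until the operator acts */
    return wait_time;           /* count down, then reboot */
}
```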

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lots of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


736 09-Nov-1993 alm

Applied David Greenman's hack to disable IRQ conflict checking
when COM_MULTIPORT is defined.


724 07-Nov-1993 wollman

Get rid of WFJ's use of sleep() for more user-friendly tsleep().


720 07-Nov-1993 wollman

Made all header files idempotent and moved incorrect common data from
headers into a related source file. Also fixed a bug in ed_probe() where
it was possible to fall off the end of the function


719 07-Nov-1993 wollman

Made all header files idempotent and moved incorrect common data from
headers into a related source file. Added cons.h as first step towards
moving i386/i386/cons.h to machine/cons.h where it belongs.


718 07-Nov-1993 wollman

Made all header files idempotent and moved incorrect common data from
headers into a related source file. (This is the only change to locore.s).
Also fixed pg() to be properly declared and use stdargs.


701 04-Nov-1993 dg

splnone()'s in the trap code can be deadly. Save/restore previous priority
instead.


700 04-Nov-1993 ache

DST offset calculation removed, it is wrong in any case.


695 03-Nov-1993 paul

Restored comments that were removed from npx.c using # comment
format rather than /* */, as per advice from Jordan.


690 03-Nov-1993 paul

Removed comments from within asm block.

New gas fails to parse comments within asm blocks properly. Simply
remove them until gas gets fixed.


689 01-Nov-1993 chmr

Modified the "rude stack hack" so that it only applies to addresses within
the stack area and not memory above VM_MAXUSER_ADDRESS.
That way, copyout and friends now work for pages whose page table entries
have not yet been allocated/been paged out.


683 29-Oct-1993 dg

Whoops, the algorithm I last used was messed up - I left off parens, and
should have used PGSHIFT instead of PAGE_SHIFT.


682 29-Oct-1993 dg

Change filesystem buffer cache size calculation to be less for 4MB
machines (now 20% of all memory after the first 3MB). This is necessary
in order for 4MB machine to be able to rebuild the entire source tree
and not run out of physical memory because of fixed memory requirements
of processes and kernel VM.


630 18-Oct-1993 rgrimes

>From: Julian Elischer <julian@jules.dialix.oz.au>
Date: Tue, 19 Oct 1993 02:22:41 -40962758 (WST)

As the subject line says:
I can't believe this typo is still here.

Has NOBODY used the isa_dmastart() routine for 16bit DMA?

I know I just hit the dma regs directly for the AHA1542,
and it appears that either everybody else does as well, or
they only use 8bit DMA (e.g. floppy)

Editors Note:
The definition of DMA2_CHN was incorrectly using IO_DMA1!


625 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


621 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


620 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


619 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


608 15-Oct-1993 rgrimes

genassym.c:
Remove NKMEMCLUSTERS, it is no longer defined or used.

locore.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the calculation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this calculation.
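
The sizing rule can be sketched as follows. This is a minimal sketch
working in bytes, not the code in machdep.c: bufcache_bytes() is a
hypothetical name, and the 16MB value for VM_KMEM_SIZE is taken from the
vmparam.h change in this log. The cap works out to 6710886 bytes, which
matches the quoted 6.7MB.

```c
#define MB            (1024UL * 1024UL)
#define VM_KMEM_SIZE  (16UL * MB)   /* value from the vmparam.h change */

/*
 * Buffer cache size for a machine with physmem_bytes of memory:
 * 20% of everything above 2MB, capped at 2/5 of VM_KMEM_SIZE.
 */
static unsigned long bufcache_bytes(unsigned long physmem_bytes)
{
    unsigned long size = 0;
    unsigned long limit = VM_KMEM_SIZE * 2 / 5;

    if (physmem_bytes > 2 * MB)
        size = (physmem_bytes - 2 * MB) / 5;
    if (size > limit)
        size = limit;
    return size;
}
```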

It seems that we were erroneously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you were compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completely invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completely missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE00000 constants with KERNBASE


607 15-Oct-1993 rgrimes

param.h:

Mark the fact that PGSHIFT and PDRSHIFT are really the same as
PG_SHIFT and PD_SHIFT, these should be collapsed some day soon.

Document that KERNBASE should really be KPTDPTDI << PDRSHIFT, for
now leave it as the constant 0xFE000000 until I make a separate
common header file for this stuff (vmaddresses.h?)
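
The relationship between KERNBASE and a page directory index can be
checked with a little arithmetic. A sketch, assuming PDRSHIFT is 22 (each
page directory entry maps 4MB); pd_index() and pd_base() are hypothetical
helpers, not names from param.h.

```c
#include <stdint.h>

#define PDRSHIFT 22            /* assumed: one PDE covers 4MB */
#define KERNBASE 0xFE000000u   /* constant quoted in the log */

/* Page directory index that maps a virtual address. */
static unsigned pd_index(uint32_t va)
{
    return va >> PDRSHIFT;
}

/* First virtual address mapped by a page directory index. */
static uint32_t pd_base(unsigned pdi)
{
    return (uint32_t)pdi << PDRSHIFT;
}
```

Under that assumption KERNBASE corresponds to index 1016, so the constant
could be derived as pd_base(1016) instead of being written out.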

Remove NKMEMCLUSTERS define, it was only being used to define
VM_KMEM_SIZE, so why have all the indirection. Besides, who wants
to work in CLBYTE-sized chunks?


pmap.h:

Fix $Id$ and some other minor format clean ups.

Remove the XXX comment about NKPDE, since it now has the correct value
of 7.

Remove unused LASTPTDI and move the APTD into the very end of memory to
free up 4MB of kernel virtual address space.
Remove unused RSVDPTDI and free up 12MB of kernel virtual address space.


vmparam.h

Fix $Id$.

Increase SHMMAXPGS to 512 (2MB) now that there is room for it to be
bigger. The XXX comment stays until the kernel moves down in memory
to free up enough space to use the proper default of 4MB.

VM_KMEM_SIZE is now a direct constant stating the size of the kernel
malloc region. Increased the value from 3MB to 16MB.


604 14-Oct-1993 rgrimes

>From David Greenman

Bruce Evans had limited the kernel virtual address space to not include the
last 4MB since it was not being used. Other changes are being made that will
relocate the Alternate Page Directory Table (APDT) into this area so the limit
is being fixed to be the last virtual address. (In fact, with this patch you
can now do that relocation)


593 13-Oct-1993 rgrimes

ALL:

Removed patch kit headers and rcsid strings, add $Id$.

isa.c:

Removed old #ifdef notyet isa_configure code, since it will never be
used, and I have done 90% of what it attempted to.

Add conflict checking code that searches back through the devtab's looking
for any device that has already been found that may conflict with what
we are about to probe. Checks are made for I/O address, memory address,
IRQ, and DRQ. This should stop the screwing up of any device that has
already been found by other device probes.
Print out messages when we are not going to probe a device due to
a conflict so the user knows WHY something was not found. For example:

aha0 not probed due to irq conflict with ahb0 at 11
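
A conflict check of this kind can be sketched as below. The record layout
and find_conflict() are hypothetical and much simpler than the devtab
entries in isa.c: a negative field means "unused", and resources are
compared for equality only, whereas a real check would presumably compare
I/O and memory ranges.

```c
#include <stddef.h>

/* Hypothetical, simplified device record; negative means "unused". */
struct dev {
    const char *name;
    int iobase;   /* I/O address */
    long maddr;   /* memory address */
    int irq;
    int drq;
};

/*
 * Look through the devices already found for one that shares a
 * resource with cand. Returns the conflicting device and stores the
 * resource name in *what, or returns NULL if cand is safe to probe.
 */
static const struct dev *
find_conflict(const struct dev *found, size_t nfound,
              const struct dev *cand, const char **what)
{
    for (size_t i = 0; i < nfound; i++) {
        const struct dev *d = &found[i];

        if (cand->iobase >= 0 && cand->iobase == d->iobase) {
            *what = "io";
            return d;
        }
        if (cand->maddr >= 0 && cand->maddr == d->maddr) {
            *what = "maddr";
            return d;
        }
        if (cand->irq >= 0 && cand->irq == d->irq) {
            *what = "irq";
            return d;
        }
        if (cand->drq >= 0 && cand->drq == d->drq) {
            *what = "drq";
            return d;
        }
    }
    return NULL;
}
```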

Now print out a message when a device is not found so the user knows
that it was probed for, but could not be found. For example:

ed1 not found at 0x320

For devices that have I/O address < 0x100 say that they are on the
motherboard, not on isa! The 0x100 magic number is per ISA spec. It
may seem funny that pc0 and sc0 report as being on the motherboard, but
this is due to the fact that the I/O address used is that of the keyboard
controller which IS on the motherboard. We really need to split the
keyboard probe from the display probe. It is completely legal to build
a pc without one or the other, or even without both!

npx.c:

Return -1 from the probe routine if we are using the Emulator so
that the i/o addresses are not printed, this is the same trick used
for 486's.

Do not print the ``Errors reported via Exception 16'', and
``Errors reported via IRQ 13'' messages any more, since these just lead
to more user confusion than anything. It still prints the message
``Error reporting broken, using 387 emulator'' so that the person is
aware that their motherboard is ill.


592 13-Oct-1993 rgrimes

Removed hack that did the R_SHIFT of unsigned numbers, no longer need
to do this as I have changed to using PTDI's as the bases for the vm
system layout.

Eliminate constants SYSPDROFF and SYSPDREND, now use NKPTE to control the size
of the kernel virtual space.

Eliminate constant PDRPDROFF, now use PTDPTDI to control location of PTD,
PTDmap and PTDpde

Eliminate constant APDRPDROFF, now use APTDPTDI to control location of APTD,
APTDmap and APTDpde.

Still need to fix Sysmap location (it is still a constant).

.globl statements are now consistent with respect to <comma><space>, the
<space> being removed from all .globl statements.

Document the fillkpt macro as to what registers control what.

Fix some comments that went past column 80, and clean/line some others up.

Remove constant for _Crtat, now use KERNBASE+constant, this still needs work.

Replace constants for offsets of sigcode parameters with symbolic names
from assym.s

Mark the sigreturn() call with XXX since we use the hardcoded constant
for the system call number, this is bogus and should be a #define or
something some place!

The kernel before and after this change was verified with cmp, not one
byte changed. These are all cosmetic clean up changes that make the
code more correct and make it easier to move and resize the kernel's
virtual address space.


590 12-Oct-1993 rgrimes

Add Page Table Directory Indexes (NKPDE, KPTDI, PTDPTDI, APTDPTDI) to
be used to replace more constants in locore.


589 12-Oct-1993 rgrimes

KPTDI_LAST renamed to KPTDI


588 12-Oct-1993 rgrimes

Eliminate definition of I386_PAGE_SIZE and use NBPG instead

Cleaned up tabs vs spaces after #define to make file consistent.
Removed now unused definitions of I386_PAGE_SIZE and I386_PDR_SIZE

Note that these two were unused and had the wrong values anyway!
Changed I386_KPDES to NKPDE
Changed I386_UPDES to NUPDE

Redid constant assignments of *PTDI's to be sizeable and relative.


587 12-Oct-1993 rgrimes

Eliminate definition of I386_PAGE_SIZE and use NBPG instead
Replace 0xFE000000 constants with KERNBASE
Use new definition NKPDE in place of a first-last+1 calculation.


570 10-Oct-1993 rgrimes

SYSPDROFF and SYSPDREND are now calculated using KERNBASE, KERNSIZE and
PDRSHIFT.

The SYSTEM constant that was defined in this file has been replaced
with KERNBASE from param.h.

Changed almost all # style comments to /* */ C style comments. Several
comments cleaned up so that they make a little more sense.

In the comments that describe C calling conventions to assembler routines
used a comma space sequence to separate arguments and removed the space
between the function name and the argument list.

Removed useless comments like /* clr eax */.

Changed all comma space sequences on assembler instructions to just be comma.

Removed spaces after $ operators to make the file consistent, this may need
to change again (ie: $KERNBASE should probably be $(KERNBASE), but for now
it all seems to work just fine.) This may become a problem with the C
pre-processor.

Changed several double blank lines to single blank lines that were used
to separate the I/O routines, these routines are blocked enough that we
don't need double blank lines between them.

Changed sequence of I/O routines to be all input functions, all output
functions instead of just the opposite.

Moved the SHOW_A_LOT debug stuff to near the end of the file.

Changed two occurrences of the constant 0xfff to NBPG-1.


569 10-Oct-1993 rgrimes

Added a compile time #error so that if the user does not specify one of
the proper I_X86CPU in the config file the following error will occur
while building the kernel: (had to line wrap the error for this message)

../../i386/i386/machdep.c:343: #error This kernel is not configured for one \
of the supported CPUs
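
A compile-time guard of this kind can be sketched as below. The option
macros I386_CPU and I486_CPU are assumptions made for illustration (the log
only gives the pattern I_X86CPU), and one of them is defined inside the
sketch so that it compiles on its own; in a real kernel build the definition
would come from the config file.

```c
/* Assumed option macro; normally supplied by the kernel configuration. */
#define I486_CPU 1

/* Refuse to build when none of the (assumed) CPU options is selected. */
#if !defined(I386_CPU) && !defined(I486_CPU)
#error This kernel is not configured for one of the supported CPUs
#endif

/* Hypothetical helper: report which CPU class was compiled in. */
static const char *
configured_cpu(void)
{
#if defined(I486_CPU)
    return "i486";
#else
    return "i386";
#endif
}
```

Removing the #define at the top makes the build stop at the #error line,
which is the behaviour the entry describes.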


567 10-Oct-1993 rgrimes

Added PDRSHIFT and KERNSIZE so that the PDR offsets can be calculated in
locore.s instead of being constants (3F8, 3FA).
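
As a worked illustration of the arithmetic, the two constants can be
recovered from a base address and a size. PDRSHIFT is 22 on the i386 because
each page directory entry maps 4MB; KERNBASE is the 0xFE000000 value
mentioned elsewhere in this log; the KERNSIZE value of 8MB is an assumption
inferred from the difference between 3FA and 3F8, and pdr_index is a
hypothetical helper, not something taken from locore.s.

```c
#include <stdint.h>

#define PDRSHIFT 22                       /* log2(4MB) mapped per PDE */
#define KERNBASE UINT32_C(0xFE000000)     /* value cited in this log */
#define KERNSIZE UINT32_C(0x00800000)     /* ASSUMED: 8MB, from 3FA - 3F8 */

/* Hypothetical helper: page directory index that maps a virtual address. */
static inline uint32_t
pdr_index(uint32_t va)
{
    return va >> PDRSHIFT;
}

/* First kernel page directory slot and the slot just past the kernel. */
#define SYSPDROFF pdr_index(KERNBASE)               /* 0x3F8 */
#define SYSPDREND pdr_index(KERNBASE + KERNSIZE)    /* 0x3FA */
```

Changing KERNBASE or KERNSIZE then moves both offsets together, which is the
point of deriving them instead of hard-coding them.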


561 09-Oct-1993 dg

Correct spelling of "SHMMAXPGS" so the config override will actually work.


557 08-Oct-1993 rgrimes

All:
Remove patch kit headers, and add $Id$
This is mostly to align some more code with NetBSD.

cpu.h:
Remove the old function vs. include configuration stuff that was
ifdefed out when we went to inline functions.
Remove the define of resettodr that made it a nop, there is
already a function that makes it a nop, no need to #define one.
Remove the #defines of processor types, they are now defined
in cputypes.h, #include that file.
Add struct cpu_nameclass for support of cpu types.

frame.h:
include sys/signal.h, it will be needed in the future.
put the sigframe structure here that was in machdep.c

pcb.h:
Add multiple inclusion protection.
Add pcb_ldt and pcb_ldt_len to pcb structure, this is for the
user mode ldt.


556 08-Oct-1993 rgrimes

All:
removed patch kit headers and sccsids, add $Id$. This is a general
clean up and realignment with NetBSD-current where possible.

genassym.c:
removed extraneous include of reg.h
removed old FP_* defines that have been ifdefed out since the patch kit
removed PCB_SIGC that is not referenced anywhere
add trapframe and sigframe defines
add KERNBASE define for use in locore.s

locore.s:
include npx.h and use NNPX for turning on and off FPU
include machine/cputypes.h for the types of cpu (used in cpu_identify)
change SYSPDREND to be one higher, this is really the base of the
next area, and will be changing again next time I revise the file
Reverse the NOP defines, you now get slow NOP's by default, this
may be what is causing us trouble with some systems. If you want
the NOPS to be null you now need to have options DUMMY_NOPS.
Now get esym from the boot blocks which don't pass it yet, and
it is not used, but this will be changing.
Move the bit_colors stuff to be in with the rest of Bruce's SHOW_A_LOT
things for debugging.
Added NetBSD's CPU type probe code, we now know what type of CPU
we are running on.
Adjust kernel pde calculation to correct for change in SYSPDREND, no
longer need the +1.

machdep.c:
include npx.h and use NNPX for turning on and off FPU
include isa.h, map.h(new file), exec.h in preparation for
changes that are still in process.
Add some of the code for MACHINE_NONCONTIG that will allow us
to better map around the BIOS memory area.
Now print the version, cpu id, real memory and available memory
during boot.
Correct the calculation of bufpages, the code was mixing pages
and bytes, it now does the right things. Removed Bill's hack
for limiting the erroneous calculation.
add the identifycpu print out code from NetBSD.
remove the definition of the sigframe struct, it belongs in
frame.h
put in printf's about syncing disks on a halt/reboot.
Change the halted message to be a little easier reading.
Clean up of the dump messages, makes the source and the output
much more readable.
Change 0,0 in several places to have spaces after the commas.


553 08-Oct-1993 rgrimes

Define the types of cpu's there are, from NetBSD


550 08-Oct-1993 rgrimes

Architecture specific syscalls (i386) from NetBSD


549 08-Oct-1993 rgrimes

Removed patch kit headers, and rcsid, add $Id$, relocate Terry Lambert's
copyright to match the location that it is in NetBSD.

Remove the __main() {} dummy function, it belongs in kern/init_main.c


528 30-Sep-1993 rgrimes

This is a fix for the 32K DMA buffer region that was not accounted for,
it relocates it to be after the BIOS memory hole instead of right below
the 640K limit.
THANK YOU CHRIS!!!

From: <cgd@postgres.Berkeley.EDU>
Date: Wed, 29 Sep 93 18:49:58 -0700
basically, reserve a new 32k space right after firstaddr,
and put the buffer space there...

the diffs are below, and are in ~cgd/sys/i386/i386 (in machdep.c)
on freefall. i obviously can't test them, so if some of you would
look the diffs over and try them out...


519 29-Sep-1993 rgrimes

Add symbolic name for system page directory end, and change constant to
a calculation for the system page directory tables.


511 27-Sep-1993 rgrimes

define SHMMAXPGS where it is supposed to be, you can override this with
a kernel config options "SHMAXPGS=xxx", default is currently 64 pages
due to limited kernel map space.


504 24-Sep-1993 rgrimes

From: rich@id.slip.bcm.tmc.edu.cdrom.com (Rich Murphey)
Date: Sun, 12 Sep 1993 18:19:05 -0500
This will allow you to compile and run a freebsd kernel with shared
memory support. I haven't tested the shm*() calls yet.

You run out of page table descriptors if you specify 4Mb of sharable
memory (SHMMAXPGS=1024). I don't know what the limit is, but
SHMMAXPGS=64 works. Rich


434 10-Sep-1993 nate

Removed volatile functions which were causing grief in the system, since
volatile functions are undefined, and there is no reason to have them
in our kernel.


433 10-Sep-1993 rgrimes

This is just to shut the compiler up
===================================================================
RCS file: /a/cvs/386BSD/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.3
diff -c -r1.3 vm_machdep.c
*** 1.3 1993/07/27 10:52:21
--- vm_machdep.c 1993/09/10 20:12:53
***************
*** 179,184 ****
--- 179,186 ----
#endif
splclock();
swtch();
+ /*NOTREACHED*/
+ for(;;);
}

cpu_wait(p) struct proc *p; {


424 09-Sep-1993 rgrimes

Changed the pg("ptdi> %x") to a printf and then a panic, since we are
going to panic shortly after this anyway. Destroys less state, and
keeps the machine from waiting for someone to smash the return key
a few times before it panics!


396 06-Sep-1993 rgrimes

Removed patch kit header, added $Id$
Added support of DONET({IMP,NS,ISO}) so you can now compile with options
NS and ISO, still missing some IMP code, but since the imp is old and
gone I doubt this will ever be used.


373 01-Sep-1993 rgrimes

Increased stack size to 8MB just to be on the real safe side.


351 28-Aug-1993 rgrimes

Changed trap.c so that a panic will occur if we do not have hardware
FP and we try to call the emulator when it is not compiled in.
Removed the #if defined(i486) || defined(i387) that used to call the
panic if we did not have a math emulator.
Removed an extraneous include of i386/i386/math_emu.h from math_emulate.c.


348 28-Aug-1993 rgrimes

Changed MAXSSIZ from MAXDSIZ to 2MB


338 27-Aug-1993 alm

prefixed inline functions' parameter names with _ and declared
the return type explicitly.


326 25-Aug-1993 alm

adding fpgetround(3) IEEE floating point environment support


322 24-Aug-1993 rgrimes

Corrected off by 2 error in DELAY macro (it was delaying for 2 * value).
From Bruce Evans.


269 10-Aug-1993 rgrimes

Removed one more reference to the old Adaptec 1542b as.c driver, one less
dependent for autoconf.c.


265 09-Aug-1993 rgrimes

Moved _eintr{names,cnt} so that vmstat -i does not report all the debugging
stuff of the fast interrupt code.


259 09-Aug-1993 rgrimes

From guido@gvr.win.tue.nl Sat Aug 7 06:58:04 1993

I posted some patches on the 386bsd_patchkit list to prohibit io access.
Because of an uninitialised field in the tss, this was possible.
It is included below as the patch to machdep.c
However, when you do this *necessary* fix (security), it will be
impossible from within user space to do io.

Therefore, I included another fix: when you open /dev/io, you
get the access. Of course you can rewrite it to use another minor
and thus give access to the iospace when /dev/mem is opened, e.g.

NOTE: The /dev/io entry has not been added to /dev/MAKEDEV yet.
The patch is in NetBSD.


256 08-Aug-1993 rgrimes

Removed the asking for a root file system when booting from floppy as that
is now handled by the new boot blocks immediately after the kernel is loaded.


200 27-Jul-1993 dg

* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading,
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c
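
A minimal sketch of what header-defined static inline helpers of this kind
can look like. The names min_u, max_u and ffs_sketch are hypothetical and
the bodies are plain portable C, not the code that went into cpufunc.h.

```c
/* Hypothetical stand-ins for inline helpers that live in a header. */

/* Smaller of two unsigned values. */
static inline unsigned int
min_u(unsigned int a, unsigned int b)
{
    return (a < b) ? a : b;
}

/* Larger of two unsigned values. */
static inline unsigned int
max_u(unsigned int a, unsigned int b)
{
    return (a > b) ? a : b;
}

/*
 * Position of the lowest set bit, counting from 1; returns 0 when no bit
 * is set. A real kernel version would likely use a machine instruction.
 */
static inline int
ffs_sketch(int mask)
{
    unsigned int m = (unsigned int)mask;
    int bit;

    if (m == 0)
        return 0;
    for (bit = 1; (m & 1u) == 0; bit++)
        m >>= 1;
    return bit;
}
```

Because static inline functions have internal linkage, no external symbol is
emitted for them. A source file that calls one without including the header
ends up with an unresolved reference at link time, which is why the entry
notes that systm.h must be included.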


140 18-Jul-1993 paul

Added volatile void to cpu_exit() in the hope that it would
stop warning about returning from gcc.

It hasn't but the declaration is still correct.


134 16-Jul-1993 dg

New locore from Christoph Robitschko.


132 16-Jul-1993 dg

Updated kernel files to move occurrences of "struct args" syscall
argument definitions outside of the function parameter list. This is
to reduce the copious warning messages that (non-Jolitz) gcc produces.
Also fixed some bogus variable declarations and casts to make the
compiler happy.


126 15-Jul-1993 dg

Modified attach printf's so that the output is compatible with the "new"
way of doing things. There still remain several drivers that need to
be updated. Also added a compile-time option to pccons to switch the
control and caps-lock keys (REVERSE_CAPS_CTRL) - added for my personal
sanity.


118 12-Jul-1993 rgrimes

Fixed two occurrences of ldos which should have been lods.
(From Christoph Robitschko)


90 03-Jul-1993 root

Increased default data size (DFLDSIZ) to 16MB. Need to rebuild libutil,
kernel, ps and w for this to work!


80 30-Jun-1993 nate

Added (protection) around negative constants, in case a program wants
to use the negative of that constant.

#define NEG_NUM -3
#define SAFE_NEG_NUM (-3)

i = -NEG_NUM; /* Error --3 */
j = -SAFE_NEG_NUM; /* Okay -(-3) */


79 29-Jun-1993 nate

Setting up for updated (usable) FPE atof/vfprintf/vfscanf fixes


24 18-Jun-1993 rgrimes

Obsolete if_we.c driver, move attach call to where it belongs.
Still need to fix all the drivers.


8 18-Jun-1993 paul

Upgrade to GCC 2.X


5 12-Jun-1993 rgrimes

This commit was generated by cvs2svn to compensate for changes in r4,
which included commits to RCS files with non-trunk default branches.