History log of /freebsd-10.0-release/sys/ia64/ia64/
Revision	Date	Author	Comments
259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
255724	20-Sep-2013	alc	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
255289	06-Sep-2013	glebius	On those machines, where sf_bufs do not represent any real object, make sf_buf_alloc()/sf_buf_free() inlines, to save two calls to absolutely empty functions. Reviewed by: alc, kib, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
255028	29-Aug-2013	alc	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measurable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
254667	22-Aug-2013	kib	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
254138	09-Aug-2013	attilio	The soft and hard busy mechanisms rely on the vm object lock to work. Unify the 2 concepts into a real, minimal sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared-busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag. The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
254025	07-Aug-2013	jeff	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
253559	23-Jul-2013	marcel	In ia64_mca_init(), don't limit the allocation of the info block to fall within the first 256MB of memory. The origin/reason for that limitation is not known, but it's not believed to be required for proper initialization. What is known is that the Altix 350 does not have physical memory at that address (by virtue of the address space bits). Keep the boundary at 256MB so that the info block will be covered by a single direct-mapped translation. While here, change the flags to M_NOWAIT to eliminate confusion. It does not change the behaviour of contigmalloc(). What it does is make the flags argument explicitly say what the actual behaviour is.
253558	23-Jul-2013	marcel	In pmap_mapdev(), if the physical memory range is not covered by an EFI memory descriptor, don't return NULL as the virtual address, return the direct-mapped uncacheable virtual address for it. At first, this was needed only for the Altix 350, but now even some high-end HP machines have devices mapped to physical addresses that aren't covered by the EFI memory map.
250884	21-May-2013	attilio	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() work most of the time with only read locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc
248508	19-Mar-2013	kib	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests ignore the GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not subject to the maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
248280	14-Mar-2013	kib	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures where the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stub is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but a few notes are reported: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
247463	28-Feb-2013	mav	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.
247454	28-Feb-2013	davide	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to estimate further sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements that will be committed in the next days. The commit is not targeted for MFC.
247251	25-Feb-2013	marcel	kernacc() expects all KVAs to be covered in the kernel map. With the introduction of the PBVM, this stopped being the case. Redefine the VM parameters so that the PBVM is included in the kernel map. In particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of region 4 to include the PBVM. While here define KERNBASE to the actual link address of the kernel as is intended. PR: 169926
246890	17-Feb-2013	marcel	Close a race relating to setting the PCPU pointer (r13). Register r13 points to the TLS in user space and points to the PCPU structure in the kernel. The race is the result of having the exception handler on the one hand and the EPC system call entry on the other. The EPC syscall path is non-atomic in that interrupts are enabled while the two stacks are switched. The register stack is switched last as that is the stack used to determine whether we're going back to user space by the exception handler. If we go back to user space, we restore r13, otherwise we leave r13 alone. The EPC syscall path however set r13 to the PCPU structure before switching the register stack, which means that there was a window in which the exception handler would restore r13 when it was already pointing to the PCPU structure. This is fatal when the exception happened on CPU x, but left from the exception on another CPU. In that case r13 would point to the PCPU of the CPU the thread was running on. This immediately results in getting the wrong value for curthread. The fix is to make sure we assign r13 after we set ar.bspstore to point to the kernel register stack for the thread.
246882	16-Feb-2013	marcel	Return EFAULT when the address is not a kernel virtual address.
246715	12-Feb-2013	marcel	Eliminate the PC_CURTHREAD symbol and load the current thread's thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu. The only non-atomic operations left were those related to process-space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations. The casuword function has been re-structured more completely than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.
246713	12-Feb-2013	kib	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
246712	12-Feb-2013	marcel	Now that we actually use more memory descriptors, make sure to dump them as well.
243040	14-Nov-2012	kib	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks
242534	03-Nov-2012	attilio	Rework the known rwlocks to benefit from staying on their own cache line, using struct rwlock_padalign instead of manual frobbing. Reviewed by: alc, jimharris
242218	28-Oct-2012	kib	Fix compilation on ia64 when page size is configured for 16KB. Reviewed by: alc, marcel
242121	26-Oct-2012	alc	Port the new PV entry allocator from amd64/i386. This allocator has two advantages. First, PV entries are roughly half the size. Second, this allocator doesn't access the paging queues, and thus it allows for the removal of the page queues lock from this pmap. Replace all uses of the page queues lock by a R/W lock that is private to this pmap. Tested by: marcel
241020	28-Sep-2012	alc	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.
240244	08-Sep-2012	attilio	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week
239379	18-Aug-2012	marcel	Use pmap_kextract(x) rather than pmap_extract(kernel_pmap, x). The former knows about all the special mappings, like PBVM. The kernel text and data are in the PBVM.
239376	18-Aug-2012	marcel	Remove support for SKI: HP's Itanium simulator. It's pretty much not used, serves very little value given that FreeBSD runs on real H/W for a long time. Note that SKI is open-source (see http://ski.sourceforge.net), so if there's interest and value again, then this code can be revived. Discussed with: jhb
239331	16-Aug-2012	jhb	Add locking for sscdisk(4) and mark it MPSAFE. Since this driver just makes calls out to the emulator, the locking is fairly simple. A global mutex protects the list of ssc disks, and each ssc disk has a mutex to protect its bioq. Approved by: marcel
239065	05-Aug-2012	kib	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
238257	08-Jul-2012	marcel	Move PCPU initialization to a new function called cpu_pcpu_setup(). This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.
238256	08-Jul-2012	marcel	Unleash the APs at SI_SUB_KICK_SCHEDULER so that we have them all up and running to service interrupts. This is especially important when the firmware has bound interrupts to CPUs, like for the SGI Altix 350. We wake up APs at SI_SUB_CPU time and they sit and spin until we unleash them, so there's nothing fundamentally different from a MD perspective.
238190	07-Jul-2012	marcel	Implement ia64_physmem_alloc() and use it consistently to get memory before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP. Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.
238184	07-Jul-2012	marcel	Hide the creation of phys_avail behind an API to make it easier to do it correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. Memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API. Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be deleted from the available physical memory. While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largest physical address. The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem. The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.
236375	01-Jun-2012	alc	pmap_alloc_vhpt() doesn't need the pages that it allocates to be mapped into the kernel map, so vm_page_alloc_contig() can be used in place of contigmalloc(). Reviewed by: marcel
235041	04-May-2012	marcel	Don't assume we have legacy PICs (i.e. 8259A in cascade) at the legacy I/O port addresses. Even if we do, this is hardly the place to mask interrupts. It's not clear that this was at all needed. The code came with CVS revision 1.2 of nexus.c when interrupt support was first added. What is known is that ia64 has always been designed around the IOSAPIC, and that doing I/O like this prevents Altix from booting.
232356	01-Mar-2012	jhb	- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned long for specifying a boundary constraint. - Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary constraints. These allow boundary constraints to be fully expressed for cases where sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a driver to properly specify a 4GB boundary in a PAE kernel. Note that this cannot be safely MFC'd without a lot of compat shims due to KBI changes, so I do not intend to merge it. Reviewed by: scottl
232250	28-Feb-2012	gavin	Correct capitalization of "Hz" in user-visible text (manpages, printf(), etc). MFC after: 3 days
231177	08-Feb-2012	marcel	Rev. 228360 moved the call to cpu_set_upcall() to happen before td_proc gets initialized in td (=newtd). Use td0 instead. MFC after: 3 days
228631	17-Dec-2011	avg	kern cons: introduce infrastructure for console grabbing by kernel At the moment grab and ungrab methods of all console drivers are no-ops. Current intended meaning of the calls is that the kernel takes control of console input. In the future the semantics may be extended to mean that the calling thread takes full ownership of the console (e.g. console output from other threads could be suspended). Inspired by: bde MFC after: 2 months
228522	15-Dec-2011	alc	Eliminate vestiges of page coloring.
227309	07-Nov-2011	ed	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
227293	07-Nov-2011	ed	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.
225841	28-Sep-2011	kib	Remove locking of the vm page queues from several pmaps, which only protected the dirty mask updates. The dirty mask updates are handled by atomics after the r225840. Submitted by: alc Tested by: flo (sparc64) MFC after: 2 weeks
225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
225474	11-Sep-2011	kib	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks
225418	06-Sep-2011	kib	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
224746	09-Aug-2011	kib	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
224668	06-Aug-2011	marcel	Fix kernel core dumps now that the kernel is using PBVM. The basic problem to solve is that we don't have a fixed mapping from kernel text to physical address so that libkvm can bootstrap itself. We solve this by passing the physical address of the bootinfo structure to the consumer as the entry point of the core file. This way, libkvm can extract the PBVM page table information and locate the kernel in the core file. We also need to dump memory chunks of type loader data, because those hold the kernel and the PBVM page table (among other things). Approved by: re (blanket)
224663	05-Aug-2011	marcel	Follow-up commit: refactor pmap_kextract() to make it easier to catch and debug issues like the one fixed in the previous commit: Replace all return statements with goto statements so that we end up at a single place with a value for the physical address. Print a message for all unknown KVA addresses. Approved by: re (blanket)
224662	05-Aug-2011	marcel	Remove stray semicolon in pmap_kextract() that turned the conditional "return (0)" into an unconditional one and as such broke PBVM address queries -- such as during kernel core dumps. Approved by: re (blanket)
224216	19-Jul-2011	attilio	On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes storage. Fix the sintrcnt/sintrnames specification. No MFC is previewed for this patch. Reported, reviewed and tested by: marcel Approved by: re (kib)
224187	18-Jul-2011	attilio	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding a unified approach among them and moving the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is previewed for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)
224184	18-Jul-2011	jhb	Implement bus_adjust_resource() for the ia64 nexus driver. Reviewed by: marcel Approved by: re (kib)
224116	16-Jul-2011	marcel	Don't assume pmap_mapdev() gets called only for memory mapped I/O addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev() rather excessively (number of calls) just to get a valid KVA. To that end, have pmap_mapdev(): 1. cache the last result so that we don't waste time for multiple consecutive invocations with the same PA/SZ. 2. find the memory descriptor that covers the PA and return NULL if none was found or when the PA is for a common DRAM address. 3. Use either a region 6 or region 7 KVA, in accordance with the memory attribute.
224114	16-Jul-2011	marcel	Don't send EOI to the CPU before we handled the interrupt. This could potentially trigger multiple pending interrupts for level-sensitive interrupts. However, the event timer interrupt does need EOI before being handled to avoid missing clock events. These conflicting requirements are handled by having the XIV handler inform the dispatch code whether or not it sent EOI to the CPU. If not, the dispatch code will do it. This allows handlers to send EOI before doing potentially long-running activities, while still having a sensible default behaviour.
224112	16-Jul-2011	marcel	Add a few more helper functions for working with memory descriptors: o efi_md_find() - returns the md that covers the given address o efi_md_last() - returns the last md in the list o efi_md_prev() - returns the md that precedes the given md.
223873	08-Jul-2011	marcel	Implement basic support for memory attributes. At this time we only distinguish between UC and WB memory so that we can map the page to either a region 6 address (for UC) or a region 7 address (for WB). This change is only now possible, because previously we would map regions 6 and 7 with 256MB translations and on top of that had the kernel mapped in region 7 using a wired translation. The introduction of the PBVM moved the kernel into its own region and freed up region 7 and allowed us to revert to standard page-sized translations. This commit introduces pmap_page_to_va() that respects the attribute.
223758	04-Jul-2011	attilio	With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient. Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))). This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement. MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
223732	02-Jul-2011	alc	When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list. Reviewed by: kib
223700	30-Jun-2011	marcel	Change the management of nested faults by switching to physical addressing while reading or writing the trap frame. It's not possible to guarantee that the one translation cache entry that we depend on is not going to get purged by the CPU. We already know that global shootdowns (ptc.g and/or ptc.ga) can (and will) cause multiple TC entries to get purged and we initially tried to handle that by serializing kernel entry with these operations. However, we need to serialize kernel exit as well. But even if we can serialize, it appears that CPU threads within a core can affect each other's TC entries beyond the global shootdown. This would mean serializing any and all translation cache updates with the threads in a core with the kernel entry and exit of any thread in that core. This is just too painful and complicated. Since we already properly coded for the 2 nested faults that we can get, all we need to do is use those to obtain the physical address of the trap frame, switch to physical mode and in that way eliminate any further faults. The trap frame is already aligned to 1KB boundaries to make sure we don't cross the page boundary, so this is safe to do. We still need to serialize ptc.g or ptc.ga across CPUs because the platform can only have 1 such operation outstanding at the same time. We can now use a regular (spin) lock for this. Also, it has been observed that we can get nested TLB faults for region 7 virtual addresses. This was unexpected. For now, we enhance the nested TLB fault handler to deal with those as well, but it needs to be understood.
223677	29-Jun-2011	alc	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib
223544	25-Jun-2011	marcel	Oops. The sec field of struct bintime is not a 32-bit type. It's time_t, which is 64 bits on ia64.
223542	25-Jun-2011	marcel	Define the minimum fractional period in terms of hz. We know hz is a magnitude smaller than itc_freq. A minimum period of 10*hz is sufficient precision. As a side-effect, the number of clocks per second, when the machine is idle, dropped by more than 50%. Be anal and define the maximum period to be at least 4G seconds. With a 64-bit counter and an ITC frequency that's expected to be always less than 4GHz, it takes longer than that to wrap around.
223529	25-Jun-2011	marcel	Replace the original copyright notice with my own. Everything in this file is written by me and has no bearing on the initial or original version.
223528	25-Jun-2011	marcel	Update copyright.
223526	25-Jun-2011	marcel	Switch to the event timers infrastructure. This includes: o Setting td_intr_frame to the XIVs trap frame because it's referenced by the ET event handler. o Signal EOI to the CPU before calling the registered XIV handlers. This prevents lost ITC interrupts, which cause starvation in one-shot mode. o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters. o Have the APs call cpu_initclocks() so as to limit the scattering of clock related initialization. cpu_initclocks() calls the <self>_bsp() or <self>_ap() version accordingly. o Uncomment the ET clock handling in cpu_idle(). o Update the DDB 'show pcpu' output for the new MD fields. o Entirely rewritten ia64_ih_clock(). Note that we don't create as many clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale. We can only have 240 XIVs and we can have more CPUs than that. There's a single intrcnt index for the cumulative clock ticks and we keep per CPU counts in the PCPU stats structure. o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order). Open issues: o Clock interrupts can still be lost. Some tweaking is still necessary. Thanks to: mav@ for his support, feedback and explanations. ET stats while committing: eris% sysctl machdep.cpu | grep nclks machdep.cpu.0.nclks: 24007 machdep.cpu.1.nclks: 22895 machdep.cpu.2.nclks: 13523 machdep.cpu.3.nclks: 9342 machdep.cpu.4.nclks: 9103 machdep.cpu.5.nclks: 9298 machdep.cpu.6.nclks: 10039 machdep.cpu.7.nclks: 9479 eris% vmstat -i | grep clock clock 108599 50
223478	23-Jun-2011	marcel	Unblock the outgoing thread after we performed pmap_switch() to switch the region registers. pmap_switch() returns the pmap for which the region registers are currently programmed, which needs to be re-programmed on the CPU the outgoing thread gets switched in. This change does not noticeably change anything or fix known bugs, but does give me a warm fuzzy feeling by being more correct.
223171	17-Jun-2011	marcel	Improve on style(9)
223170	17-Jun-2011	marcel	Properly serialize the global shootdown with the instruction stream of the local processor. Also explicitly invalidate the ALAT. This is done on the other CPUs in the coherence domain by virtue of the ptc.ga instruction, but does not apply to the local CPU.
222971	11-Jun-2011	marcel	Add the model number for the Montvale processor (marketed as Itanium 2 9100). At this time we're missing just one: Tukwila (Itanium 2 9300).
222813	07-Jun-2011	attilio	Retire the cpumask_t type and replace it with cpuset_t usage. This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being. Technical notes on this commit itself: - More functions to handle cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are targeted to be removed soon. With the move from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu data is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernel members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES. Please additionally note that no MAXCPU is bumped in this patch, but private testing has been done up to MAXCPU=128 on a real 8x8x2(htt) machine (amd64). Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC. 
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revisions of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted reviews of some specific parts of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific parts of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately. Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development. (I really hope I didn't forget anyone, if it happened I apologize in advance).
222800	07-Jun-2011	marcel	Call set_cputicker() to have the time counter use the ITC register. Note that the ITC frequency is fixed.
222769	06-Jun-2011	marcel	Improve cpu_idle(): o cpu_idle_hook is expected to be called with interrupts disabled and re-enables interrupts on return. o sync with x86: don't idle when the CPU has runnable tasks o have callers of ia64_call_pal_static() disable interrupts and re-enable interrupts. o add, but compile-out, support for idle mode. This will be enabled at some later time, after proper testing.
222531	31-May-2011	nwhitehorn	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
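The ordering problem described in r222531 can be reproduced in userland with the <sys/queue.h> macros. This is only a sketch: struct cpu_ref and enumerate_order() are hypothetical names, not the structures used by the PowerPC startup code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical per-CPU record that can sit on both kinds of list. */
struct cpu_ref {
	int id;
	SLIST_ENTRY(cpu_ref) sl_link;
	STAILQ_ENTRY(cpu_ref) st_link;
};

SLIST_HEAD(cpu_slist, cpu_ref);
STAILQ_HEAD(cpu_stailq, cpu_ref);

/*
 * Enumerate n CPUs (ids 0..n-1) onto both lists, then record the order in
 * which each list is walked.  Head insertion reverses the enumeration
 * order; tail insertion preserves it.
 */
static void
enumerate_order(struct cpu_ref *cpus, int n, int *slist_out, int *stailq_out)
{
	struct cpu_slist slhead = SLIST_HEAD_INITIALIZER(slhead);
	struct cpu_stailq sthead = STAILQ_HEAD_INITIALIZER(sthead);
	struct cpu_ref *c;
	int i;

	for (i = 0; i < n; i++) {
		cpus[i].id = i;
		SLIST_INSERT_HEAD(&slhead, &cpus[i], sl_link);
		STAILQ_INSERT_TAIL(&sthead, &cpus[i], st_link);
	}

	i = 0;
	SLIST_FOREACH(c, &slhead, sl_link)
		slist_out[i++] = c->id;
	i = 0;
	STAILQ_FOREACH(c, &sthead, st_link)
		stailq_out[i++] = c->id;
}
```

For three CPUs the SLIST walk visits ids 2, 1, 0 while the STAILQ walk visits 0, 1, 2, which is why the switch to tail insertion brings thread 0 up first.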
221894	14-May-2011	marcel	Prefer switching the memory stack from user to kernel before switching the register stack. While the ordering doesn't matter, it creates an invariant not previously there: the memory stack pointer will always be larger than the register stack pointer. With this invariant in place, it's easier to add instrumentation code that detects a stack overflow because in such a scenario the memory stack pointer and register stack pointers have crossed each other. Aside: basic kernel operation needs about half the stack size (~16K) at most. We have plenty of head room on the kernel stack...
221893	14-May-2011	marcel	Sharpening the saw: o Clobber the register that holds the restart token immediately after crossing the restart point. This prevents false positives (i.e. a nested exception that we don't know can happen and that is being treated as one we know by virtue of a lingering restart token). o Now that the bootstrap kernel stack is free, switch onto it and call trap() for nested traps that we don't know about. In trap we panic() so that we can analyze the condition.
221606	07-May-2011	marcel	In pmap_kextract(), return the physical address for PBVM virtual addresses as well (incl. the PBVM page table).
221271	30-Apr-2011	marcel	Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.
221218	29-Apr-2011	jhb	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.
221173	28-Apr-2011	attilio	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, if they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks
219841	21-Mar-2011	marcel	Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.
219808	21-Mar-2011	marcel	Change region 4 to be part of the kernel. This serves 2 purposes: 1. The PBVM is in region 4, so if we want to make use of it, we need region 4 freed up. 2. Region 4 and above cannot be represented by an off_t by virtue of that type being signed. This is problematic for truss(1), ktrace(1) and other such programs.
219758	18-Mar-2011	marcel	o Move the IVT and supporting functions to the front of the text segment so that it's always mapped by the loader. o Change the alternate fault handlers to account for PBVM. Since currently the region is handled by the VHPT, no alternate faults will be generated for it.
219756	18-Mar-2011	marcel	Remove inclusion of unneeded bootinfo.h header.
219741	18-Mar-2011	marcel	Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.
219523	11-Mar-2011	mdf	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.
219468	10-Mar-2011	mdf	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistently use strlcpy(). Suggested by: Warner Losh
219405	08-Mar-2011	dchagin	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhandler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 weeks
218195	02-Feb-2011	mdf	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
217688	21-Jan-2011	pluknet	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
217519	17-Jan-2011	jkim	Remove empty dev_mem_md_init() stubs.
215052	09-Nov-2010	jhb	Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.
214835	05-Nov-2010	jhb	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month
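The ordering that r214835 describes (disable interrupts, bump the nesting count, and only then store the saved state in the thread, going through a temporary) can be sketched in userland with simulated interrupt state. Everything here is an assumption made for illustration: the intr_enabled flag, struct md_thread and the sketch_* helpers are hypothetical and do not reproduce the kernel's per-architecture code.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (hypothetical; no real hardware). */
static bool intr_enabled = true;

/* Hypothetical machine-dependent thread area. */
struct md_thread {
	int spinlock_count;	/* nesting depth of spin locks held */
	bool saved_intr;	/* interrupt state before the first acquire */
};

/* Disable "interrupts" and return the previous state. */
static bool
sim_intr_disable(void)
{
	bool was = intr_enabled;

	intr_enabled = false;
	return (was);
}

static void
sim_intr_restore(bool was)
{
	intr_enabled = was;
}

static void
sketch_spinlock_enter(struct md_thread *td)
{
	bool flags;

	if (td->spinlock_count == 0) {
		/* Keep the old state in a temporary until the count is raised. */
		flags = sim_intr_disable();
		td->spinlock_count = 1;
		td->saved_intr = flags;
	} else
		td->spinlock_count++;
}

static void
sketch_spinlock_exit(struct md_thread *td)
{
	bool flags;

	/* Read the saved state before the count can drop to zero. */
	flags = td->saved_intr;
	td->spinlock_count--;
	if (td->spinlock_count == 0)
		sim_intr_restore(flags);
}
```

Nested enter/exit pairs leave the simulated interrupts disabled until the outermost exit, at which point the state saved by the first enter is restored.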
213282	29-Sep-2010	neel	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb
212413	10-Sep-2010	avg	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days
211515	19-Aug-2010	jhb	Remove unused KTRACE includes.
210939	06-Aug-2010	jhb	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month
209775	07-Jul-2010	marcel	Remove pointless BOOTP conditional.
209749	06-Jul-2010	marcel	Provide more examples for error injection.
209671	03-Jul-2010	marcel	Allocate and setup an interrupt vector for corrected machine checks. For now, just print when we get the interrupt, but eventually we need to collect the details and provide a more useful report.
209613	30-Jun-2010	jhb	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
209085	12-Jun-2010	marcel	The ptc.g operation for the McKinley and Madison processors has the side-effect of purging more than the requested translation. While this is not a problem in general, it invalidates the assumption made during constructing the trapframe on entry into the kernel in SMP configurations. The assumption is that only the first store to the stack will possibly cause a TLB miss. Since the ptc.g purges the translation caches of all CPUs in the coherency domain, a ptc.g executed on one CPU can cause a purge on another CPU that is currently running the critical code that saves the state to the trapframe. This can cause an unexpected TLB miss and with interrupt collection disabled this means an unexpected data nested TLB fault. A data nested TLB fault will not save any context, nor provide a way for software to determine what caused the TLB miss nor where it occurred. Careful construction of the kernel entry and exit code allows us to handle a TLB miss in precisely orchestrated points and thereby avoid the need to wire the kernel stack, but the unexpected TLB miss caused by the ptc.g instruction resulted in an unrecoverable condition and machine checks. The solution to this problem is to synchronize the kernel entry on all CPUs with the use of the ptc.g instruction on a single CPU by implementing a bare-bones readers-writer lock that allows N readers (= N CPUs entering the kernel) and 1 writer (= execution of the ptc.g instruction on some CPU). This solution wins over a rendez-vous approach by not interrupting CPUs with an IPI. This problem has not been observed on the Montecito. PR: ia64/147772 MFC after: 6 days
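The "N readers, 1 writer" idea in r209085 can be sketched with C11 atomics. This is a userland illustration under stated assumptions, not the ia64 entry code (which is written in assembly): the counter encoding and the names entry_lock, reader_enter(), reader_exit(), writer_enter() and writer_exit() are all hypothetical, and the sketch makes no attempt at fairness.

```c
#include <stdatomic.h>

#define	WRITER_HELD	(-1)

/* 0 = free, >0 = number of readers inside, WRITER_HELD = writer inside. */
static atomic_int entry_lock;

/* A CPU entering the kernel: spin until no writer holds the lock. */
static void
reader_enter(void)
{
	int v;

	for (;;) {
		v = atomic_load(&entry_lock);
		if (v >= 0 &&
		    atomic_compare_exchange_weak(&entry_lock, &v, v + 1))
			return;
	}
}

static void
reader_exit(void)
{
	atomic_fetch_sub(&entry_lock, 1);
}

/* The CPU about to issue the global purge: spin until no readers remain. */
static void
writer_enter(void)
{
	int expected;

	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_weak(&entry_lock, &expected,
		    WRITER_HELD))
			return;
	}
}

static void
writer_exit(void)
{
	atomic_store(&entry_lock, 0);
}
```

Any number of readers can be inside at once, while the writer is admitted only when the count is zero and excludes new readers until it leaves. A steady stream of readers can starve the writer in this simple form.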
209048	11-Jun-2010	alc	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
209026	11-Jun-2010	marcel	Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4) in particular, do not handle deferred DMA map load operations at all. Any error, and especially EINPROGRESS, is treated as a hard error and typically aborts the current operation. The fact that the busdma code queues the load operation for when resources (i.e. bounce buffers in this particular case) are available makes this especially problematic. Bounce buffering, unlike what the PR synopsis would suggest, works fine. While on the subject, properly implement swi_vm(). PR: 147502 MFC after: 1 week
208990	10-Jun-2010	alc	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exists_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@
208659	30-May-2010	alc	Simplify the inner loop of get_pv_entry(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty, wait instead until the loop completes.
208646	29-May-2010	alc	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
208574	26-May-2010	alc	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
208514	24-May-2010	kib	Change ia64' struct syscall_args definition so that args is a pointer to the arguments array instead of array itself. ia64 syscall arguments are readily available in the frame, point args to it, do not do unnecessary bcopy. Still reserve the array in syscall_args for ia32 emulation. Suggested and reviewed by: marcel MFC after: 1 month
208504	24-May-2010	alc	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
208453	23-May-2010	kib	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and sets up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
208392	21-May-2010	jhb	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month
208175	16-May-2010	alc	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
207796	08-May-2010	alc	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.
207410	30-Apr-2010	kmacy	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
207373	29-Apr-2010	alc	MFamd64/i386 r207205 Clearing a page table entry's accessed bit and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it. Moreover, on ia64, don't set the page's dirty field unless pmap_protect() is removing write access.
207329	28-Apr-2010	attilio	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it does help fixing /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks
207155	24-Apr-2010	alc	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
206570	13-Apr-2010	marcel	Populate the sysctl tree with any MCA records we collected. The sequence number is used as the name of a sysctl node, under which we add the MCA records using the CPU id as the leaf name. Add the hw.mca.inject sysctl to provide a way to inject MC errors and trigger machine checks. PR: ia64/113102
206558	13-Apr-2010	marcel	Change the (generic) argument to ia64_store_mca_state() from the cpuid to the struct pcpu of the CPU. We are then casting between pointer types only.
205726	27-Mar-2010	marcel	Implement interrupt to CPU binding. Assign interrupts to CPUs in a round-robin fashion, starting with the highest priority interrupt on the highest-numbered CPU and cycling downwards.
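The assignment policy stated in r205726 can be sketched as follows. The sketch assumes the interrupts are already sorted by descending priority (index 0 is the highest) and that CPUs are numbered 0..ncpus-1; bind_round_robin() is a hypothetical helper, not the function used in the ia64 interrupt code.

```c
/*
 * Assign nintr interrupts to ncpus CPUs round-robin, starting with the
 * highest-priority interrupt (index 0) on the highest-numbered CPU and
 * cycling downwards, wrapping back to the top after CPU 0.
 * cpu_of[i] receives the CPU chosen for interrupt i.
 */
static void
bind_round_robin(int *cpu_of, int nintr, int ncpus)
{
	int cpu, i;

	if (ncpus <= 0)
		return;
	cpu = ncpus - 1;
	for (i = 0; i < nintr; i++) {
		cpu_of[i] = cpu;
		cpu = (cpu == 0) ? ncpus - 1 : cpu - 1;
	}
}
```

For five interrupts on four CPUs this yields the CPU sequence 3, 2, 1, 0, 3.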
205723	27-Mar-2010	marcel	Remove nx_pcibus from the nexus resource. Nexus is not involved with PCI busses. Remove nexus_read_ivar() and nexus_write_ivar() to give default behaviour. Remove <machine/nexusvar.h> as well, because there's nothing in it that's being used.
205713	26-Mar-2010	marcel	Rename disable_intr() to ia64_disable_intr() and rename enable_intr() to ia64_enable_intr(). This reduces confusion with intr_disable() and intr_restore(). Have configure_final() call ia64_finalize_intr() instead of enable_intr() in preparation of adding support for binding interrupts to all CPUs.
205665	26-Mar-2010	marcel	Only use the interval timer for clock interrupts on the BSP and have the BSP use IPIs to trigger clock interrupts on the APs. This allows us to run on hardware configurations for which the ITC has non-uniform frequencies across CPUs. While here, change the clock XIV to type IPI so as to protect the interrupt delivery against CPU re-balancing once that's implemented.
205660	26-Mar-2010	nwhitehorn	Fix the ia64 build. Pointy hat to: me
205642	25-Mar-2010	nwhitehorn	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb
205454	22-Mar-2010	marcel	o Remove the pmap argument to pmap_invalidate_all() as it's not used other than in a potentially dangerous KASSERT. o Hand-inline pmap_remove_page() as it's only called from 1 place and the abstraction that pmap_remove_page() provides is not enough to warrant the obfuscation. Eliminate the dangerous KASSERT in the process. o In pmap_remove_pte(), remove the KASSERT for pmap being the current one as it's not safe in the face of CPU migration.
205435	22-Mar-2010	marcel	Drop the pmap argument to pmap_invalidate_page(). It's not used other than in a KASSERT. The KASSERT is broken in that it's done outside the critical section and as such isn't protected against CPU migration. Improve pmap_invalidate_page() as follows: o Calculate vhpt_ofs inside the critical region for exactly the same reason. o Calculate the tag outside the FOREACH loop, as it's loop-invariant. This is more efficient. o Replace the test and set with an atomic cmpset operation because we are changing other CPUs' VHPT tables and this avoids invalidating after the entry got modified. Not necessarily a problem, but better safe than sorry.
205434	22-Mar-2010	marcel	With preemption, the high FP registers may get enabled by cpu_switch() before we grab the mutex. Don't assert that they must be disabled at that point. We pretty much bypass all logic in that case anyway and leave immediately, so there's no harm.
205433	22-Mar-2010	marcel	Fix interrupt handling by extending the critical region so that preemption doesn't happen until after all pending interrupts have been serviced. While here again, simplify the EOI handling by doing it after we call the XIV-specific handlers, rather than in each of them. The original thought was that we may want to do an EOI first and the actual IPI handling next, but that's mostly a micro-optimization.
205429	21-Mar-2010	marcel	Print MD fields in the pcpu to aid debugging.
205357	20-Mar-2010	marcel	Don't check for boot_verbose in the environment. The loader does that already and sets RB_VERBOSE. The loader has always done it.
205234	17-Mar-2010	marcel	Revamp the interrupt code based on the previous commit: o Introduce XIV, eXternal Interrupt Vector, to differentiate from the interrupt vectors that are offsets in the IVT (Interrupt Vector Table). There's a vector for external interrupts, which are based on the XIVs. o Keep track of allocated and reserved XIVs so that we can assign XIVs without hardcoding anything. When XIVs are allocated, an interrupt handler and a class are specified for the XIV. Classes are: 1. architecture-defined: XIV 15 is returned when no external interrupts are pending, 2. platform-defined: SAL reports which XIV is used to wake up an AP (typically 0xFF, but it's 0x12 for the Altix 350). 3. inter-processor interrupts: allocated for SMP support and non-redirectable. 4. device interrupts (i.e. IRQs): allocated when devices are discovered and are redirectable. o Rewrite the central interrupt handler to call the per-XIV interrupt handler and rename it to ia64_handle_intr(). Move the per-XIV handler implementation to the file where we have the XIV allocation/reservation. Clock interrupt handling is moved to clock.c. IPI handling is moved to mp_machdep.c. o Drop support for the Intel 8259A because it was broken. When XIV 0 is received, the CPU should initiate an INTA cycle to obtain the interrupt vector of the 8259-based interrupt. In these cases the interrupt controller we should be talking to with respect to masking or signalling EOI is the 8259 and not the I/O SAPIC. This requires a driver for the Intel 8259A which isn't available for ia64. Thus stop pretending to support ExtINTs and instead panic() so that if we come across hardware that has an Intel 8259A, we have something real to work with. o With XIVs for IPIs dynamically allocated and also based on priority, define the IPI_* symbols as variables rather than constants. The variable holds the XIV allocated for the IPI. o IPI_STOP_HARD delivers an NMI if possible. Otherwise the XIV assigned to IPI_STOP is delivered.
205172	15-Mar-2010	marcel	Have cpu_throw() loop on blocked_lock as well. This bug has existed a long time and has gone unnoticed just as long, because I kept using sched_4bsd (due to sched_ule not working with preemption), but GENERIC had sched_ule by default -- including SMP. While here, remove unused inclusion of <machine/clock.h>, remove totally bogus inclusion of <i386/include/specialreg.h>.
205014	11-Mar-2010	nwhitehorn	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
204905	09-Mar-2010	marcel	Remove inclusion of <i386/include/psl.h>. While here, move inclusion of <sys/lock.h> to a better place.
204904	09-Mar-2010	marcel	Remove support for SYS_RES_DRQ.
204425	27-Feb-2010	marcel	Interrupt related cleanups: o Assign vectors based on priority, because vectors have implied priority in hardware. o Use unordered memory accesses to the I/O sapic and use the acceptance form of the mf instruction. o Remove the sapicreg.h and sapicvar.h headers. All definitions in sapicreg.h are private to sapic.c and all definitions in sapicvar.h are either private or interface functions. Move the interface functions to intr.h. o Hide the definition of struct sapic.
204184	22-Feb-2010	marcel	Prefer I-units and M-units for nop instructions. This works around McKinley flaws. It also avoids using the F-unit in the kernel for no reason.
204183	21-Feb-2010	marcel	Normalize nop instructions: Only use 0 for the immediate operand.
204182	21-Feb-2010	marcel	Remove pm_active from struct pmap as it serves no purpose. MFC after: 1 week
203883	14-Feb-2010	marcel	Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.
203572	06-Feb-2010	marcel	Fix single-stepping when the kernel was entered through the EPC syscall path. When the taken branch leaves the kernel and enters the process, we still need to execute the instruction at that address. Don't raise SIGTRAP when we branch into the process, but enable single-stepping instead.
203054	27-Jan-2010	marcel	In cpu_switch(), use an atomic operation to set the td_lock of the old thread to the mutex that's passed. Pointed out by: attilio, jhb
202904	23-Jan-2010	marcel	Remove cpu_boot() and call efi_reset_system() directly from cpu_reset().
202273	14-Jan-2010	marcel	Add ioctl requests to /dev/io on ia64 for reading and writing EFI variables. The primary reason for this is that it allows sysinstall(8) to add a boot menu item for the newly installed FreeBSD image.
202272	14-Jan-2010	marcel	Fix previous commit: efi_var_set() was copied from efi_var_get(), but wasn't actually changed.
202271	14-Jan-2010	marcel	Add wrappers for the RT Variable Services. While here, translate the EFI status into a standard errno value and change efi_set_time() to return a standard error. MFC after: 1 week
202097	11-Jan-2010	marcel	Use io(4) for I/O port access on ia64, rather than through sysarch(2). I/O port access is implemented on Itanium by reading and writing to a special region in memory. To hide details and avoid misaligned memory accesses, a process did I/O port reads and writes by making a MD system call. There's one fatal problem with this approach: unprivileged access was not being prevented. /dev/io serves that purpose on amd64/i386, so employ it on ia64 as well. Use an ioctl for doing the actual I/O and remove the sysarch(2) interface. Backward compatibility is not being considered. The sysarch(2) approach was added to support X11, but support for FreeBSD/ia64 was never fully implemented in X11. Thus, nothing gets broken that didn't need more work to begin with. MFC after: 1 week
201269	30-Dec-2009	marcel	Revamp bus_space access functions: o Optimize for memory mapped I/O by making all I/O port accesses function calls and marking the test for the IA64_BUS_SPACE_IO tag with __predict_false(). Implement the I/O port access functions in a new file, called bus_machdep.c. o Change the bus_space_handle_t for memory mapped I/O to the virtual address rather than the physical address. This eliminates the PA->VA translation for every I/O access. The handle for I/O port access is still the port number. o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string variants from cpufunc.h and define them in bus.h. On ia64 these are not CPU functions at all. In bus.h they are merely aliases for the new I/O port access functions defined in bus_machdep.h. o Handle the ACPI resource bug in nexus_set_resource(). There we can do it once so that we don't have to worry about it whenever we need to write to an I/O port that is really a memory mapped address. The upshot of this change is that the KBI is better defined and that I/O port access always involves a function call, allowing us to change the actual implementation without breaking the KBI. For memory mapped I/O the virtual address is abstracted, so that we can change the VA->PA mapping in the kernel without causing a KBI breakage. The exception at this time is for bus_space_map() and bus_space_unmap(). MFC after: 1 week.
201223	29-Dec-2009	rnoland	Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. This replaces d_mmap() with the d_mmap2() implementation and also changes the type of offset to vm_ooffset_t. Purge d_mmap2(). All driver modules will need to be rebuilt since D_VERSION is also bumped. Reviewed by: jhb@ MFC after: Not in this lifetime...
201145	28-Dec-2009	antoine	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month
200889	23-Dec-2009	marcel	Export the bus, cpu and itc frequencies under the hw.freq sysctl node. The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The frequencies are rounded to the nearest whole MHz. While here, rename and re-type bus_frequency, processor_frequency and itc_frequency to bus_freq, cpu_freq and itc_freq and make them static. As unsigned integers, the hw.freq.cpu sysctl can more easily be made generic (across all architectures) making porting easier. MFC after: 3 days
200240	08-Dec-2009	marcel	In exception_save, write-back ar.rnat after switching the backing store. Writing to ar.bspstore is defined to leave ar.rnat undefined. PR: ia64/120315 MFC after: 3 days
200207	07-Dec-2009	marcel	Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id excluded, as it's used by MI code) and move the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly. While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.
200200	07-Dec-2009	marcel	Allocate the VHPT for each CPU in cpu_mp_start(), rather than allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU without memory waste -- MAXCPU is now 32 for SMP kernels. This change also eliminates the VHPT scaling based on the total memory in the system. It's the workload that determines the best size of the VHPT. The workload can be affected by the amount of memory, but not necessarily. For example, there's no performance difference between VHPT sizes of 256KB, 512KB and 1MB when building the LINT kernel. This was observed with a system that has 8GB of memory. By default the kernel will allocate a 1MB VHPT. The user can tune the system with the "machdep.vhpt.log2size" tunable.
200051	03-Dec-2009	marcel	Make sure bus space accesses use unordered memory loads and stores. Memory accesses are posted in program order by virtue of the uncacheable memory attribute. Since GCC, by default, adds acquire and release semantics to volatile memory loads and stores, we need to use inline assembly to guarantee it. With inline assembly, we don't need volatile pointers anymore. Itanium does not support semaphore instructions to uncacheable memory.
199893	28-Nov-2009	marcel	Eliminate the use of MAXCPU in static arrays of interrupt counters by adding statistics counters to the PCPU structure. Export the counters through sysctl by giving each PCPU structure its own sysctl context. While here, fix cnt.v_intr by not just having it count clock interrupts, but every interrupt and add more counters for each interrupt source.
199868	27-Nov-2009	alc	Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib
199727	24-Nov-2009	marcel	Improve upon revision 196196 by removing the newly added comment in the wrong place and instead add a KASSERT in the right place.
199574	20-Nov-2009	marcel	No need to include opt_kstack_pages.h, because KSTACK_PAGES is already defined through genassym.c
199566	20-Nov-2009	marcel	Add a seatbelt to the Nested TLB Fault handler to give us a chance to panic when we have an unexpected TLB fault while interrupt collection is disabled. Use a token rather than the actual address of the restart point to avoid the need for the movl instruction. The token is arbitrary. For the drummers: it's based on a single paradiddle.
199502	19-Nov-2009	marcel	opt_* headers are included using the quoted form.
199135	10-Nov-2009	kib	Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval(). Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, also sunv reviewed MIPS tested by: gonzo MFC after: 1 month
198733	31-Oct-2009	marcel	Reimplement the lazy FP context switching: o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o Use msleep() to wait for the other CPU to save the high FP registers. This change fixes the high FP inconsistency panics. A single global lock typically serializes too much, which may be noticeable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use the high FP registers.
198507	27-Oct-2009	kib	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month
198341	21-Oct-2009	marcel	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.
197729	03-Oct-2009	bz	Make sure that the primary native brandinfo always gets added first and the native ia32 compat as middle (before other things). o(ld)brandinfo as well as third party like linux, kfreebsd, etc. stays on SI_ORDER_ANY coming last. The reason for this is only to make sure that even in case we would overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo would still be there and the system would be operational. Reviewed by: kib MFC after: 1 month
196268	16-Aug-2009	marcel	Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be sparse, which causes a kernel assert. Approved by: re (kensmith)
196196	13-Aug-2009	attilio	* Completely remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using an NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock on other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Add a new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when available. In other cases it just matches a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64 architectures, while on the others it has a normal IPI_STOP effect. It is the responsibility of maintainers to eventually implement a hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). It mirrors stop_cpu() but uses the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases. * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and added comments. This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal. Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)
195840	24-Jul-2009	jhb	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
195625	11-Jul-2009	marcel	On exec(2), when loading the ELF image, pmap_enter_object() is called to prefault pages. This is an obvious place for making sure the I-cache is coherent. It was missing though. As such, execution over NFS and ZFS file systems was failing. NFS was fixed the wrong way (by flushing the D-cache as part of the NFS code) in a previous commit. ZFS problems were encountered after that and indicated that something else was wrong... Approved by: re (kib)
194784	23-Jun-2009	jeff	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
194524	20-Jun-2009	marcel	Drop the high FP state of an exiting thread in cpu_thread_exit() and not in cpu_exit(). The latter is called after td_md.md_highfp_mtx has been destroyed, which results in a race condition when another thread wants to use the high FP registers on the CPU that still has the high FP registers in question.
193530	05-Jun-2009	jkim	Import ACPICA 20090521.
193066	29-May-2009	jamie	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)
193018	29-May-2009	ed	Last minute TTY API change: remove mutex argument from tty_alloc(). I don't want people to override the mutex when allocating a TTY. It has to be there, to keep drivers like syscons happy. So I'm creating a tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex() should eventually be removed. The advantage of this approach, is that we can just remove a function, without breaking the regular API in the future.
192918	27-May-2009	rink	ia64: Move MCA information retrieval to a per-CPU kthread. Once APs are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.
192324	18-May-2009	marcel	Rename ia64_invalidate_icache() to ia64_sync_icache(). We're not invalidating anything.
192323	18-May-2009	marcel	Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS works reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet. Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which the D-cache is already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chip DMA engines, by-passing the D-cache altogether.
191201	17-Apr-2009	jhb	Restore bus DMA bounce pages to an offset of 0 when they are released by a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be reused with a non-zero offset by a tag that doesn't have BUS_DMA_KEEP_PG_OFFSET leading to data corruption. Sleuthing by: avg Reviewed by: scottl
191011	13-Apr-2009	kib	The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the uio_td to extract pages from, instead of unconditionally use kernel pmap. Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version) PR: amd64/133592 Reviewed by: scottl (original patch), jhb MFC after: 2 weeks
190708	05-Apr-2009	dchagin	Fix KBI breakage by r190520 which affects older linux.ko binaries: 1) Move the new field (brand_note) to the end of the Brandinfo structure. 2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer is valid. 3) Use the brand_note field only if the flag BI_BRAND_NOTE is set; old modules won't have the flag set, so the new field brand_note is ignored for them. Suggested by: jhb Reviewed by: jhb Approved by: kib (mentor) MFC after: 6 days
189771	13-Mar-2009	dchagin	Implement a new way of branding ELF binaries by looking at a ".note.ABI-tag" section. The search order of a brand is changed, now first of all the ".note.ABI-tag" is looked through. Move the code which fetches osreldate for an ELF binary to the check_note() handler. PR: 118473 Approved by: kib (mentor)
188453	10-Feb-2009	marcel	Mark the BSP as being awake. This suppresses the message that not all usable CPUs could be woken up...
188350	08-Feb-2009	imp	When bouncing pages, allow a new option to preserve the intra-page offset. This is needed for the ehci hardware buffer rings that assume this behavior. This is an interim solution, and a more general one is being worked on. This solution doesn't break anything that doesn't ask for it directly. The mbuf and uio variants with this flag likely don't work and haven't been tested. Universe builds with these changes. I don't have a huge-memory machine to test these changes with, but will be happy to work with folks that do and hps if this change turns out not to be sufficient. Submitted by: alfred@ from Hans Peter Selasky's original
188119	04-Feb-2009	jhb	Tweak the ia64 machine check handling code to not register new sysctl nodes while holding a spin mutex. Instead, it now shoves the machine check records onto a queue that is later drained to add sysctl nodes for each record. While a routine to drain the queue is present, it is not currently called. Reviewed by: marcel
187381	18-Jan-2009	alc	Correct an error in revision 1.170 of this file. When get_pv_entry() is forced to reclaim pv entries, the one pv entry that it returns should not be freed.
185169	22-Nov-2008	kib	Add sv_flags field to struct sysentvec with intention to provide description of the ABI of the currently executing image. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures to determine ABI features. Discussed with: dchagin, imp, jhb, peter
184205	23-Oct-2008	des	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
184062	19-Oct-2008	marcel	Atomically increment the number of awoken APs as all APs will be unleashed here. Pointed out by: christian.kandeler@hob.de
183527	01-Oct-2008	peter	Collect N identical (or near identical) mkdumpheader() implementations into one, as threatened in the comment. Textdump magic can be passed in.
183439	28-Sep-2008	marius	Remove ipi_all() and ipi_self() as the former hasn't been used at all to date and the latter also is only used in ia64 and powerpc code which no longer serves a real purpose after bring-up and just can be removed as well. Note that architectures like sun4u also provide no means of implementing IPI'ing a CPU itself natively in the first place. Suggested by: jhb Reviewed by: arch, grehan, jhb
183397	27-Sep-2008	ed	Replace all calls to minor() with dev2unit(). After I removed all the unit2minor()/minor2unit() calls from the kernel yesterday, I realised calling minor() everywhere is quite confusing. Character devices now only have the ability to store a unit number, not a minor number. Remove the confusion by using dev2unit() everywhere. This commit could also be considered as a bug fix. A lot of drivers call minor(), while they should actually be calling dev2unit(). In -CURRENT this isn't a problem, but it turns out we never had any problem reports related to that issue in the past. I suspect not many people connect more than 256 pieces of the same hardware. Reviewed by: kib
183322	24-Sep-2008	kib	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitly initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month
183299	23-Sep-2008	obrien	The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp and bcmp are not the same thing. 'man bcmp' states that the return is "non-zero" if the two byte strings are not identical. Whereas, 'man memcmp' states that the return is the "difference between the first two differing bytes (treated as unsigned char values)" if the two byte strings are not identical. So provide a proper memcmp(9), but it is a C implementation not a tuned assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).
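The distinction drawn in r183299 can be illustrated with a portable C routine. This is a minimal sketch, not the implementation that was committed; the name sketch_memcmp() is hypothetical and was chosen to avoid clashing with the libc function, and the assert() precondition is an addition for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical byte-wise comparison with memcmp() semantics: returns
 * the difference between the first two differing bytes, treated as
 * unsigned char values, or 0 if the first len bytes are identical.
 * A bcmp()-style routine would only need to report zero or non-zero.
 */
static int
sketch_memcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;

	/* Sketch-only precondition: non-empty ranges need valid pointers. */
	assert(len == 0 || (b1 != NULL && b2 != NULL));

	for (; len > 0; len--, p1++, p2++) {
		if (*p1 != *p2)
			return ((int)*p1 - (int)*p2);
	}
	return (0);
}
```

A caller that only needs an equality test can ignore the sign and magnitude of the result, which is why the commit message notes that bcmp(9) remains sufficient, and preferable, for that case.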
181905	20-Aug-2008	ed	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
181803	17-Aug-2008	bz	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
180533	15-Jul-2008	alc	Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is specified when appropriate. Reviewed by: scottl
180354	07-Jul-2008	marcel	Add inline function ia64_fc_i() to abstract inline assembly. Use the new inline function in ia64_invalidate_icache(). While there, add proper synchronization so that we know the fc.i instructions have taken effect when we return.
179256	23-May-2008	marcel	Account for IPI_PREEMPT. We don't want to call sched_preempt() with interrupts disabled or with td_intr_nesting_level non-zero.
179229	23-May-2008	alc	The VM system no longer uses setPQL2(). Remove it and its helpers.
179190	22-May-2008	marcel	Create the bucket mutexes with MTX_NOWITNESS. There's now a hard limit of 512 pending mutexes in the witness code and we can easily have 1 million bucket mutexes initialized before witness is up and running. Bumping the limit from 512 to 1M is not really an option here...
179173	21-May-2008	marcel	We can call ia64_flush_dirty() when the corresponding process is locked or not. As such, use PROC_LOCKED() to determine which case it is and lock the process when not.
179081	18-May-2008	alc	Retire pmap_addr_hint(). It is no longer used.
178893	09-May-2008	alc	Add a stub for pmap_align_superpage() on machines that don't (yet) implement pmap-level support for superpages.
178494	25-Apr-2008	marcel	Unbreak previous commit. While here, refactor the code a bit.
178471	25-Apr-2008	jeff	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia
178429	22-Apr-2008	phk	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code do however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h>. <sys/time.h>, on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
178309	19-Apr-2008	marcel	Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP instead.
178222	15-Apr-2008	marcel	Use genclock for RTC handling. This eliminates the MD versions for inittodr() and resettodr(). Have nexus double as the clock device, because it's the firmware that provides RTC services. We could create a special (pseudo-) device for it, but that wasn't superior enough to actually do it. Maybe later... Requested by: phk
178215	15-Apr-2008	marcel	Support and switch to the ULE scheduler: o Implement IPI_PREEMPT, o Set td_lock for the thread being switched out, o For ULE & SMP, loop while td_lock points to blocked_lock for the thread being switched in, o Enable ULE by default in GENERIC and SKI,
178206	14-Apr-2008	marcel	Revision 1.9 changes the delivery mode from the magic constant 0 (i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit log doesn't mention the change in behaviour, it is believed to be deliberate. In the last 5.5 years this hasn't been a problem. Nor do I think it made any difference, but who knows. However, I do know that it breaks SMP support for Montecito-based machines. Switch back to fixed-CPU delivery so that SMP works again. This gives me some time to look more closely at the problem, as well as make sure the I-cache validation as it's implemented currently is sufficient in SMP configurations...
178131	11-Apr-2008	jeff	- Pass the irq and not the vector to intr_event_create(). Reviewed by: marcel
178092	11-Apr-2008	jeff	- Add the interrupt vector number to intr_event_create so MI code can lookup hard interrupt events by number. Ignore the irq# for soft intrs. - Add support to cpuset for binding hardware interrupts. This has the side effect of binding any ithread associated with the hard interrupt. As per restrictions imposed by MD code we can only bind interrupts to a single cpu presently. Interrupts can be 'unbound' by binding them to all cpus. Reviewed by: jhb Sponsored by: Nokia
178028	09-Apr-2008	marcel	Unbreak after removal of SI_SUB_MOUNT_ROOT.
177940	05-Apr-2008	jhb	Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt code. - Rename the intr_event 'eoi', 'disable', and 'enable' hooks to 'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric. Also, add a comment describing what the MI code expects them to do. - On amd64, i386, and powerpc this is effectively a NOP. - On arm, don't bother masking the interrupt unless the ithread is scheduled in the non-INTR_FILTER case to match what INTR_FILTER did. Also, don't bother unmasking the interrupt in the post_filter case if we never masked it. The INTR_FILTER case had been doing this by having arm_unmask_irq for the post_filter (formerly 'eoi') hook. - On ia64, stray interrupts are now masked for the non-INTR_FILTER case. They were already masked in the INTR_FILTER case. - On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for both the 'post_filter' and 'post_ithread' hooks to match what the non-INTR_FILTER code did. - On sun4v, retire the ithread wrapper hack by using an appropriate 'post_ithread' hook instead (it's what 'post_ithread'/'enable' was designed to do even in 5.x). Glanced at by: piso Reviewed by: marius Requested by: marius [1], [5] Tested on: amd64, i386, arm, sparc64
177769	30-Mar-2008	marcel	Better implement I-cache invalidation. The previous implementation was a kluge. This implementation matches the behaviour on powerpc and sparc64. While on the subject, make sure to invalidate the I-cache after loading a kernel module. MFC after: 2 weeks
177642	26-Mar-2008	phk	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminates the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].
177325	17-Mar-2008	jhb	Simplify the interrupt code a bit: - Always include the ie_disable and ie_eoi methods in 'struct intr_event' and collapse down to one intr_event_create() routine. The disable and eoi hooks simply aren't used currently in the !INTR_FILTER case. - Expand 'disab' to 'disable' in a few places. - Use function casts for arm and i386:intr_eoi_src() instead of wrapper routines to trim one extra indirection. Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER} Tested on: {amd64,i386} x {FILTER, !FILTER}
177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
177181	14-Mar-2008	jhb	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}
177157	13-Mar-2008	jhb	Rework how the nexus(4) device works on x86 to better handle the idea of different "platforms" on x86 machines. The existing code already handles having two platforms: ACPI and legacy. However, the existing approach was rather hardcoded and difficult to extend. These changes take the approach that each x86 hardware platform should provide its own nexus(4) driver (it can inherit most of its behavior from the default legacy nexus(4) driver) which is responsible for probing for the platform and performing appropriate platform-specific setup during attach (such as adding a platform-specific bus device). This does mean changing the x86 platform busses to no longer use an identify routine for probing, but to move that logic into their matching nexus(4) driver instead. - Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it can be overridden. - Expose a nexus_init_resources() routine which initializes the various resource managers so that subclassed nexus(4) drivers can invoke it from their attach routine. - The legacy nexus(4) driver explicitly adds a legacy0 device in its attach routine. - The ACPI driver no longer contains a new-bus identify method. Instead it exposes a public function (acpi_identify()) which is a probe routine that the MD nexus(4) drivers can use to probe for ACPI. All of the probe logic in acpi_probe() is now moved into acpi_identify() and acpi_probe() is just a stub. - On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via acpi_identify() and claims the nexus0 device if the probe succeeds. It then explicitly adds an acpi0 device in its attach routine. - The legacy(4) driver no longer knows anything about the acpi0 device. - On ia64 if acpi_identify() fails you basically end up with no devices. This matches the previous behavior where the old acpi_identify() would fail to add an acpi0 device again leaving you with no devices. Discussed with: imp Silence on: arch@
177126	12-Mar-2008	jeff	- Fix build breakage; there was a reference to a removed syscall in a KASSERT(). Attempt to clean up the comment to reflect reality.
177091	12-Mar-2008	jeff	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
176734	02-Mar-2008	jeff	- Replace the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a separate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia
176286	14-Feb-2008	marcel	On Montecito processors, the instruction cache is in fact not coherent with the data caches. Implement a quick fix to allow us to boot on Montecito, while I'm working on a better fix in the mean time. Commit made on Montecito-based Itanium...
175959	04-Feb-2008	marcel	Allocate a stack for thread0 and switch to it before calling mi_startup(). This frees up kstack for static PAL/SAL calls and double-fault handling.
175768	28-Jan-2008	ru	Add a wrapper function that bound checks writes to the dump device.
175067	03-Jan-2008	alc	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
175066	03-Jan-2008	imp	Use correct function name in panic message
175065	03-Jan-2008	imp	Fix obsolete comment. pmap_remove_all is the function we're in.
174898	25-Dec-2007	rwatson	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.
174195	02-Dec-2007	rwatson	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
173988	27-Nov-2007	jhb	Remove the 'needbounce' variable from the _bus_dmamap_load_buffer() routine. It is not needed as the existing tests for segment coalescing already handle bounced addresses and it prevents legal segment coalescing in certain edge cases. MFC after: 1 week Reviewed by: scottl
173799	21-Nov-2007	scottl	Extend critical section coverage in the low-level interrupt handlers to include the ithread scheduling step. Without this, a preemption might occur in between the interrupt getting masked and the ithread getting scheduled. Since the interrupt handler runs in the context of curthread, the scheduler might see it as having such a low priority on a busy system that it doesn't get to run for a _long_ time, leaving the interrupt stranded in a disabled state. The only way that the preemption can happen is by a fast/filter handler triggering a scheduling event earlier in the handler, so this problem can only happen for cases where an interrupt is being shared by both a fast/filter handler and an ithread handler. Unfortunately, it seems to be common for this sharing to happen with network and USB devices, for example. This fixes many of the mysterious TCP session timeouts and NIC watchdogs that were being reported. Many thanks to Sam Leffler for getting to the bottom of this problem. Reviewed by: jhb, jeff, silby
173708	17-Nov-2007	alc	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks
173615	14-Nov-2007	marcel	o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc(). i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc(). PR: ia64/118024
173600	14-Nov-2007	julian	Generally we are interested in what thread did something as opposed to what process. Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.
173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicking or dereferencing NULL. As a consequence, vmspace_exec() and vmspace_unshare() return the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
172692	16-Oct-2007	marcel	Set PTE_ACCESSED in the PTE and before inserting it in the VHPT. This avoids back-to-back faults for all TLB misses. This can be improved further in the future by also setting PTE_DIRTY for TLB misses for write accesses. MFC after: 1 week
172691	16-Oct-2007	marcel	The flushrs instruction must be the first in an instruction group. GNU as(1) already made sure of that, but it's better to actually have the code right. MFC after: 1 week
172690	16-Oct-2007	marcel	Print instruction stops to improve analysis of dependency violations. MFC after: 1 week
172189	15-Sep-2007	alc	It has been observed on the mailing lists that the different categories of pages don't sum to anywhere near the total number of pages on amd64. This is for the most part because uma_small_alloc() pages have never been counted as wired pages, like their kmem_malloc() brethren. They should be. This change fixes that. It is no longer necessary for the page queues lock to be held to free pages allocated by uma_small_alloc(). I removed the acquisition and release of the page queues lock from uma_small_free() on amd64 and ia64 weeks ago. This patch updates the other architectures that have uma_small_alloc() and uma_small_free(). Approved by: re (kensmith)
171740	06-Aug-2007	marcel	Clear pending interrupts before we enable external interrupts. Recently the AP in my Merced box seems to have grown a habit of getting unexpected interrupts, such as redundant wake-ups and legacy interrupts that require an INTA cycle. While here, replace DELAY(0) with cpu_spinwait() so that it's clear what we're doing as well as enable the code to take advantage of cpu_spinwait() when it gets implemented. Approved by: re (blanket)
171739	06-Aug-2007	marcel	Keep interrupts disabled while handling external interrupts. There's no advantage in allowing nested external interrupts. In fact, it leads to a potential stack overrun. While here, put the interrupt vector in the trapframe, so as to compensate for the 36 cycle latency of reading cr.ivr. Further simplify assembly code by dealing with ASTs from C. Approved by: re (blanket)
171722	04-Aug-2007	marcel	Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add instruction-serialization after writing to cr.pta. Delay enabling interrupts until after we setup the clocks and after we program the task priority register. Approved by: re (blanket)
171721	04-Aug-2007	marcel	Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add data-serialization after writing to the region registers and add instruction-serialization after writing to cr.pta. Approved by: re (blanket)
171720	04-Aug-2007	marcel	Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add data-serialization after writing to cr.tpr. Approved by: re (blanket)
171719	04-Aug-2007	marcel	Add required data-serialization after writing to cr.itm and cr.itv. Approved by: re (blanket)
171666	30-Jul-2007	marcel	o Switch to physical addressing before dereferencing the VHPT bucket pointer. The virtual mapping may not be present in the translation cache. This will result in a nested TLB fault at a place we don't handle (and don't want to handle). o Make sure there's a stop after the rfi instruction, otherwise its behaviour is undefined. o Make sure we switch back to virtual addressing before doing a rfi. Behaviour is undefined otherwise. Approved by: re (blanket)
171665	30-Jul-2007	marcel	Add option EXCEPTION_TRACING, which enables KTR-like functionality for processor interruptions. This is especially useful to track unexpected nested TLB faults. Approved by: re (blanket)
171664	30-Jul-2007	marcel	Rework the interrupt code and add support for interrupt filtering (INTR_FILTER). This includes: o Save a pointer to the sapic structure and IRQ for every vector, so that we can quickly EOI, mask and unmask the interrupt. o Add locking to the sapic code now that we can reprogram a sapic on multiple CPUs at the same time. o Use u_int for the vector and IRQ. We only have 256 vectors, so using a 64-bit type for it is rather excessive. o Properly handle concurrent registration of a handler for the same vector. Since vectors have a corresponding priority, we should not map IRQs to vectors in a linear fashion, but rather pick a vector that has a priority in line with the interrupt type. This is left for later. The vector/IRQ interchange has been untangled as much as possible to make this easier. Approved by: re (blanket)
171663	30-Jul-2007	marcel	Explicitly map the VHPT on all processors. Previously we were merely lucky that the VHPT was mapped as a side-effect of mapping the kernel, but when there's enough physical memory, this may not at all be the case. Approved by: re (blanket)
171553	23-Jul-2007	dwmalone	If clock_ct_to_ts fails to convert the time from the real time clock, print a one line error message. Add some comments on not being able to trust the day of week field (I'll act on these comments in a follow up commit). Approved by: re MFC after: 3 weeks
171463	16-Jul-2007	marcel	Restore the value of ar.rnat after the assignment to ar.bspstore. The SDM states that writing to ar.bspstore invalidates the ar.rnat register as a side-effect. This was interpreted as "bits in the ar.rnat register that correspond to registers whose value is on the stack are undefined". Since we keep the kernel stack NaT-aligned with the user stack (i.e. the lower 9 bits of the backing store pointer remain unchanged when we switch to the kernel stack) bits that need preserving would be preserved. That interpretation is questionable. So, now, the interpretation is more absolute: ar.rnat is undefined after writing to ar.bspstore. As such, we write the saved value of ar.rnat back to ar.rnat after writing to ar.bspstore. Discussed with: christian.kandeler@hob.de Approved by: re (kensmith)
170519	10-Jun-2007	alc	Add the machine-specific definitions for configuring the new physical memory allocator. Set the size of phys_avail[] using one of these definitions. Approved by: re
170507	10-Jun-2007	marcel	Work around a firmware bug in the HP rx2660, where in ACPI an I/O port is really a memory mapped I/O address. The bug is in the GAS that describes the address and in particular the SpaceId field. The field should not say the address is an I/O port when it clearly is not. With an additional check for the IA64_BUS_SPACE_IO case in the bus access functions, and the fact that I/O ports are pretty much not used in general on ia64, make the calculation of the I/O port address a function. This avoids inlining the work-around into every driver, and also helps reduce overall code bloat.
170474	09-Jun-2007	marcel	Synchronize the instruction cache after writing to memory. This is needed for breakpoints to work.
170444	09-Jun-2007	marcel	Physical memory regions can be larger than INT_MAX. Change size1 from an int to a long to avoid printing negative byte and page counts.
170403	07-Jun-2007	marcel	Remove remaining references to pc_curtid missed in previous commit.
170402	07-Jun-2007	marcel	Eliminate pmap_install(), which was used to wrap pmap_switch() and grab sched_lock. This would serialize calls to pmap_switch from cpu_switch(). With the introduction of thread_lock, this is not possible anymore, because thread_lock is not a single lock. It varies. Secondly and most importantly, it's not needed at all. The only requirement for pmap_switch() is that it's not preempted while in the middle of updating the CPU and PCPU. In other words, it's a critical region. No locking required.
170390	07-Jun-2007	davidxu	Fix compiling error.
170359	06-Jun-2007	marcel	Include <sys/sched.h> for sched_throw().
170307	05-Jun-2007	jeff	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
170306	04-Jun-2007	jeff	Commit 13/14 of sched_lock decomposition. - Add a new parameter to cpu_switch() that is used to release the lock on the outgoing thread and properly acquire the lock on the incoming thread. This parameter is not required for schedulers that don't do per-cpu locking and architectures which do not support it may continue to use the 4BSD scheduler. This feature is presently not supported on ia64. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
170305	04-Jun-2007	jeff	- Change comments and asserts to reflect the removal of the global scheduler lock. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
170303	04-Jun-2007	jeff	Commit 10/14 of sched_lock decomposition. - Use sched_throw() rather than replicating the same cpu_throw() code for each architecture. This also allows the scheduler to use any locking it may want to. - Use the thread_lock() rather than sched_lock when preempting. - The scheduler lock is not required to synchronize release_aps. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
170291	04-Jun-2007	attilio	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)
170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
170162	31-May-2007	piso	In some particular cases (like in pccard and pccbb), the real device handler is wrapped in a couple of functions - a filter wrapper and an ithread wrapper. In this case (and just in this case), the filter wrapper could ask the system to schedule the ithread and mask the interrupt source if the wrapped handler is composed of just an ithread handler: modify the "old" interrupt code to make it support this situation, while the "new" interrupt code is already ok. Discussed with: jhb
170086	29-May-2007	yongari	Honor maxsegsz of less than a page size in a DMA tag. Previously it used to return PAGE_SIZE without respect to restrictions of a DMA tag. This affected all of the busdma load functions that use _bus_dmamap_load_buffer() as their back-end. Reviewed by: scottl
170026	27-May-2007	marcel	Have the processor defer all faults and exceptions for control speculative loads. This at least makes control speculative loads work. In the future we should analyze which faults/exceptions we want to handle rather than defer to avoid having to call the recovery code when it's not strictly necessary.
169846	22-May-2007	kan	Allow FreeBSD's native ELF image activators to execute shared libraries the same way it was enabled for Linux binaries in linuxulator. This allows binaries built with -pie. Many ports auto-detect -fPIE support in GCC 4.2 and build binaries FreeBSD was unable to run.
169814	21-May-2007	marcel	When speculation fails (as determined by the chk instruction) the processor is to jump to recovery code. This branching behaviour may not be implemented by the processor and a Speculative Operation fault is raised. The OS is responsible to emulate the branch. Implement this, because GCC 4.2 uses advanced loads regularly.
169773	19-May-2007	marcel	Fix GCC warning: va = va += PAGE_SIZE contains pointless operation va = va. Fix white space in nearby lines.
169760	19-May-2007	marcel	Add a level of indirection to the kernel PTE table. The old scheme allowed for 1024 PTE pages, each containing 256 PTEs. This yielded 2GB of KVA. This is not enough to boot a kernel on a 16GB box and in general too low for a 64-bit machine. By adding a level of indirection we now have 1024 2nd-level directory pages, each capable of supporting 2GB of KVA. This brings the grand total to 2TB of KVA.
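The KVA totals quoted in r169760 can be checked with a few lines of arithmetic. The 8 KB page size is an assumption inferred from the figures in the message (1024 pages of 256 PTEs covering 2GB); the macro and function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 8 KB page size, inferred from 1024 * 256 * page = 2 GB. */
#define SKETCH_PAGE_SIZE	UINT64_C(8192)
#define SKETCH_PTES_PER_PAGE	UINT64_C(256)	/* from the commit message */
#define SKETCH_PTE_PAGES	UINT64_C(1024)	/* from the commit message */
#define SKETCH_DIR_PAGES	UINT64_C(1024)	/* from the commit message */

/* KVA covered by the old scheme: 1024 PTE pages of 256 PTEs each. */
static uint64_t
kva_old_scheme(void)
{
	return (SKETCH_PTE_PAGES * SKETCH_PTES_PER_PAGE * SKETCH_PAGE_SIZE);
}

/* KVA covered with 1024 second-level directory pages on top. */
static uint64_t
kva_new_scheme(void)
{
	return (SKETCH_DIR_PAGES * kva_old_scheme());
}
```

Under that assumption the old scheme yields 2^31 bytes (2GB) and the new one 2^41 bytes (2TB), matching the numbers in the message.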
169757	19-May-2007	marcel	Account for the fact that contigmalloc(9) can return a NULL pointer. Fix the flags argument: M_WAITOK is not a valid flag. Its presence leaves the indication that contigmalloc(9) will not return a NULL pointer. The use of contigmalloc(9) in this place is probably not a good idea given the constraints. It's probably better to lift the constraints and instead add a permanent mapping to the ITR. It's possible that the first 256MB of memory is exhausted when we get here. This fixes a kernel panic on a 16GB rx3600.
169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
169291	05-May-2007	alc	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194
167767	21-Mar-2007	jhb	Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and handles when activating a resource via bus_activate_resource() rather than doing some of the work in bus_alloc_resource() and some of it in bus_activate_resource(). One note is that when using isa_alloc_resourcev() on PC-98, drivers now need to just use bus_release_resource() without explicitly calling bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98 drivers.
167352	09-Mar-2007	mohans	Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
167277	06-Mar-2007	scottl	Don't increment total_bounced when doing no-op dmamap_sync ops.
166901	23-Feb-2007	piso	o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr() o add an int return code to all fast handlers o retire INTR_FAST/IH_FAST For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current Reviewed by: many Approved by: re@
166860	21-Feb-2007	alc	Change pmap_protect() so that execute access can be removed without simultaneously removing write access.
166810	18-Feb-2007	alc	Eliminate some acquisitions and releases of the page queues lock that are no longer necessary.
166631	11-Feb-2007	marcel	Now that the free page queue mutex is a sleep mutex, we cannot call vm_page_alloc() from within a critical section in pmap_growkernel(). Since the need for a critical section may never have existed in the first place, simply get rid of it. Discussed with: alc@
165369	20-Dec-2006	davidxu	Add a lwpid field into per-cpu structure, the lwpid represents current running thread's id on each cpu. This allows us to add in-kernel adaptive spin for user level mutex. While spinning in user space is possible, without correct thread running state exported from kernel, it hardly can be implemented efficiently without wasting cpu cycles, however exporting thread running state unlikely will be implemented soon as it has to design and stabilize interfaces. This implementation is transparent to user space, it can be disabled dynamically. With this change, mutex ping-pong program's performance is improved massively on SMP machine. performance of mysql super-smack select benchmark is increased about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems which have bunch of cpus and system-call overhead is low (athlon64, opteron, and core-2 are known to be fast), the adaptive spin does help performance. Added sysctls: kern.threads.umtx_dflt_spins if the sysctl value is non-zero, a zero umutex.m_spincount will cause the sysctl value to be used a spin cycle count. kern.threads.umtx_max_spins the sysctl sets upper limit of spin cycle count. Tested on: Athlon64 X2 3800+, Dual Xeon 5130
164936	06-Dec-2006	julian	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
164395	18-Nov-2006	marcel	Since printf also has at least one critical section, we need to initialize pc_curthread. While here, rename early_pcpu to pcpu0 to be consistent (compare thread0 and proc0).
164392	18-Nov-2006	marcel	Now that printf() needs the PCPU, set it up before we call printf(). Change the pc_pcb field from a pointer to struct pcb to struct pcb so that sizeof(struct pcb) includes the PCB we use for IPI_STOP. Statically declare early_pcb so that we don't have to allocate the PCB for thread0. This way we can setup the PCPU before cninit() and thus before we use printf().
164391	18-Nov-2006	marcel	Revert previous commit. PC_CONS_BUFR is not used nor needed by assembly.
164229	12-Nov-2006	alc	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)
164049	06-Nov-2006	rwatson	Add missing includes of priv.h.
164033	06-Nov-2006	rwatson	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
163928	03-Nov-2006	marcel	Make sure kern_envp is never NULL. If we don't get a pointer to the environment from the loader, use the static environment.
163858	01-Nov-2006	jb	Add a cnputs() function to write a string to the console with a lock to prevent interspersed strings written from different CPUs at the same time. To avoid putting a buffer on the stack or having to malloc one, space is incorporated in the per-cpu structure. The buffer size is 128 bytes; chosen because it's the next power of 2 size up from 80 characters. String writes to the console are buffered up to the end of the line or until the buffer fills. Then the buffer is flushed to all console devices. Existing low level console output via cnputc() is unaffected by this change. ithread calls to log() are also unaffected to avoid blocking those threads. A minor change to the behaviour in a panic situation is that console output will still be buffered, but won't be written to a tty as before. This should prevent interspersed panic output as a number of CPUs panic before we end up single threaded running ddb. Reviewed by: scottl, jhb MFC after: 2 weeks
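The buffering rule in r163858 (flush at end of line or when the 128-byte buffer fills) can be sketched as follows. This is a user-space illustration, not the kernel's cnputs(): `struct linebuf`, `linebuf_puts`, `linebuf_flush` and the `flushes` counter are hypothetical, and the console lock and per-CPU placement described in the message are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BUFSZ 128	/* buffer size named in the commit message */

struct linebuf {
	char		data[SKETCH_BUFSZ];
	size_t		len;
	unsigned	flushes;	/* flush count, kept for illustration */
};

/* Stand-in for writing the buffered bytes to every console device. */
static void
linebuf_flush(struct linebuf *lb)
{
	if (lb->len == 0)
		return;
	/* A real console would emit lb->data[0 .. len) here. */
	lb->len = 0;
	lb->flushes++;
}

/* Buffer a string, flushing at each newline or when the buffer fills. */
static void
linebuf_puts(struct linebuf *lb, const char *s)
{
	assert(lb != NULL && s != NULL);
	for (; *s != '\0'; s++) {
		lb->data[lb->len++] = *s;
		if (*s == '\n' || lb->len == SKETCH_BUFSZ)
			linebuf_flush(lb);
	}
}
```

A partial line stays in the buffer until a later call completes it, which is what keeps output from different CPUs from interleaving within a line once a lock is added around the flush.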
163709	26-Oct-2006	jb	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
163619	23-Oct-2006	marcel	o Eliminate nexus_print_resources(). Use resource_list_print_type() instead. o Eliminate nexus_print_all_resources(). Inline the function body in nexus_print_child().
163603	22-Oct-2006	alc	Eliminate unnecessary PG_BUSY tests.
163492	19-Oct-2006	marcel	Fix previous revision: o day and mday are the same. No need to subtract 1 from mday. o Set dow to -1 as clock_ct_to_ts() checks this field and returns EINVAL on any day of the week but Sunday.
163449	17-Oct-2006	davidxu	o Add keyword volatile for user mutex owner field. o Fix type consistent problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.
163386	15-Oct-2006	hrs	Add a newline to the printf(). Spotted by: Peter Carah <pete@altadena.net> MFC after: 3 days
162966	02-Oct-2006	phk	Use calendrical calculations from subr_clock.c instead of home-rolled.
162958	02-Oct-2006	phk	Second part of a little cleanup in the calendar/timezone/RTC handling. Split subr_clock.c in two parts (by repo-copy): subr_clock.c contains generic RTC and calendaric stuff. etc. subr_rtc.c contains the newbus'ified RTC interface. Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock} sysctls and associated variables into subr_clock.c. They are not machine dependent and we have generic code that relies on being present so they are not even optional.
162954	02-Oct-2006	phk	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.
162361	16-Sep-2006	rwatson	Add audit hooks for ppc, ia64 system call paths. Reviewed by: marcel (ia64) Obtained from: TrustedBSD Project MFC after: 3 days
161675	28-Aug-2006	davidxu	Implement casuword32, compare and set user integer, thanks to Marcel Moolenaar who wrote the IA64 version of casuword32.
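As a rough user-space analogue of the compare-and-set operation named in r161675, the sketch below uses C11 atomics. It is not the ia64 implementation: the real casuword32 operates on a user-space address from the kernel and has to cope with faults, which is omitted here. The name `sketch_cas32` and the "return the value that was observed" convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Atomically replace *p with newval if it currently equals oldval.
 * Returns the value that was observed in *p, so the caller can tell
 * success (result == oldval) from failure (result != oldval).
 */
static uint32_t
sketch_cas32(_Atomic uint32_t *p, uint32_t oldval, uint32_t newval)
{
	uint32_t seen = oldval;

	assert(p != NULL);
	/* On failure, seen is overwritten with the current value of *p. */
	atomic_compare_exchange_strong(p, &seen, newval);
	return (seen);
}
```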
160889	01-Aug-2006	alc	Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change. The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access. Reviewed by: marcel@ (ia64)
160801	28-Jul-2006	jhb	Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is now back to just being an argument count.
160798	28-Jul-2006	jhb	Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
160773	27-Jul-2006	jhb	Unify the checking for lock misbehavior in the various syscall() implementations and adjust some of the checks while I'm here: - Add a new check to make sure we don't return from a syscall in a critical section. - Add a new explicit check before userret() to make sure we don't return with any locks held. The advantage here is that we can include the syscall number and name in syscall() whereas that info is not available in userret(). - Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by the more general checks just added. MFC after: 2 weeks
160770	27-Jul-2006	jhb	Add KTR_SYSC tracing to the syscall() implementations that didn't have it yet. MFC after: 1 week
160312	12-Jul-2006	jhb	Simplify the pager support in DDB. Allowing different db commands to install custom pager functions didn't actually happen in practice (they all just used the simple pager and passed in a local quit pointer). So, just hardcode the simple pager as the only pager and make it set a global db_pager_quit flag that db commands can check when the user hits 'q' (or a suitable variant) at the pager prompt. Also, now that it's easy to do so, enable paging by default for all ddb commands. Any command that wishes to honor the quit flag can do so by checking db_pager_quit. Note that the pager can also be effectively disabled by setting $lines to 0. Other fixes: - 'show idt' on i386 and pc98 now actually checks the quit flag and terminates early. - 'show intr' now actually checks the quit flag and terminates early.
160040	29-Jun-2006	marcel	Partial support for branch long emulation. This only emulates the branch long jump and not the branch long call. Support for that is forthcoming.
159971	27-Jun-2006	alc	Make several changes to pmap_enter_quick_locked(): 1. Make the caller responsible for performing pmap_install(). This reduces the number of times that pmap_install() is performed by pmap_enter_object() from twice per page to twice overall. 2. Don't block if pmap_find_pte() is unable to allocate a PTE. If it did block, then it might wind up mapping a cache page. Specifically, if pmap_enter_quick_locked() slept when called from pmap_enter_object(), the page daemon could change an active or inactive page into a cache page just before it was to be mapped. 3. Bail out of pmap_enter_quick_locked() if pv entries aren't plentiful. In other words, don't force the allocation of a pv entry if they aren't readily available. Reviewed by: marcel@
159850	22-Jun-2006	marcel	Identify the dual-core Montecito. MFC after: 3 days
159627	15-Jun-2006	ups	Remove mpte optimization from pmap_enter_quick(). There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386. Reviewed by: alc,jhb,peter,ps
159303	05-Jun-2006	alc	Introduce the function pmap_enter_object(). It maps a sequence of resident pages from the same object. Use it in vm_map_pmap_enter() to reduce the locking overhead of premapping objects. Reviewed by: tegge@
159148	01-Jun-2006	alc	Correct a syntax error in the previous revision.
159130	01-Jun-2006	silby	After much discussion with mjacob and scottl, change bus_dmamem_alloc so that it just warns the user with a printf when it misaligns a piece of memory that was requested through a busdma tag. Some drivers (such as mpt, and probably others) were asking for alignments that could not be satisfied, but as far as driver operation was concerned, that did not matter. In the theory that other drivers will fall into this same category, we agreed that panicking or making the allocation fail will cause more hardship than is necessary. The printf should be sufficient motivation to get the driver glitch fixed.
159093	31-May-2006	mjacob	Since it's to all intents and purposes identical code to amd64 && i386, match the recent changes to bus_dmamem_alloc here.
158984	27-May-2006	marcel	Unbreak after previous commit. While here, improve function naming consistency by s/ssc/ssc_/g.
158964	26-May-2006	phk	Update to new console api.
158651	16-May-2006	phk	Since DELAY() was moved, most <machine/clock.h> #includes have been unnecessary.
158450	11-May-2006	phk	Remove more straggling CPU_ macro references
157941	21-Apr-2006	marcel	In nexus_teardown_intr(), actually remove the handler. MFC after: 1 day
157894	20-Apr-2006	imp	Set the rid of the resource obtained from rman_reserve_resource.
157680	12-Apr-2006	alc	Retire pmap_track_modified(). We no longer need it because we do not create managed mappings within the clean submap. To prevent regressions, add assertions blocking the creation of managed mappings within the clean submap. Reviewed by: tegge
157449	03-Apr-2006	marcel	Improve handling of IPI_STOP: o use atomic operations to fiddle with stopped_cpus and started_cpus. o disable interrupts while we're waiting to be started. o remove logic relating to cpustop_restartfunc as it's not used.
157443	03-Apr-2006	peter	Remove the unused sva and eva arguments from pmap_remove_pages().
155922	22-Feb-2006	jhb	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the process being read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simulated via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while they mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week
155680	14-Feb-2006	jhb	Fix the hw.realmem sysctl. The global realmem variable is a count of pages, not a count of bytes. The sysctl handler for hw.realmem already uses ctob() to convert realmem from pages to bytes. Thus, on archs that were storing a byte count in the realmem variable, hw.realmem was inflated. Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha) MFC after: 3 days
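The inflation described in r155680 is a double unit conversion. The sketch below shows the effect with stand-in code: `sketch_ctob`, `sketch_realmem_sysctl` and the 8 KB page size are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 13	/* assumed 8 KB pages, for illustration */

/* Stand-in for ctob(): convert a page count to a byte count. */
static uint64_t
sketch_ctob(uint64_t pages)
{
	return (pages << SKETCH_PAGE_SHIFT);
}

/* Stand-in for the sysctl handler: it always treats realmem as pages. */
static uint64_t
sketch_realmem_sysctl(uint64_t realmem)
{
	assert(realmem <= (UINT64_MAX >> SKETCH_PAGE_SHIFT));
	return (sketch_ctob(realmem));
}
```

With 1 GB of memory, storing the page count (131072) reports 2^30 bytes as expected, while storing the byte count reports 2^43, too large by a factor of the page size.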
155553	11-Feb-2006	marcel	Correct the spinlock nesting of the idle thread of the APs before we save the MCA state of the AP. Saving the MCA state of the AP requires us to allocate memory, which uses sleep locks. Now that we correct the spinlock nesting of the AP without having schedlock, avoid calling spinlock_exit(). Instead call critical_exit() and manually clear the MD spinlock count. MFC after: 3 days
155455	08-Feb-2006	phk	Simplify system time accounting for profiling. Rename struct thread's td_sticks to td_pticks, we will need the other name for more appropriately named use shortly. Reduce it from uint64_t to u_int. Clear td_pticks whenever we enter the kernel instead of recording its value as reference for userret(). Use the absolute value of td->pticks in userret() and eliminate third argument.
155444	07-Feb-2006	phk	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestamps at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
155410	07-Feb-2006	marcel	Allocate memory for the MCA state information with M_NOWAIT. We can get a MCA event at any moment and it may not be safe to sleep. MFC after: 3 days
154491	17-Jan-2006	marcel	s/R_IA64_/R_IA_64_/g as per the ia64 psABI.
154017	04-Jan-2006	phk	Use ttyalloc() instead of ttymalloc()
153940	31-Dec-2005	netchild	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
153741	26-Dec-2005	sobomax	Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually allow executing elf dynamic binaries (aka shared libraries). When it is requested to execute ET_DYN elf image check if this flag is on after we know the elf brand allowing execution if so. PR: kern/87615 Submitted by: Marcin Koziej <creep@desk.pl>
153666	22-Dec-2005	jhb	Tweak how the MD code calls the fooclock() methods some. Instead of passing a pointer to an opaque clockframe structure and requiring the MD code to supply CLKF_FOO() macros to extract needed values out of the opaque structure, just pass the needed values directly. In practice this means passing the pair (usermode, pc) to hardclock() and profclock() and passing the boolean (usermode) to hardclock_cpu() and hardclock_process(). Other details: - Axe clockframe and CLKF_FOO() macros on all architectures. Basically, all the archs were taking a trapframe and converting it into a clockframe one way or another. Now they can just extract the PC and usermode values directly out of the trapframe and pass it to fooclock(). - Renamed hardclock_process() to hardclock_cpu() as the latter is more accurate. - On Alpha, we now run profclock() at hz (profhz == hz) rather than at the slower stathz. - On Alpha, for the TurboLaser machines that don't have an 8254 timecounter, call hardclock() directly. This removes an extra conditional check from every clock interrupt on Alpha on the BSP. There is probably room for even further pruning here by changing Alpha to use the simplified timecounter we use on x86 with the lapic timer since we don't get interrupts from the 8254 on Alpha anyway. - On x86, clkintr() shouldn't ever be called now unless using_lapic_timer is false, so add a KASSERT() to that effect and remove a condition to slightly optimize the non-lapic case. - Change prototype of arm_handler_execute() so that its first arg is a trapframe pointer rather than a void pointer for clarity. - Use KCOUNT macro in profclock() to lookup the kernel profiling bucket. Tested on: alpha, amd64, arm, i386, ia64, sparc64 Reviewed by: bde (mostly)
153504	18-Dec-2005	marcel	Make our ELF64 type definitions match standards. In particular this means: o Remove Elf64_Quarter, o Redefine Elf64_Half to be 16-bit, o Redefine Elf64_Word to be 32-bit, o Add Elf64_Xword and Elf64_Sxword for 64-bit entities, o Use Elf_Size in MI code to abstract the difference between Elf32_Word and Elf64_Word. o Add Elf_Ssize as the signed counterpart of Elf_Size. MFC after: 2 weeks
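The widths listed in r153504 can be written down as fixed-width typedefs. The `sketch_` prefix marks these as illustrative stand-ins rather than the definitions in the FreeBSD headers, and Elf_Size/Elf_Ssize are deliberately left out because their exact mapping is not spelled out in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative typedefs following the widths given in the commit. */
typedef uint16_t sketch_Elf64_Half;	/* 16-bit */
typedef uint32_t sketch_Elf64_Word;	/* 32-bit */
typedef uint64_t sketch_Elf64_Xword;	/* 64-bit, unsigned */
typedef int64_t  sketch_Elf64_Sxword;	/* 64-bit, signed */

/* Returns nonzero when every type has the size the commit describes. */
static int
sketch_elf64_widths_ok(void)
{
	return (sizeof(sketch_Elf64_Half) == 2 &&
	    sizeof(sketch_Elf64_Word) == 4 &&
	    sizeof(sketch_Elf64_Xword) == 8 &&
	    sizeof(sketch_Elf64_Sxword) == 8);
}
```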
153165	06-Dec-2005	ru	Fix -Wundef warnings from compiling GENERIC and LINT kernels of all architectures.
152630	20-Nov-2005	alc	Eliminate pmap_init2(). It's no longer used.
152359	13-Nov-2005	alc	In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock cannot possibly occur.
152224	09-Nov-2005	alc	Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations. Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away. Reviewed by: tegge
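The "at most once per minute" rule for the reclamation message in r152224 amounts to a simple rate limiter. The sketch below is a generic illustration under assumed names (`struct sketch_ratelimit`, `sketch_ratelimit_ok`); it does not claim to be the mechanism used in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct sketch_ratelimit {
	time_t	last;	/* time of the last permitted message */
	bool	armed;	/* false until the first message is printed */
};

/*
 * Returns true if a message may be printed at time now, i.e. if no
 * message was printed within the preceding interval seconds. Unlike a
 * "stop after N messages" counter, this never goes permanently silent.
 */
static bool
sketch_ratelimit_ok(struct sketch_ratelimit *rl, time_t now, time_t interval)
{
	assert(rl != NULL);
	if (rl->armed && now - rl->last < interval)
		return (false);
	rl->last = now;
	rl->armed = true;
	return (true);
}
```

A caller would guard the printf with `sketch_ratelimit_ok(&rl, now, 60)`, so the warning keeps reappearing for as long as reclamation keeps happening.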
152042	04-Nov-2005	alc	Begin and end the initialization of pvzone in pmap_init(). Previously, pvzone's initialization was split between pmap_init() and pmap_init2(). This split initialization was the underlying cause of some UMA panics during initialization. Specifically, if the UMA boot pages were exhausted before the pvzone was fully initialized, then UMA, through no fault of its own, would use an inappropriate back-end allocator leading to a panic. (Previously, as a workaround, we have increased the UMA boot pages.) Fortunately, there is no longer any reason that pvzone's initialization cannot be completed in pmap_init(). Eliminate a check for whether pv_entry_high_water has been initialized or not from get_pv_entry(). Since pvzone's initialization is completed in pmap_init(), this check is no longer needed. Use cnt.v_page_count, the actual count of available physical pages, instead of vm_page_array_size to compute the maximum number of pv entries. Introduce the vm.pmap.pv_entries tunable on alpha and ia64. Eliminate some unnecessary white space. Discussed with: tegge (item #1) Tested by: marcel (ia64)
152002	03-Nov-2005	alc	Remove the remaining spl*() calls. Add some assertions. Eliminate some excessive white space.
151897	31-Oct-2005	rwatson	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
151885	30-Oct-2005	marcel	Remove a stray return statement in the interrupt dispatch function that caused a premature exit after calling a fast interrupt handler and bypassing a much needed critical_exit() and the scheduling of the interrupt thread for non-fast handlers. In short: unbreak :-)
151658	25-Oct-2005	jhb	Reorganize the interrupt handling code a bit to make a few things cleaner and increase flexibility to allow various different approaches to be tried in the future. - Split struct ithd up into two pieces. struct intr_event holds the list of interrupt handlers associated with interrupt sources. struct intr_thread contains the data relative to an interrupt thread. Currently we still provide a 1:1 relationship of events to threads with the exception that events only have an associated thread if there is at least one threaded interrupt handler attached to the event. This means that on x86 we no longer have 4 bazillion interrupt threads with no handlers. It also means that interrupt events with only INTR_FAST handlers no longer have an associated thread either. - Renamed struct intrhand to struct intr_handler to follow the struct intr_foo naming convention. This did require renaming the powerpc MD struct intr_handler to struct ppc_intr_handler. - INTR_FAST no longer implies INTR_EXCL on all architectures except for powerpc. This means that multiple INTR_FAST handlers can attach to the same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach to the same interrupt. Sharing INTR_FAST handlers may not always be desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun either. Drivers can always still use INTR_EXCL to ask for an interrupt exclusively. The way this sharing works is that when an interrupt comes in, all the INTR_FAST handlers are executed first, and if any threaded handlers exist, the interrupt thread is scheduled afterwards. This type of layout also makes it possible to investigate using interrupt filters ala OS X where the filter determines whether or not its companion threaded handler should run. - Aside from the INTR_FAST changes above, the impact on MD interrupt code is mostly just 's/ithread/intr_event/'. - A new MI ddb command 'show intrs' walks the list of interrupt events dumping their state. 
It also has a '/v' verbose switch which dumps info about all of the handlers attached to each event. - We currently don't destroy an interrupt thread when the last threaded handler is removed because it would suck for things like ppbus(8)'s braindead behavior. The code is present, though, it is just under #if 0 for now. - Move the code to actually execute the threaded handlers for an interrupt event into a separate function so that ithread_loop() becomes more readable. Previously this code was all in the middle of ithread_loop() and indented halfway across the screen. - Made struct intr_thread private to kern_intr.c and replaced td_ithd with a thread private flag TDP_ITHREAD. - In statclock, check curthread against idlethread directly rather than curthread's proc against idlethread's proc. (Not really related to intr changes) Tested on: alpha, amd64, i386, sparc64 Tested on: arm, ia64 (older version of patch by cognet and marcel)
151543	21-Oct-2005	ade	Specifically panic() in the case where pmap_insert_entry() fails to get a new pv under high system load where the available pv entries have been exhausted before the pagedaemon has a chance to wake up to reclaim some. Prior to this, the NULL pointer dereference ended up causing secondary panics with rather less than useful resulting tracebacks. Reviewed by: alc, jhb MFC after: 1 week
151388	16-Oct-2005	phk	Make ttyconsolemode() call ttsetwater() so that drivers don't have to.
151316	14-Oct-2005	davidxu	1. Change the prototypes of trapsignal and sendsig to use ksiginfo_t *; most changes in MD code are trivial. Before this change, trapsignal and sendsig used discrete parameters; now they use member fields of the ksiginfo_t structure. For sendsig, this change allows us to pass a POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo; it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to the proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to the thread structure to hold all signals delivered to the thread. 5. i386 and amd64 now return POSIX standard si_code; other arches will be fixed. 6. In this sigqueue implementation, the pending signal set is kept as before; an extra siginfo list holds additional siginfo_t data for signals. Kernel code that uses psignal() still behaves as before and won't fail even under memory pressure. The only exception is when deleting a signal: we should call sigqueue_delete to remove the signal from the sigqueue, not SIGDELSET. Currently no kernel code delivers a signal with additional data, so the kernel should be as stable as before. A ksiginfo can carry more information; for example, a signal can still be delivered with its siginfo data thrown away if memory is not enough. SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they cannot be caught or masked. The sigqueue() syscall allows user code to queue a signal to a target process; if resources are unavailable, EAGAIN will be returned as the specification says. Just before a thread exits, signal queue memory will be freed by sigqueue_flush. Currently, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
151006	06-Oct-2005	phk	Eliminate need for __RMAN_RESOURCE_VISIBLE Reviewed by: marcel@
150663	28-Sep-2005	rwatson	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer needs to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde
150335	19-Sep-2005	rwatson	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having non-MPSAFE tty code. MFC after: 1 week
149926	10-Sep-2005	marcel	Merge db_interface.c and db_trace.c into db_machdep.c.
149925	10-Sep-2005	marcel	Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint() and db_md_list_watchpoints() to ddb/ddb.h.
149915	09-Sep-2005	marcel	Change the High FP lock from a sleep lock to a spin lock. We can take the lock from interrupt context, which causes an implicit lock order reversal. We've been using the lock carefully enough that making it a spin lock should not be harmful.
149806	05-Sep-2005	marcel	o In pmap_remove_pte: always invalidate the page. Previously the page was not invalidated if the PTE was not actually being removed. In an UP kernel this didn't cause problems, because the new mapping would preempt the old one. In an SMP kernel this could lead to the use of stale translations when processes move between CPUs at the "right" moment. This fixes the last of the obvious SMP problems and it should be safe to enable SMP by default now. o In pmap_remove_pte: minor code refactoring to avoid duplication. o Test all PTE pointers against NULL. Don't use implicit boolean tests.
149777	03-Sep-2005	marcel	o s/vhpt_size/pmap_vhpt_log2size/g o s/vhpt_base/pmap_vhpt_base/g o s/vhpt_bucket/pmap_vhpt_bucket/g o Declare the above in <machine/pmap.h> o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.* o Create a tunable machdep.vhpt.log2size, with corresponding sysctl. The tunable allows the user to specify the VHPT size from the loader. o Don't keep track of the number of PTEs in the VHPT. Calculate the population when necessary by iterating the buckets and summing up the length of the buckets. o Don't perform the tpa instruction with a bucket lock held. The instruction can (theoretically) fault and locking is not needed.
149770	03-Sep-2005	marcel	Fix collision chain termination checks. The result of IA64_PHYS_TO_RR7 is never 0, so one cannot test for a NULL pointer after a physical address is translated into a virtual pointer with said macro. Instead, keep the physical address around and test it against 0. Note that this obviously implies that a PTE can never be allocated at physical address 0. This isn't exactly guaranteed, but hasn't been a problem so far. We test the physical address against 0 for as long as the ia64 port exists...
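The termination check described in r149770 can be sketched in C as follows. This is only an illustration of the idea, not the pmap code: `struct pte_sketch`, `sketch_mem`, `phys_to_virt_sketch()` and `chain_length()` are hypothetical names, and "physical addresses" are simulated as indices into a small array. The point it shows is that the loop tests the physical address against 0, because the translated pointer is never NULL.

```c
#include <stdint.h>

#define SKETCH_NPTES 8

/* Hypothetical PTE: only the collision chain link is modelled. */
struct pte_sketch {
	uint64_t next_pa;	/* "physical address" of the next PTE; 0 ends the chain */
};

/* Simulated physical memory. Slot 0 is never used, so PA 0 can mean "none". */
static struct pte_sketch sketch_mem[SKETCH_NPTES];

/*
 * Hypothetical stand-in for a physical-to-virtual translation. Like the
 * macro named in the commit message, it never yields a NULL pointer, so
 * its result cannot be used to detect the end of a chain.
 */
static struct pte_sketch *
phys_to_virt_sketch(uint64_t pa)
{
	return (&sketch_mem[pa % SKETCH_NPTES]);
}

/* Walk a collision chain, terminating on the physical address. */
static int
chain_length(uint64_t head_pa)
{
	uint64_t pa = head_pa;
	int n = 0;

	while (pa != 0) {
		struct pte_sketch *pte = phys_to_virt_sketch(pa);

		n++;
		pa = pte->next_pa;	/* keep the physical address around */
	}
	return (n);
}
```

As in the commit message, this scheme implies that no PTE may live at physical address 0.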
149768	03-Sep-2005	alc	Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine whether the mapping should permit execute access.
149062	14-Aug-2005	marcel	Remove the execute permission for stacks.
149037	13-Aug-2005	marcel	o s/pmap_lpte_/pmap_/g o Remove pmap_is_referenced(). It was already compiled-out.
149036	13-Aug-2005	marcel	Fix the problem with the IPI for the lazy context switching of the high FP registers. It was not that the IPI got lost due to the perceived unreliability of the IPI delivery, but rather that the IPI was not assigned a vector (ugh). Sending a 0 vector to a CPU results in a stray external interrupt. Add a KASSERT to ipi_send() to catch this. The initialization of the IPIs could be better, but it's not at all sure what the future of the code is. Avoid wasting a lot of time on something that is going to be rewritten anyway.
148807	06-Aug-2005	marcel	Improve SMP support: o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU uses to look up translations it can't find in the TLB. As such, the VHPT serves as a level 1 cache (the TLB being a level 0 cache) and best results are obtained when it's not shared between CPUs. The collision chain (i.e. the hash bucket) is shared between CPUs, as all buckets together constitute our collection of PTEs. To achieve this, the collision chain does not point to the first PTE in the list anymore, but to a hash bucket head structure. The head structure contains the pointer to the first PTE in the list, as well as a mutex to lock the bucket. Thus, each bucket is locked independently of the others. With at least 1024 buckets in the VHPT, this provides for sufficiently fine-grained locking to make the solution scalable to large SMP machines. o Add synchronisation to the lazy FP context switching. We do this with a separate per-thread lock. On SMP machines the lazy high FP context switching without synchronisation caused inconsistent state, which resulted in a panic. Since the use of the high FP registers is not common, it's possible that races exist. The ia64 package build has proven to be a good stress test, so this will get plenty of exercise in the near future. o Don't use the local ID of the processor we want to send the IPI to as the argument to ipi_send(). Use the struct pcpu pointer instead. The reason for this is that IPI delivery is unreliable. It has been observed that sending an IPI to a CPU causes it to receive a stray external interrupt. As such, we need a way to make the delivery reliable. The intended solution is to queue requests in the target CPU's per-CPU structure and use a single IPI to inform the CPU that there's a new entry in the queue. If that IPI gets lost, the CPU can check its queue at any convenient time (such as for each clock interrupt). 
This also allows us to send requests to a CPU without interrupting it, if such would be beneficial. With these changes SMP is almost working. There are still some random process crashes and the machine can hang due to having the IPI lost that deals with the high FP context switch. The overhead of introducing the hash bucket head structure results in a performance degradation of about 1% for UP (extra pointer indirection). This is surprisingly small and is offset by gaining reasonably good, scalable SMP support.
148666	03-Aug-2005	jeff	- Add support for saving stack traces and displaying them via printf(9) and KTR. Contributed by: Antoine Brodin <antoine.brodin@laposte.net> Concept code from: Neal Fachan <neal@isilon.com>
147889	10-Jul-2005	davidxu	Validate that the value written into {FS,GS}.base is a canonical address; writing a non-canonical address can cause a kernel panic. Restrict base values to 0..VM_MAXUSER_ADDRESS, ensuring only canonical values get written to the registers. Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com > Approved by: re (scottl)
147773	05-Jul-2005	marcel	Enhance ia64_flush_dirty() to handle the case in which td != curthread. This case is triggered with ptrace(2) and the PT_SETREGS function. Change the return type of the function to int so that errors can be passed on to the caller. Approved by: re (scottl)
147745	02-Jul-2005	marcel	Implement function calls from within DDB on ia64. On ia64 a function pointer doesn't point to the first instruction of that function, but rather to a descriptor. The descriptor has the address of the first instruction, as well as the value of the global pointer. The symbol table doesn't know anything about descriptors, so if you lookup the name of a function you get the address of the first instruction. The cast from the address, which is the result of the symbol lookup, to a function pointer as is done in db_fncall is therefore invalid. Abstract this detail behind the DB_CALL macro. By default DB_CALL is defined as db_fncall_generic, which yields the old behaviour. On ia64 the macro is defined as db_fncall_ia64, in which a descriptor is constructed to yield a valid function pointer. While here, introduce DB_MAXARGS. DB_MAXARGS replaces the existing (local) MAXARGS. The DB_MAXARGS macro can be defined by platforms to create a convenient maximum. By default this will be the legacy 10. On ia64 we define this macro to be 8, for 8 is the maximum number of arguments that can be passed in registers. This avoids having to implement spilling of arguments on the memory stack. Approved by: re (dwhite)
147740	02-Jul-2005	marcel	Fix a buglet that was present in the ia64 code and that got inherited by amd64 and i386: For buffered writes we collect data and write it out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used to keep track of how much data we have collected in the buffer so far and it's reset to zero immediately after writing a block to the dump device. When the last, possibly partially filled buffer is flushed, we didn't reset fragsz to 0 and as such would stop reflecting reality. Since we currently only need to do buffered writes once, this isn't a problem. However, when kernel dumps are made by hand (say by calling doadump from within DDB), the improperly cleared state from the first call to dumpsys causes the next call to dumpsys to create an invalid code file. This change resets fragsz after flushing the partially filled buffer so that it fixes the two problems at once. Approved by: re (scottl)
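The buffered-write bookkeeping described in r147740 can be illustrated with a small C sketch. It is not the dump code itself: `SKETCH_BSIZE` stands in for DEV_BSIZE, and `sketch_buf`, `blocks_written`, `sketch_write_block()`, `buf_write()` and `buf_flush()` are hypothetical names, with the device write simulated by a counter. Only the name `fragsz` comes from the commit message. The sketch shows the reset of `fragsz` after the final flush, which is what keeps a second dump from starting with stale state.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BSIZE 512		/* stand-in for DEV_BSIZE */

static char sketch_buf[SKETCH_BSIZE];
static size_t fragsz;			/* bytes collected in sketch_buf so far */
static size_t blocks_written;		/* counts simulated device writes */

/* A real implementation would write sketch_buf to the dump device here. */
static void
sketch_write_block(void)
{
	blocks_written++;
}

/* Collect data and write it out one SKETCH_BSIZE-sized block at a time. */
static void
buf_write(const char *data, size_t len)
{
	while (len > 0) {
		size_t n = SKETCH_BSIZE - fragsz;

		if (n > len)
			n = len;
		memcpy(sketch_buf + fragsz, data, n);
		fragsz += n;
		data += n;
		len -= n;
		if (fragsz == SKETCH_BSIZE) {
			sketch_write_block();
			fragsz = 0;	/* reset right after a full block */
		}
	}
}

/* Flush the last, possibly partially filled, buffer. */
static void
buf_flush(void)
{
	if (fragsz == 0)
		return;
	/* Zero-pad the unused tail so a whole block is written. */
	memset(sketch_buf + fragsz, 0, SKETCH_BSIZE - fragsz);
	sketch_write_block();
	fragsz = 0;	/* the reset that was missing before the fix */
}
```

Without the final assignment in `buf_flush()`, a second sequence of `buf_write()` calls would start with a non-zero `fragsz` and misplace its data in the buffer.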
147640	27-Jun-2005	marcel	Handle B-unit break instructions. The break.b is unique in that the immediate is not saved by the architecture. Any of the break.{mifx} instructions have their immediate saved in cr.iim on interruption. Consequently, when we handle the break interrupt, we end up with a break value of 0 when it was a break.b. The immediate is important because it distinguishes between different uses of the break and which are defined by the runtime specification. The bottom line is that when the GNU debugger replaces a B-unit instruction with a break instruction in the inferior, we would not send the process a SIGTRAP when we encounter it, because the value is not one we recognize as a debugger breakpoint. This change adds logic to decode the bundle in which the break instruction lives whenever the break value is 0. The assumption being that it's a break.b and we fetch the immediate directly out of the instruction. If the break instruction was not a break.b, but any of break.{mifx} with an immediate of 0, we would be doing unnecessary work. But since a break 0 is invalid, this is not a problem and it will still result in a SIGILL being sent to the process. Approved by: re (scottl)
147639	27-Jun-2005	marcel	Replace the existing copyright notice with my own. Over the years I've changed this file so much that it's equivalent to a rewrite, and I'm not talking about any of the cosmetic changes of course. Approved by: re (scottl)
147638	27-Jun-2005	marcel	Cosmetic: s/u_int64_t/uint64_t/g Approved by: re (scottl)
147217	10-Jun-2005	alc	Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time. Remove code from pmap_init() for initializing the vm_page's machine-dependent fields. Remove stale comments from pmap_init(). Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries. Tested by: cognet, kensmith, marcel
146794	29-May-2005	marcel	Create nexus in configure_first() instead of in configure(). This makes sure that sysinit tasks that run after configure_first(), but before configure() have a nexus to hang devices off.
146791	29-May-2005	marcel	Call cninit_finish() in configure_final().
145433	23-Apr-2005	davidxu	Change cpu_set_kse_upcall to a more generic style, so we can reuse it in other code. Add cpu_set_user_tls and use it to tweak the user register and set up user TLS. I had wanted to merge it into cpu_set_kse_upcall, but since cpu_set_kse_upcall is also used by M:N threads, which may not need this feature, I wrote a separate cpu_set_user_tls.
145389	22-Apr-2005	marcel	Sanity the RTC code: o Remove the clock interface. Not only does it conflict with the MI version when device genclock is added to the kernel, it was also not possible to have more than 1 clock device. This of course would have been a problem if we actually had more than 1 clock device. In short: we don't need a clock interface and if we do eventually, we should be using the MI one. o Rewrite inittodr() and resettodr() to take into account that: 1) We use the EFI interface directly. 2) time_t is 64-bit and we do need to make sure we can determine leap years from year 2100 and on. Add a nice explanation of where leap years come from and why. 3) This rewrite happened in 2005 so any date prior to 1/1/2005 (either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with 1/1/2005 in that case. 4) The EFI clock has a high probability of being correct, so only (further) correct the EFI clock when the file system time is larger. That should never happen in a time-synchronised world. Complain when EFI lost 2 days or more. Replace the copyright notice now that I (pretty much) rewrote all of this file.
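The leap-year concern raised in r145389 (years from 2100 on) comes down to the Gregorian century rule, which a plain "divisible by 4" test gets wrong. The following C sketch states the standard Gregorian rule; the function name `is_leap_year()` is hypothetical and this is not the code from the commit.

```c
#include <stdbool.h>

/*
 * Gregorian leap-year rule: every 4th year is a leap year, except
 * century years, unless the century year is divisible by 400.
 * A simple (year % 4 == 0) test is right for 1901..2099 but wrong
 * for 2100, which is why a 64-bit time_t needs the full rule.
 */
static bool
is_leap_year(int year)
{
	if (year % 400 == 0)
		return (true);
	if (year % 100 == 0)
		return (false);
	return (year % 4 == 0);
}
```

For example, 2000 and 2004 are leap years under this rule, while 2100 is not.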
145173	16-Apr-2005	marcel	Add a kpte command to DDB. It dumps the PTE of a KVA. This helps to analyze faults and TLB/VHPT inconsistencies.
145137	16-Apr-2005	marcel	Return better "error" values for UWX_BOTTOM and UWX_ABI_FRAME in unw_step(). Both errors denote the end of a stack trace (i.e. no prior frame), but are otherwise not error conditions. Have db_trace() return 0 when the trace ends due to one of these return codes as they are really normal termination conditions. This change especially improves the output of the "show thread" command in DDB when there are threads in fork_trampoline() and previously db_trace() would return an error, causing the show command to emit '***'.
145092	15-Apr-2005	marcel	Initialize curthread before we save the APs MCA state. Saving the MCA state requires a spin lock, which requires a valid curthread. This change allows SMP kernels to boot into multi-user again. While here, update the copyright notice and use __FBSDID for the revision string.
144971	12-Apr-2005	jhb	Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic operations in some places and simple non-per CPU math in others.
144962	12-Apr-2005	marcel	Dot the i's: 1 Move the debug.clock_adjust_* sysctls to debug.clock.adjust_* to make it easier to get only the clock statistics. 2 Make the sysctls read-only [suggested by Marius]. 3 When determining the new clock adjustment, we checked for an error either larger than 12.5% or smaller than 12.5%. We left out an error of exactly 12.5%. For errors larger than 12.5% we adjust the clock reload value in such a way that the next clock interrupt would be early (as in premature). For errors less than 12.5% we stopped the adjustment. The current algorithm doesn't benefit from excluding an error of exactly 12.5%. Change the code to stop adjusting the clock if the error is not larger than 12.5% [suggested by Marius]. Discussed with: marius@
144637	04-Apr-2005	jhb	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any effect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
143867	20-Mar-2005	njl	s/SLIST/STAILQ to catch up with changes to resource lists. Missed by: imp
143796	18-Mar-2005	iedowse	Split configure() into 3 separate steps like we do on other architectures. This makes it possible to insert hooks before and after the device attachment step. Tested thanks to: marcel
143202	07-Mar-2005	scottl	Remove dead code.
143057	02-Mar-2005	marcel	Make sure fpswa_iface equals NULL when bootinfo.bi_fpswa equals 0. We need to be able to test for the (possible) non-existence of the FPSWA code. PR: ia64/77591 Submitted by: Christian Kandeler (christian dot kandeler at hob dot de) MFC after: 1 day
142956	01-Mar-2005	wes	Attempt to doff the pointy hat: implement 'hw.realmem' on remaining architectures. Pointed out by O'Brien, ScottL via email. Reviewed by: obrien (various)
141556	09-Feb-2005	marcel	s/descr/oid_descr/
141378	06-Feb-2005	njl	Finish the job of sorting all includes and fix the build by including malloc.h before proc.h on sparc64. Noticed by das@ Compiled on: alpha, amd64, i386, pc98, sparc64
141248	04-Feb-2005	marcel	Include sys/bus.h before sys/cpu.h. The latter needs device_t.
141237	04-Feb-2005	njl	Add an implementation of cpu_est_clockrate(9). This function estimates the current clock frequency for the given CPU id in units of Hz.
140891	27-Jan-2005	marcel	Fix handling of post increment: Either the first or second operand is the register with the memory address, and it's that register's value we need to increment or decrement. MFC after: 3 days
140418	18-Jan-2005	scottl	Fix compile errors. Bah.
140315	15-Jan-2005	scottl	Fix an assignment that I missed in the last commit.
140311	15-Jan-2005	scottl	Add bus_dmamap_load_mbuf_sg() to ia64
140256	14-Jan-2005	jhb	- Remove some OBE comments regarding cpu_exit(). cpu_exit() is no longer the last action of kern_exit(). Instead, it is a MD callout to cleanup per-process state during exit. - Add notes of concern to Alpha and ia64 about the possible need to drop fp state in cpu_thread_exit() rather than in cpu_exit() since it is per-thread state rather than per-process.
139790	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
139554	02-Jan-2005	marcel	Further enhance the handling of misaligned loads and stores: o implement double-extended and single precision loads and stores, o implement double precision stores, o replace the machdep.unaligned_print sysctl with debug.unaligned_print and change the default value to 0, o replace the machdep.unaligned_sigbus sysctl with debug.unaligned_test, o Remove the fillfd() function. The function is trivial enough for inline assembly. The debug.unaligned_test sysctl is used to test the emulation of misaligned loads and stores. When PSR.ac is 0, the CPU will handle misaligned memory accesses itself and we don't get an exception for it. When PSR.ac is 1, the process needs to be signalled and we should not emulate. The sysctl takes effect when PSR.ac is 1 and tells us that we should emulate and not send a signal. PR: 72268 MFC after: 1 week
139241	23-Dec-2004	alc	Modify pmap_enter_quick() so that it expects the page queues to be locked on entry and it assumes the responsibility for releasing the page queues lock if it must sleep. Remove a bogus comment from pmap_enter_quick(). Using the first change, modify vm_map_pmap_enter() so that the page queues lock is acquired and released once, rather than each time that a page is mapped.
138897	15-Dec-2004	alc	In the common case, pmap_enter_quick() completes without sleeping. In such cases, the busying of the page and the unlocking of the containing object by vm_map_pmap_enter() and vm_fault_prefault() is unnecessary overhead. To eliminate this overhead, this change modifies pmap_enter_quick() so that it expects the object to be locked on entry and it assumes the responsibility for busying the page and unlocking the object if it must sleep. Note: alpha, amd64, i386 and ia64 are the only implementations optimized by this change; arm, powerpc, and sparc64 still conservatively busy the page and unlock the object within every pmap_enter_quick() call. Additionally, this change is the first case where we synchronize access to the page's PG_BUSY flag and busy field using the containing object's lock rather than the global page queues lock. (Modifications to the page's PG_BUSY flag and busy field have asserted both locks for several weeks, enabling an incremental transition.)
138752	12-Dec-2004	marcel	Fix the last of the instability and the cause of the annoying "vm_fault: fault on nofault entry, addr: %lx" panic. The problem was a stale PTE in the TLB that marked the page as not present, even though we had a good PTE in the VHPT. We typically don't yet insert PTEs in the TLB. We do that lazily. The CPU will look for the PTE in the VHPT when there's no PTE in the TLB. Unfortunately this doesn't handle the case of the stale PTE in the TLB. The quick fix is to invalidate the TLB (sloppily) when the VHPT doesn't contain a valid PTE. This is also the only case that may cause a PTE in the TLB that marks a page as non-present.
138543	08-Dec-2004	marcel	Don't obtain the HCDP address directly from the bootinfo structure. Use a function to keep the details at arms length from uart(4).
138253	01-Dec-2004	marcel	Change gdb_cpu_setreg() to not take the value to which to set the specified register, but a pointer to the in-memory representation of that value. The reason for this is twofold: 1. Not all registers can be represented by a register_t. In particular FP registers fall in that category. Passing the new register value by reference instead of by value makes this point moot. 2. When we receive a G or P packet, both are for writing a register, the packet will have the register value in target-byte order and in the memory representation (modulo the fact that bytes are sent as 2 printable hexadecimal numbers of course). We only need to decode the packet to have a pointer to the register value. This change fixes the bug of extracting the register value of the P packet as a hexadecimal number instead of as a bit array. The quick (and dirty) fix to bswap the register value in gdb_cpu_setreg() as it has been added on i386 and amd64 can therefore be removed and has in fact been that. Tested on: alpha, amd64, i386, ia64, sparc64
138129	27-Nov-2004	das	Don't include sys/user.h merely for its side-effect of recursively including other headers.
137978	21-Nov-2004	marcel	Remove struct ia64_itir and use a plain old uint64_t instead.
137912	20-Nov-2004	das	U areas are going away, so don't allocate one for process 0. Reviewed by: arch@
137906	20-Nov-2004	das	user.h is included only to get pcb.h, so use the latter directly instead.
137117	01-Nov-2004	jhb	- Change the ddb paging "support" to use a variable (db_lines_per_page) to control the number of lines per page rather than a constant. The variable can be examined and changed in ddb as '$lines'. Setting the variable to 0 will effectively turn off paging. - Change db_putchar() to force out pending whitespace before outputting newlines and carriage returns so that one can rub out content on the current line via '\r \r' type strings. - Change the simple pager to rub out the --More-- prompt explicitly when the routine exits. - Add some aliases to the simple pager to make it more compatible with more(1): 'e' and 'j' do a single line. 'd' does half a page, and 'f' does a full page. MFC after: 1 month Inspired by: kris
136809	23-Oct-2004	phk	Use bioq_takefirst()
136680	18-Oct-2004	phk	Add new function ttyinitmode() which sets our systemwide default modes on a tty structure. Both the ".init" and the current settings are initialized allowing the function to be used both at attach and open time. The function takes an argument to decide if echoing should be enabled. Echoing should not be enabled for regular physical serial ports unless they are consoles, in which case they should be configured by ttyconsolemode() instead. Use the new function throughout.
136521	14-Oct-2004	njl	Print flags in the nexus for child devices.
136183	06-Oct-2004	marcel	Add the Madison II, which is the second generation Madison. The Madison II is model 2 in the Itanium 2 family and has up to 9MB of L3 cache and clocks higher than 1.5GHz. There's no LV variant AFAICT.
136070	03-Oct-2004	alc	The physical address stored in the vm_page is page aligned. There is no need to mask off the page offset bits. (This operation made some sense prior to i386/i386/pmap.c revision 1.254 when we passed a physical address rather than a vm_page pointer to pmap_enter().)
136050	02-Oct-2004	alc	Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These uses predate the change in the pmap_enter() interface that replaced the page's physical address by the address of its vm_page structure. The PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page structure that was being passed in.
135783	25-Sep-2004	marcel	Move the IA-32 trap handling from trap() to ia32_trap(). Move the ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32 is not defined, we'll raise SIGEMT instead.
135590	23-Sep-2004	marcel	Redefine a PTE as a 64-bit integral type instead of a struct of bit-fields. Unify the PTE defines accordingly and update all uses.
135589	22-Sep-2004	marcel	s/u_int#_t/uint#_t/g
135529	20-Sep-2004	jhb	- Add support for "paging" in stack trace output. That is, when you do a stack trace from ddb, the output will pause with a '--More--' prompt every 18 lines. If you hit Enter, it will print another line and prompt again. If you hit space it will output another page and then prompt. If you hit 'q' or 'x' it will abort the rest of the stack trace. - Fix the sparc64 userland stack trace to honor the total count of lines to print. This is useful if your trace happens to walk back onto 0xdeadc0de and gets stuck in an endless loop. MFC after: 1 month Tested on: i386, alpha, sparc64
135453	19-Sep-2004	marcel	MFp4: Completely remove the remaining EFI includes and add our own (type) definitions instead. While here, abstract more of the internals by providing interface functions.
135443	18-Sep-2004	alc	Release the page queues lock earlier in pmap_protect() and pmap_remove() in order to reduce contention.
135405	17-Sep-2004	marcel	Provide our own FPSWA definitions, instead of depending on the Intel EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers conflict with the Intel ACPI headers (duplicate type definitions), so are being phased out in the kernel.
135403	17-Sep-2004	marcel	Remove useless inclusion of <machine/fpu.h>
134934	08-Sep-2004	scottl	Fix a problem with tag->boundary inheritance that has existed since day one and was propagated to nearly every platform. The boundary of the child needs to consider the boundary of the parent and pick the minimum of the two, not the maximum. However, if either is 0 then pick the appropriate one. This bug was exposed by a recent change to ATA, which should now be fixed by this change. The alignment and maxsegsz tag attributes likely also need a similar review in the near future. This is a MT5 candidate. Reviewed by: marcel Submitted by: sos (in part)
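The boundary rule stated in r134934 can be written down as a small C sketch. The function name `inherit_boundary()` is hypothetical and the sketch is not the busdma code. It assumes, as the commit message suggests, that a boundary of 0 means "no constraint", so when either value is 0 the other one is used.

```c
#include <stdint.h>

/*
 * Pick the boundary a child tag should end up with, given the parent's
 * boundary and the one requested for the child. Assumption: 0 means
 * "no boundary constraint".
 */
static uint64_t
inherit_boundary(uint64_t parent, uint64_t child)
{
	if (parent == 0)
		return (child);		/* only the child constrains */
	if (child == 0)
		return (parent);	/* only the parent constrains */
	/* Both constrain: the smaller (stricter) boundary wins. */
	return (parent < child ? parent : child);
}
```

Picking the maximum instead, as the buggy code did, would let a child cross a boundary that its parent forbids.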
134928	08-Sep-2004	marcel	Sync the busdma code with i386. The most tangible upshot is that the alignment and boundary constraints are being respected, which fixes the reported ATA problems with SiI chips. I consider the busdma implementation worrisome nonetheless. Not only is there too much MI code duplicated in MD files, there's a lot of questionable code. I smell a wholesale, cross-platform overhaul coming... MT5 candidate.
134791	05-Sep-2004	julian	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as libkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
134571	31-Aug-2004	julian	Remove an unneeded argument. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems so leave it as an argument. Having both proc and thread as an argument just gives an opportunity for them to get out of sync. MFC after: 3 days
134568	31-Aug-2004	julian	Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days
134509	30-Aug-2004	alc	Remove unnecessary check for curthread == NULL.
134502	30-Aug-2004	marcel	s/ENTRY/ENTRY_NOPROFILE/g for particular functions that do not follow the C calling convention or are otherwise not regular functions. This allows us to boot a profiling kernel.
134393	27-Aug-2004	alc	The machine-independent parts of the virtual memory system always pass a valid pmap to the pmap functions that require one. Remove the checks for NULL. (These checks have their origins in the Mach pmap.c that was integrated into BSD. None of the new code written specifically for FreeBSD included them.)
134287	25-Aug-2004	marcel	Make profiling actually work. The gcc compiler emits a call to the _mcount() stub when profiling is enabled. Emit this code sequence for assembly routines as well (MCOUNT definition in <machine/asm.h>). We do not pass the GOT entry however as the 4th argument, because it's not used. The _mcount() stub calls __mcount(), which does the actual work. Define _MCOUNT_DECL to define __mcount. We do not have an implementation of mcount(), so we define MCOUNT as empty, but have a weak alias to _mcount() in _mcount.S. Note that the _mcount() stub in the kernel is slightly different from the stub in userland. This is because we do not have to worry about nested routines in the kernel.
134263	24-Aug-2004	njl	Catch up with i386 nexus.c rev 1.59: add bus_get_resource_list().
133888	16-Aug-2004	arun	The existing code fails some corner cases. Replace it with ia64_bsp_adjust() which has been tested to work in all cases for arbitrary (bsp, nslots) combinations. reviewed by: marcel@
133878	16-Aug-2004	marcel	Catch up with the drive-by renaming of IA32 to COMPAT_IA32. It must have been rush hour... While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on amd64. Consequently, it's unsafe to use the option in pcb.h. We now unconditionally have the ia32 specific registers in the PCB. This commit is untested.
133711	14-Aug-2004	marcel	Allocate memory in the unwinder with M_NOWAIT. We may need to provide backtraces with locks held.
133472	11-Aug-2004	marcel	In set_regs(), flush the dirty registers onto the backingstore before we update the registers. That way we don't have any dirty registers to worry about and also know that bsp=bspstore, which makes updating the RSE related registers predictable. This is not the end of it. We need more validity checks, but for now this allows us to complete the gdb testsuite without crashing the kernel.
133464	11-Aug-2004	marcel	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
133405	09-Aug-2004	marcel	Better preserve the original protection for the mappings we maintain. The hardware always gives read access for privilege level 0, which means that we cannot use the hardware access rights and privilege level in the PTE to test whether there's a change in protection. So, we save the original vm_prot_t in the PTE as well. Add pmap_pte_prot() to set the proper access rights and privilege level on the PTE given a pmap and the requested protection. The above allows us to compare the protection in pmap_extract_and_hold() which was missing. While in pmap_extract_and_hold(), add pmap locking. While here, clean up most (i.e. all but one) PTE macros we inherited from alpha. They were either unused, used inconsistently, badly named or simply weren't beneficial. We save the wired and managed state of the PTE in distinct (bit) fields. While in pte.h, s/u_int64_t/uint64_t/g pmap locking obtained from: alc@ feedback & review by: alc@
133291	08-Aug-2004	marcel	Implement single stepping when we leave the kernel through the EPC syscall path. The basic problem is that we cannot set the single stepping flag directly, because we don't leave the kernel via an interrupt return. So, we need another way to set the single stepping flag. The way we do this is by enabling the lower-privilege transfer trap, which gets raised when we drop the privilege level. However, since we're still running in kernel space (sec), we're not yet done. We clear the lower-privilege transfer trap, enable the taken-branch trap and continue exiting the kernel until we branch into user space. Given the current code, there's a total of two traps this way before we can raise SIGTRAP.
133286	07-Aug-2004	marcel	Slightly move labels around to make sure we call ast() on our way out after a fork(2) in fork_trampoline(). By moving the epc_syscall_return label immediately before the call to do_ast() in epc_syscall(), we not only achieve that but also handle the detour through exception_return when the frame corresponds to an asynchronous kernel entry. Hence, we simplified fork_trampoline() as a side-effect.
133285	07-Aug-2004	marcel	De-inline gdb_cpu_signal() because we need to convert the trap vectors related to breakpoints and single stepping into SIGTRAP so gdb(1) knows why the remote target has stopped. In particular, gdb(1) needs to know if the reason is something of its own doing.
133135	04-Aug-2004	arun	Use a 256MB TR instead of a 64MB TR to make sure that the kernel text/data are covered on APs. This enables the kernel to boot on a 4 way Intel Itanium-2 platform. This has a secondary effect of keeping the TRs identical on BP and the APs. reviewed by: marcel@
133023	02-Aug-2004	marcel	Fix 2 typos in previous commit: both s/strct/struct/
132956	01-Aug-2004	markm	Break out the MI part of the /dev/[k]mem and /dev/io drivers into their own directory and module, leaving the MD parts in the MD area (the MD parts _are_ part of the modules). /dev/mem and /dev/io are now loadable modules, thus taking us one step further towards a kernel created entirely out of modules. Of course, there is nothing preventing the kernel from having these statically compiled.
132897	30-Jul-2004	alc	- Add pmap locking to ia64's pmap_enter() and pmap_enter_quick(). (This brings ia64 to parity with alpha, amd64, and i386 in this area.) - Prevent a race in pmap_find_pte(): If pmap_find_pte() sleeps in uma_zalloc(), another thread could allocate a pte at the same address. Instead, sleep at a higher level and retry the lookup before retrying the allocation. Reviewed and tested by: marcel@
132808	28-Jul-2004	phk	Move a relic to its correct location(s): Put nfs diskless initialization calls with the code they call. (Yet another example of mindless copy&paste).
132626	25-Jul-2004	marcel	Work-around a gcc code generation bug for function descriptors references (target/16559). This fixes SMP configurations. Obtained from: arun@
132522	22-Jul-2004	alc	In pmap_mincore() create a private copy of the pte for use after the pmap lock is released.
132487	21-Jul-2004	alc	Additional pmap locking Tested by: marcel@
132482	21-Jul-2004	marcel	Unify db_stack_trace_cmd(). All it did was look up the thread given the thread ID and call db_trace_thread(). Since arm has all the logic in db_stack_trace_cmd(), rename the new DB_COMMAND function to db_stack_trace to avoid conflicts on arm. While here, have db_stack_trace parse its own arguments so that we can use a more natural radix for IDs. If the ID is not a thread ID, or more precisely when no thread exists with the ID, try if there's a process with that ID and return the first thread in it. This makes it easier to print stack traces from the ps output. requested by: rwatson@ tested on: amd64, i386, ia64
132378	19-Jul-2004	alc	Add partial pmap locking. Tested by: marcel@
132235	16-Jul-2004	alc	Remove unused fields from the pmap.
132226	15-Jul-2004	phk	Preparation commit for the tty cleanups that will follow in the near future: rename ttyopen() -> tty_open() and ttyclose() -> tty_close(). We need the ttyopen() and ttyclose() for the new generic cdevsw functions for tty devices in order to have consistent naming.
132220	15-Jul-2004	alc	Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.
132170	15-Jul-2004	alc	A loop in pmap_remove() should use TAILQ_FOREACH_SAFE(), not TAILQ_FOREACH(), because the loop deletes elements from the list. Reviewed by: marcel@
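The difference that r132170 relies on can be illustrated with a small userland sketch. The entry type and remove_even() are hypothetical stand-ins, not the pmap code; the fallback macro is only there because some C libraries ship a <sys/queue.h> without the _SAFE variants.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback matching the usual BSD definition, for libcs that lack it. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = TAILQ_FIRST((head));				\
	    (var) != NULL && ((tvar) = TAILQ_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

struct entry {			/* hypothetical stand-in for a list element */
	int key;
	TAILQ_ENTRY(entry) link;
};
TAILQ_HEAD(entry_list, entry);

/* Remove and free every entry with an even key; return how many remain. */
static int
remove_even(struct entry_list *head)
{
	struct entry *e, *tmp;
	int remaining = 0;

	TAILQ_FOREACH_SAFE(e, head, link, tmp) {
		if (e->key % 2 == 0) {
			TAILQ_REMOVE(head, e, link);
			free(e);	/* safe: tmp already holds the successor */
		} else
			remaining++;
	}
	return (remaining);
}
```

With plain TAILQ_FOREACH() the loop would read the link field of an element after it has been removed and freed in order to find the next one.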
132088	13-Jul-2004	davidxu	Add ptrace_clear_single_step(); alpha has had it for years. The function will be used by ptrace to clear a thread's single step state.
132085	13-Jul-2004	alc	Simplify pmap_protect().
132082	13-Jul-2004	alc	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
131960	11-Jul-2004	marcel	Remove the now unused GDB stubs. See src/sys/gdb/* for the new KDB backend.
131952	10-Jul-2004	marcel	Mega update for the KDB framework: turn DDB into a KDB backend. Most of the changes are a direct result of adding thread awareness. Typically, DDB_REGS is gone. All registers are taken from the trapframe and backtraces use the PCB based contexts. DDB_REGS was defined to be a trapframe on all platforms anyway. Thread awareness introduces the following new commands: thread X switch to thread X (where X is the TID), show threads list all threads. The backtrace code has been made more flexible so that one can create backtraces for any thread by giving the thread ID as an argument to trace. With this change, ia64 has support for breakpoints.
131945	10-Jul-2004	marcel	Update for the KDB framework: o ksym_start and ksym_end changed type to vm_offset_t. o Make debugging support conditional upon KDB instead of DDB. o Call kdb_enter() instead of breakpoint(). o Remove implementation of Debugger(). o Call kdb_trap() according to the new world order. unwinder: o s/db_active/kdb_active/g o Various s/ddb/kdb/g o Add support for unwinding from the PCB as well as the trapframe. Abuse a spare field in the special register set to flag whether the PCB was actually constructed from a trapframe so that we can make the necessary adjustments. md_var.h: o Add RSE convenience macros. o Add ia64_bsp_adjust() to add or subtract from BSP while taking NaT collections into account.
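What "taking NaT collections into account" means for the BSP adjustment in r131945 can be sketched with a deliberately simple loop. This is not the kernel's ia64_bsp_adjust(); the names rse_slot() and bsp_adjust_sketch() are hypothetical. It relies on the architectural rule that a backing-store slot whose address bits 8:3 are all ones holds a NaT collection rather than a register, and it assumes the starting bsp does not itself point at such a slot.

```c
#include <stdint.h>

/* Slot number of a backing-store address: address bits 8:3. */
static unsigned int
rse_slot(uint64_t addr)
{
	return ((unsigned int)(addr >> 3) & 0x3f);
}

/*
 * Hypothetical, loop-based sketch: move bsp by nslots registers
 * (negative values move backwards), stepping over NaT collection slots.
 * Assumption: bsp does not point at a NaT collection slot on entry.
 */
static uint64_t
bsp_adjust_sketch(uint64_t bsp, int nslots)
{
	int64_t step = (nslots < 0) ? -8 : 8;
	int n = (nslots < 0) ? -nslots : nslots;

	while (n-- > 0) {
		bsp += step;
		if (rse_slot(bsp) == 0x3f)	/* landed on a NaT collection */
			bsp += step;		/* skip over it */
	}
	return (bsp);
}
```

A closed-form version avoids the loop, but the loop makes the rule easy to check: starting at slot 0, moving 63 registers forward crosses one collection slot and therefore advances by 64 slots.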
131905	10-Jul-2004	marcel	Implement makectx(). The makectx() function is used by KDB to create a PCB from a trapframe for purposes of unwinding the stack. The PCB is used as the thread context and all but the thread that entered the debugger has a valid PCB. This function can also be used to create a context for the threads running on the CPUs that have been stopped when the debugger got entered. This however is not done at the time of this commit.
131899	10-Jul-2004	marcel	Introduce the GDB debugger backend for the new KDB framework. The backend improves over the old GDB support in the following ways: o Unified implementation with minimal MD code. o A simple interface for devices to register themselves as debug ports, ala consoles. o Compression by using run-length encoding. o Implements GDB threading support.
131840	08-Jul-2004	brian	Change the following environment variables to kernel options: bootp -> BOOTP bootp.nfsroot -> BOOTP_NFSROOT bootp.nfsv3 -> BOOTP_NFSV3 bootp.compat -> BOOTP_COMPAT bootp.wired_to -> BOOTP_WIRED_TO - i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel. Pointed out by: dwmalone
131838	08-Jul-2004	marcel	MFamd64 (1.275): Reduce the scope of the Giant lock being held for non-mpsafe syscalls. There was way too much code being covered.
131821	08-Jul-2004	marcel	Better handle the break instruction trap. The runtime specification has outlined which break numbers are software interrupts, debugger breakpoints and ABI specific breaks. We mostly treated all break numbers we didn't care about as debugger breakpoints.
131814	08-Jul-2004	brian	Change the following kernel options to environment variables: BOOTP -> bootp BOOTP_NFSROOT -> bootp.nfsroot BOOTP_NFSV3 -> bootp.nfsv3 BOOTP_COMPAT -> bootp.compat BOOTP_WIRED_TO -> bootp.wired_to This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf: bootp="YES" bootp.nfsroot="YES" bootp.nfsv3="YES" bootp.wired_to="bge1" or even setting the variables manually from the OK prompt.
131662	05-Jul-2004	alc	- Correct pmap_extract()'s return type. It should be vm_paddr_t, not vm_offset_t. - Convert pmap_extract() to the ANSI style of declaration.
131481	02-Jul-2004	jhb	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)
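The three outcomes that r131481 describes for maybe_preempt() can be modelled roughly as below. Every name here (struct thr, enum action, preempt_decision(), the nested_crit and owepreempt fields) is a hypothetical simplification for illustration, not a kernel type or interface. The only convention taken from FreeBSD is that a numerically lower priority value means a higher priority.

```c
#include <stdbool.h>

/* Hypothetical simplification of a thread; not a kernel structure. */
struct thr {
	int  pri;		/* lower value = higher priority */
	bool nested_crit;	/* running inside a nested critical section */
	bool owepreempt;	/* models the TDF_OWEPREEMPT flag */
};

enum action { ENQUEUE, ENQUEUE_DEFERRED, SWITCH_NOW };

static enum action
preempt_decision(struct thr *cur, const struct thr *newtd, bool safe)
{
	/* Not safe, or the new thread is not more important: just queue it. */
	if (!safe || newtd->pri >= cur->pri)
		return (ENQUEUE);
	/* Preemption is owed, but must wait for the outermost critical exit. */
	if (cur->nested_crit) {
		cur->owepreempt = true;
		return (ENQUEUE_DEFERRED);
	}
	/* Models calling mi_switch() right away, bypassing the run queue. */
	return (SWITCH_NOW);
}
```

In this model the caller would add the thread to the run queue for both ENQUEUE results and switch immediately only for SWITCH_NOW.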
131381	30-Jun-2004	marcel	Unbreak build: define __RMAN_RESOURCE_VISIBLE See also src/sys/sys/rman.h rev. 1.21.
130742	19-Jun-2004	alc	Remove dead code related to pv entry allocation. Reviewed by: marcel@
130585	16-Jun-2004	phk	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
130362	11-Jun-2004	alc	Neither pmap_enter() nor pmap_enter_quick() should create pv entries for unmanaged pages. Tested by: marcel@
130344	11-Jun-2004	phk	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
130338	11-Jun-2004	alc	Reduce the number of preallocated pv entries and lpte entries in pmap_init(). Tested by: marcel@
130077	04-Jun-2004	phk	Machine generated patch which changes linedisc calls from accessing linesw[] directly to using the ttyld...() functions. The ttyld...() functions are inline so there is no performance hit.
130028	03-Jun-2004	tjr	Remove checks for curthread == NULL - it can't happen.
130025	03-Jun-2004	phk	Add missing <sys/module.h> instances which were shadowed by the nested include in <sys/kernel.h>
130023	03-Jun-2004	tjr	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb
129944	01-Jun-2004	phk	Gainfully employ the new ttyioctl in the trivial cases.
129750	26-May-2004	tmm	Retire cpu_sched_exit(); it is not used any more.
129323	17-May-2004	marcel	Unbreak build due to previous commit: now that elf_reloc_internal() gets the relocation base passed in relocbase, we cannot declare a local variable with the same name. Assume the argument holds the same value as the local variable did...
129282	16-May-2004	peter	Make a small revision to the api between the elf linker core and the elf_reloc() backends for two reasons. First, to support the possibility of there being two elf linkers in the kernel (eg: amd64), and second, to pass the relocbase explicitly (for relocating .o format kld files).
129023	07-May-2004	marcel	Revert previous commit. We should not get any FP traps from within the kernel. We can guarantee this by resetting the FP status register. This masks all FP traps. The reason we did get FP traps was that we didn't reset the FP status register in all cases. Make sure to reset the FP status register in syscall(). This is one of the places where it was forgotten. While on the subject, reset the FP status register only when we trapped from user space.
129022	07-May-2004	marcel	Make sure to sanitize the FP status register. Specifically this masks all FP traps, which should not happen in the kernel.
128857	03-May-2004	marcel	Floating-point faults and exceptions can happen in the kernel too. Do not panic when it happens; handle them. Run into by: das
128393	18-Apr-2004	alc	MFamd64 Simplify the sf_buf implementation. In short, make it a veneer over the direct virtual-to-physical mapping.
128105	11-Apr-2004	alc	Remove a comment that refers to avail_start and avail_end as these variables no longer exist.
128097	10-Apr-2004	alc	- pmap_kenter_temporary() is unused by machine-independent code. Therefore, move its declaration to the machine-dependent header file on those machines that use it. In principle, only i386 should have it. Alpha and AMD64 should use their direct virtual-to-physical mapping. - Remove pmap_kenter_temporary() from ia64. It is unused. Approved by: marcel@
128019	07-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
127922	06-Apr-2004	alc	Remove avail_end. As of yesterday, it is unused.
127875	05-Apr-2004	alc	Remove avail_start on those platforms that no longer use it. (Only amd64 does anything with it beyond simple initialization.)
127869	05-Apr-2004	alc	Remove unused arguments from pmap_init().
127788	03-Apr-2004	alc	In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it should not. Add a new parameter so that the caller can specify which is the case. Reported by: dillon
127494	27-Mar-2004	marcel	MFi386: correctly calculate the top-of-stack when a kthread is created with a larger kernel stack. Remove inclusion of opt_kstack_pages.h now that it's unused.
127241	20-Mar-2004	alc	- Add uiomove_fromphys() implementations to alpha and ia64. These only differ trivially from amd64. - Correct a spelling error in a comment.
127086	16-Mar-2004	alc	Refactor the existing machine-dependent sf_buf_free() into a machine-dependent function by the same name and a machine-independent function, sf_buf_mext(). Aside from the virtue of making more of the code machine-independent, this change also makes the interface more logical. Before, sf_buf_free() did more than simply undo an sf_buf_alloc(); it also unwired and if necessary freed the page. That is now the purpose of sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used as a general-purpose ephemeral map cache.
126919	13-Mar-2004	scottl	Now that contigfree() does not require Giant, don't grab it in busdma.
126825	10-Mar-2004	marcel	Identify the Deerfield processor. Deerfield is a low-voltage variant based on the Madison core and targeting the low end of the spectrum. Its clock frequency is 1GHz, whereas Madison starts at 1.3GHz. Since the CPUID information is the same for Madison and Deerfield, we use the clock frequency to identify the processor. Supposedly the Deerfield only uses 62W, which seems to be less than modern Xeon processors (about 70W) and about half what a Madison would need.
126728	07-Mar-2004	alc	Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_pinit2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.
126716	07-Mar-2004	alc	Integrate the code from pmap_pinit2() into pmap_pinit(), leaving pmap_pinit2() empty. Approved by: marcel
126106	22-Feb-2004	marcel	Do not pre-map the I/O port space. On the Intel Tiger 4 this conflicts with a memory mapped I/O range that's immediately before it and is not 256MB aligned. As a result, when an address is accessed in the memory mapped range and a direct mapping is added for it, it overlaps with the pre-mapped I/O port space and causes a machine check. Based on a patch from: arun@
126080	21-Feb-2004	phk	Device megapatch 4/6: Introduce d_version field in struct cdevsw, this must always be initialized to D_VERSION. Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
126078	21-Feb-2004	phk	Device megapatch 3/6: Add missing D_TTY flags to various drivers. Complete asserts that dev_t's passed to ttyread(), ttywrite(), ttypoll() and ttykqwrite() have (d_flags & D_TTY) and a struct tty pointer. Make ttyread(), ttywrite(), ttypoll() and ttykqwrite() the default cdevsw methods for D_TTY drivers and remove the explicit initializations in various drivers cdevsw structures.
126076	21-Feb-2004	phk	Device megapatch 1/6: Free approx 86 major numbers with a mostly automatically generated patch. A number of strategic drivers have been left behind by caution, and a few because they still (ab)use their major number.
125975	18-Feb-2004	phk	Change the disk(9) API in order to make device removal more robust. Previously the "struct disk" was owned by the device driver and this gave us problems when the device disappeared and the users of that device were not immediately disappearing. Now the struct disk is allocated with a new call, disk_alloc(), and owned by geom_disk and just abandoned by the device driver when disk_create() is called. Unfortunately, this results in a ton of "s/\./->/" changes to device drivers. Since I'm doing the sweep anyway, a couple of other API improvements have been carried out at the same time: The Giant awareness flag has been flipped from DISKFLAG_NOGIANT to DISKFLAG_NEEDSGIANT. A version number has been added to disk_create() so that we can detect, report and ignore binary drivers with old ABI in the future. Manual page update to follow shortly.
124739	20-Jan-2004	marcel	Fix handling of FP traps: o For traps, the cr.iip register points to the next instruction to execute on interrupt return (modulo slot). Since we need to get the bundle of the instruction that caused the FP fault/trap, make sure we fetch the previous bundle if the next instruction is in fact the first in a bundle. o When we call the FPSWA handler, we need to tell it whether it's a trap or a fault (first argument). This was hardcoded to mean a fault. Also, for FP faults, when a fault is converted to a trap, adjust the cr.iip and cr.ipsr registers to point to the next instruction. This makes sure that the SIGFPE handler gets a consistent state.
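The bundle selection described in the first point of r124739 can be sketched like this. The helper fp_bundle_addr() and its parameters are hypothetical; the sketch only assumes that an IA-64 bundle is 16 bytes and that the slot number (0 to 2) accompanying cr.iip is available to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUNDLE_SIZE 16	/* an IA-64 instruction bundle is 128 bits */

/*
 * Hypothetical helper: address of the bundle that holds the FP
 * instruction. iip is the bundle address taken from cr.iip and slot is
 * the slot number (0..2) that goes with it.
 */
static uint64_t
fp_bundle_addr(uint64_t iip, unsigned int slot, bool is_trap)
{
	/*
	 * For a trap, iip/slot name the next instruction. If that is the
	 * first slot of a bundle, the trapping instruction lives in the
	 * previous bundle.
	 */
	if (is_trap && slot == 0)
		return (iip - BUNDLE_SIZE);
	return (iip);
}
```

For a fault, cr.iip still points at the faulting instruction, so the sketch returns the address unchanged.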
124737	20-Jan-2004	marcel	s/framep/tf/g -- this normalizes on the use of tf to point to the trapframe and improves grep-ability.
124092	03-Jan-2004	davidxu	Make sigaltstack per-thread, because per-process sigaltstack state is useless for threaded programs: multiple threads cannot share the same stack. The alternate signal stack is private to a thread, so no lock is needed; the original P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single-threaded or Linux clone() based threaded programs, there is no semantic change, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr
123929	28-Dec-2003	silby	Track three new sendfile-related statistics: - The number of times sendfile had to do disk I/O - The number of times sfbuf allocation failed - The number of times sfbuf allocation had to wait
123920	28-Dec-2003	silby	Move the declaration of sfbufspeak and sfbufsused to mbuf.h, and use imax instead of max, as sfbufspeak and sfbufsused are signed. Submitted by: bde
123884	27-Dec-2003	silby	Track current and peak sfbuf usage, export the values via sysctl.
123819	24-Dec-2003	marcel	Don't use NULL with integral types.
123742	23-Dec-2003	peter	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.
123528	14-Dec-2003	marcel	In set_mcontext(), take into account that kse_switchin(2) will eventually be passed an async. context as well as a syscall context. While here, fix a serious bug in that if the trapframe is a syscall frame, but we're restoring an async context, we need to clear the FRAME_SYSCALL flag so that we leave the kernel via exception_restore.
123346	09-Dec-2003	marcel	Don't panic for misalignment traps when the onfault handler is set. Not all transfers between kernel and user space are byte oriented and thus alignment safe. Especially fuword() and suword() are sensitive to alignment but in general more optimal than block copies. By catching the misalignment trap we avoid pessimizing the common case of properly aligned memory accesses which we would do if we were to use byte copies or adding tests for proper alignment. Note that the expectation that the kernel produces aligned pointers is unchanged. This change therefore relates to possible unaligned pointers generated in userland.
123255	07-Dec-2003	marcel	Simplify the contexts created by the kernel and remove the related flags. We now create asynchronous contexts or syscall contexts only. Syscall contexts differ from the minimal ABI dictated contexts by having the scratch registers saved and restored because that's where we keep the syscall arguments and syscall return values. Since this change affects KSE, have it use kse_switchin(2) for the "new" syscall context.
122947	21-Nov-2003	jhb	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since sysinits in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)
122918	20-Nov-2003	marcel	Set the ACPI processor Id in the PCPU structure so that CPU idling on SMP systems has a chance of working. This was a loose end of the implementation of the ACPI Cx idle states. Since our logical CPU Id is the ACPI processor Id, we do not need to jump through hoops to obtain it. Approved: re@ (jhb)
122841	17-Nov-2003	peter	Widen the enable/disable helper function's argument in line with the ithread_create() changes etc. This should be mostly a NOP.
122821	16-Nov-2003	alc	- Remove unnecessary synchronization from sf_buf_init(). (There is only one active CPU when sf_buf_init() is performed.)
122780	16-Nov-2003	alc	- Modify alpha's sf_buf implementation to use the direct virtual-to-physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architectures except as an opaque pointer that references a vm page.
122763	15-Nov-2003	njl	Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to dereference the softc.
122525	12-Nov-2003	marcel	Remove ia64_highfp_load() now that it's unused.
122518	12-Nov-2003	marcel	Further work-out the handling of the high FP registers. The most important change is in cpu_switch() where we disable the high FP registers for the thread that we switch-out if the CPU currently has its high FP registers. This avoids that the high FP registers remain enabled for the thread even when the CPU has unloaded them or the thread migrated to another processor. Likewise, when we switch-in a thread that has its high FP registers on the CPU, we enable them. This avoids an otherwise harmless, but unnecessary trap to have them enabled. The code that handles the disabled high FP trap (in trap()) has been turned into a critical section for the most part to avoid being preempted. If there's a race, we bail out and have the processor trap again if necessary. Avoid using the generic ia64_highfp_save() function when the context is predictable. The function adds unnecessary overhead. Don't use ia64_highfp_load() for the same reason. The function is now unused and can be removed. These changes make the lazy context switching of the high FP registers in an UP kernel functional.
122480	11-Nov-2003	marcel	Save and restore the high FP registers in {g|s}et_mcontext(). Note that we currently do not keep track of whether the thread has actually used the high FP registers before. If not, we should not save them in the context which automatically means that we also would not restore them from the context. For now, do it unconditionally so that we can reach functional completeness.
122479	11-Nov-2003	marcel	Fix a nasty bug that got exposed when the sendsig() and sigreturn() functions switched to using {g|s}et_mcontext(). The problem is that sigreturn(), being a syscall, can be given an async. context (i.e. one corresponding to an interrupt or trap). When this happens, we try to return to user mode via epc_syscall_return with a trapframe that can only be used to return to user mode via exception_restore. To fix this, we check the frame's flags immediately prior to epc_syscall_return and branch to exception_restore for non-syscall frames. Modify the assertion in set_mcontext() to check that if there's a mismatch, it's because of sigreturn().
122389	10-Nov-2003	marcel	In get_mcontext(), do not update bspstore and ndirty in the trapframe. Only update them in the newly created context to reflect the state after copying the dirty registers onto the user stack. If we were to update the trapframe, we lose the state at entry into the kernel. We may need that after we create the context, such as for KSE upcalls. We have to update the trapframe after writing the dirty registers to the user stack for signal delivery to work. But this is best done in sendsig() itself where it applies, not in get_mcontext() where it's done unconditionally.
122373	09-Nov-2003	marcel	When a thread is being swapped-out, save the high FP registers. We have a pointer in the PCPU to the PCB of the thread that currently has its high FP registers loaded.
122368	09-Nov-2003	marcel	Use get_mcontext() to construct the signal context in sendsig() and use set_mcontext() to restore the context in sigreturn(). Since we put the syscall number and the syscall arguments in the trapframe (we don't save the scratch registers for syscalls, which allows us to reuse the space to our advantage), create a MD specific flag so that we save the scratch registers even for syscalls. We would not be able to restart a syscall otherwise. The signal trampoline does not need to flush the registers anymore, because get_mcontext() already handles that. In fact, if we set up the context correctly, we do not need to have a trampoline at all. This change however only minimally changes the trampoline code. In follow-up commits this can be further optimized. Note that normally we preserve cfm and iip in the trapframe created by the EPC syscall path when we restore a context in set_mcontext() because those fields are not normally set for a synchronous context. The kernel puts the return address and frame info of the syscall stub in there. By preserving these fields we hide this detail from userland which allows us to use setcontext(2) for user created contexts. However, sigreturn() is commonly called from the trampoline, which means that if we preserve cfm and iip in all cases, we would return to the trampoline after the sigreturn(), which means we hit the safety net: we call exit(2). So, we do not preserve cfm and iip when we have a synchronous context that also has scratch registers (the uncommon context created by sendsig() only), under the assumption that if such a context is created in userland, something special is going on and the use of cfm and iip is then just another quirk. All this is invisible in the common case.
122364	09-Nov-2003	marcel	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().
122162	06-Nov-2003	marcel	Add support for unaligned ld2, st2, st4 and st8. While here, make sure we handle stacked registers properly by taking into account that: 1. bspstore points after the frame (due to cover), 2. we need to adjust for intermediate NaT collections.
121933	03-Nov-2003	marcel	Handle unaligned 4-byte loads. While in the neighborhood, remove the cr.isr sanity check. We actually encounter insanities, which very likely means that the insanity check itself is insane. Remove an empty comment while I'm at it.
121635	28-Oct-2003	marcel	When switching the RSE to use the kernel stack as backing store, keep the RNAT bit index constant. The net effect of this is that there's no discontinuity WRT NaT collections which greatly simplifies certain operations. The cost of this is that there can be up to 504 bytes of unused stack between the true base of the kernel stack and the start of the RSE backing store. The cost of adjusting the backing store pointer to keep the RNAT bit index constant, for each kernel entry, is negligible. The primary reasons for this change are: 1. Asynchronous contexts in KSE processes have the disadvantage of having to copy the dirty registers from the kernel stack onto the user stack. The implementation we had so far copied the registers one at a time without calculating NaT collection values. A process that used speculation would not work. Now that the RNAT bit index is constant, we can block-copy the registers from the kernel stack to the user stack without having to worry about NaT collections. They will be in the right place on the user stack. 2. The ndirty field in the trapframe is now also usable in userland. This was previously not the case because ndirty also includes the space occupied by NaT collections. The value could be off by 8, depending on the discontinuity. Now that the RNAT bit index is constant, we have exactly the same number of NaT collection points on the kernel stack as we would have had on the user stack if we didn't switch backing stores. 3. Debuggers and other applications that use ptrace(2) can now copy the dirty registers from the kernel stack (using ptrace(2)) and copy them wherever they want them (onto the user stack of the inferior as might be the case for gdb) without having to worry about NaT collections in the same way the kernel doesn't have to worry about them.
There's a second order effect caused by the randomization of the base of the backing store, for it depends on the number of dirty registers the processor happened to have at the time of entry into the kernel. The second order effect is that the RSE will have a better cache utilization as compared to having the backing store always aligned at page boundaries. This has not been measured and may be in practice only minimally beneficial, if at all measurable.
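The 504-byte figure in r121635 follows from how the RSE lays out its backing store: one NaT collection slot follows every 63 registers, so bits 3..8 of a backing store address select one of 64 eight-byte slots in a 512-byte window. The sketch below only illustrates that arithmetic in C. It is not the kernel's assembly; the function names are hypothetical, and the 512-byte alignment of the kernel stack base is an assumption made for the example.

```c
#include <stdint.h>

/* RNAT bit index: bits 3..8 of a backing store address (0..63). */
static unsigned
rnat_bit_index(uint64_t bsp)
{
	return ((unsigned)((bsp >> 3) & 0x3f));
}

/*
 * Hypothetical helper: pick the start of the kernel backing store so that
 * it has the same RNAT bit index as the user backing store pointer.
 * kstack_base is assumed to be 512-byte aligned, so the unused gap is
 * (user_bspstore & 0x1f8), which is at most 0x1f8 = 504 bytes.
 */
static uint64_t
kernel_bspstore(uint64_t kstack_base, uint64_t user_bspstore)
{
	return (kstack_base + (user_bspstore & 0x1f8));
}
```

Because both pointers share the same index, a NaT collection slot on the kernel stack sits at the same relative position it would have had on the user stack, which is what makes the block copy described above possible.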
121622	27-Oct-2003	marcel	The previous commit removed both clause 3 and clause 4 from the UCB license. Only clause 3 has been revoked. Restore the fourth clause as clause 3. Pointed out by: das@ Remove my name as a copyright holder since I don't use a BSD license compatible or comparable to the UCB license. I choose not to add a complete second license for my work for aesthetic reasons, nor to replace the UCB license on grounds of rewriting more than 90% of the source files. The rewrite can also be seen as an enhancement and since the files were practically empty, it's rather trivial to have changed 90% of the files.
121600	27-Oct-2003	marcel	Add support for userland to access I/O port space. This is primarily added for XFree86. There are 2 reasons for doing this with sysarch(): 1. The memory mapped I/O space is not at a fixed physical address. An application has to use some interface to get the base address. It gets worse if the machine has multiple memory mapped I/O spaces. 2. Access to the memory mapped I/O space needs to happen through a translation that is flagged as uncacheable. There's no interface that allows a process to do uncached memory I/O, other than through /dev/mem (possibly). So, until we either disallow direct access to I/O or bus space from userland or have a better way of doing this, sysarch() has the least negative impact on existing interfaces.
121457	24-Oct-2003	marcel	Remove ia64_pack_bundle() and ia64_unpack_bundle(). They are not used anymore.
121456	24-Oct-2003	marcel	Remove unused file. db_disasm() has been implemented in db_interface.c now.
121454	24-Oct-2003	marcel	Implement db_disasm() by using the new disassembler. Temporarily unimplement db_write_breakpoint() and db_clear_breakpoint().
121452	24-Oct-2003	arun	Use a TR of size 1 << IA64_ID_PAGE_SHIFT instead of 16M to avoid overlapping TR/TC entries (which results in a machine check). Note that we don't look at the size of the memory descriptor, because it doesn't guarantee non-overlap. With this change, a UP kernel could boot on a Intel Tiger4 machine with the following options: options LOG2_ID_PAGE_SIZE=26 # 64M options LOG2_PAGE_SIZE=14 # 16K Approved by: marcel
121449	24-Oct-2003	marcel	Don't use fuword() or suword() unconditionally. They explicitly disallow reading or writing.
121415	23-Oct-2003	marcel	Reimplement unaligned_fixup() using the new disassembler and a mcontext_t for the register values. Currently only ld8 and ldfd instructions are handled as those are the ones we need now (a misaligned ld8 occurs 4 times in ntpd(8) and a misaligned ldfd occurs once in mozilla 1.4 and 1.5). Other instructions are added when needed.
121413	23-Oct-2003	marcel	Remove unused include of <machine/inst.h>
121412	23-Oct-2003	marcel	Remove prototype of unaligned_fixup() and fix a nearby style(9) bug.
121410	23-Oct-2003	marcel	Add spillfd(). This function loads a double-precision FP register at the first address and spills it to the second address. This allows unaligned_fixup() to update the context of the process in a way that assures proper rounding. Similar functions for single-and extended-precision are added when needed.
121294	21-Oct-2003	marcel	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.
121228	18-Oct-2003	njl	Add the cpu_idle_hook() function pointer so that other idlers can be hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This prepares the way for further ACPI sleep states.
121148	17-Oct-2003	marcel	Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.
120937	09-Oct-2003	robert	Implement preliminary support for the PT_SYSCALL command to ptrace(2).
120928	09-Oct-2003	marcel	With BETA 5 of libuwx some of the application registers are renamed from UWX_REG_MUMBLE to UWX_REG_AR_MUMBLE. Compatibility defines are present in libuwx. Change the names here so that we don't depend on compatibility defines. Note that there's now an UWX_REG_PFS and an UWX_REG_AR_PFS and the former is not a compatibility define for the latter AFAICT. Change to UWX_REG_AR_PFS as that seems to be the one we need to handle.
120914	08-Oct-2003	marcel	Include <sys/smp.h> for the prototype of smp_rendezvous().
120722	03-Oct-2003	alc	Migrate pmap_prefault() into the machine-independent virtual memory layer. A small helper function pmap_is_prefaultable() is added. This function encapsulates the few lines of pmap_prefault() that actually vary from machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have much in common. Going forward, it's worth considering their merger.
120683	03-Oct-2003	marcel	Swap the syscall caller frame info (i.e. the return pointer and frame marker) and the syscall stub frame info in the trap frame. Previously we stored the stub frame info in (rp,pfs) and the caller frame info in (iip,cfm). This ends up being suboptimal for the following reasons: 1. When we create a new context, such as for an execve(2), we had to set the (rp,pfs) pair for the entry point when using the syscall path out of the kernel but we need to set the (iip,cfm) pair when we take the interrupt way out. This is mostly just an inconsistency from the kernel's point of view, but an ugly irregularity from gdb(1)'s point of view. 2. The getcontext(2) and setcontext(2) syscalls had to swap the (rp,pfs) and (iip,cfm) pairs to make the context compatible with one created purely in userland. Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal handlers that actually peek at the mcontext_t and to gdb(1). Since this change is made for gdb(1) and we don't care about signal handlers that peek at the mcontext_t because we're still a tier 2 platform, this ABI breakage is academic at this moment in time. Note that there was no real reason to save the caller frame info in (iip,cfm) and the stub frame info in (rp,pfs).
120464	26-Sep-2003	phk	Set cn_name, not cn_dev
120422	25-Sep-2003	peter	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
120296	20-Sep-2003	marcel	Fix the last remaining problem encountered by KSE: apparently it is not guaranteed that the RSE writes the NaT collection immediately, sort of atomically, to the backing store when it writes the register immediately prior to the NaT collection point. This means that we cannot assume that the low 9 bits of the backingstore pointer do not point to the NaT collection. This is rather a surprise and I don't know at this time if it's a bug in the Merced or that it's actually a valid condition of the architecture. A quick scan over the sources does not indicate that we depend on the false assumption elsewhere, but it's something to keep in mind. The fix is to write the saved contents of the ar.rnat register to the backingstore prior to entering the loop that copies the dirty registers from the kernel stack to the user stack.
120294	20-Sep-2003	marcel	Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These functions reference UMA internals from <vm/uma_int.h>, which makes them highly unwanted in non-UMA specific files. While here, prune the includes in pmap.c and use __FBSDID(). Move the includes above the descriptive comment. The copyright of uma_machdep.c is assigned to the project and can be reassigned to the foundation if and when such is preferable.
120252	19-Sep-2003	marcel	Fix the most significant KSE breakage caused by not restoring the restart instruction bits in the PSR. As such, we were returning from interrupt to the instruction in the bundle that caused us to enter the kernel, only now we're returning to a completely different bundle. While close here: add two KASSERTs to make sure that we restore sync contexts only when entered the kernel through a syscall and restore an async context only when entered the kernel through an interrupt, trap or fault. While not exactly here, but close enough: use suword64() when we copy the dirty registers from the kernel stack to the user stack. The code was intended to be replaced shortly after being added, but that was a couple of weeks ago. I might as well avoid that it is a source for panics until it's replaced.
120250	19-Sep-2003	marcel	Revamp trap(): make it more explicit which kinds of traps/faults we can get (or not) and what we do with them. This fixes the behaviour for NaT consumption and speculation faults in that we now don't panic for user faults. Remove the dopanic label and move the code to a function. This makes it easier in the simulator to set a breakpoint. While here, remove the special handling of the old break-based syscall path and move it to where we handle the break vector. While here, reserve a new break immediate for KSE. We currently use the old break-based syscall to deal with restoring async contexts. However, it has the side-effect of also setting the signal mask and calling ast() on the way out. The new break immediate simply restores the context and returns without calling ast().
120212	19-Sep-2003	marcel	Include "opt_kstack_pages.h". We export KSTACK_PAGES to assembly and better have the right value.
119999	12-Sep-2003	alc	Add a new parameter to pmap_extract_and_hold() that is needed to eliminate Giant from vmapbuf(). Idea from: tegge
119970	10-Sep-2003	marcel	Rewrite the SAPIC initialization to always program the RTEs with what we think is the correct trigger mode and polarity. This allows us to implement BUS_CONFIG_INTR() as an update of the RTE in question. Consequently, we can trust the RTE when we enable an interrupt and avoids that we need to know about the trigger mode and polarity at that time.
119906	09-Sep-2003	marcel	Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.
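The relationship between the option and the three constants in r119906 can be sketched as below. This is a minimal illustration, assuming the obvious power-of-two derivation; the exact definitions in the tree may differ, and id_page_base() is a hypothetical helper added only to show the mask in use.

```c
#include <stdint.h>

#ifndef LOG2_ID_PAGE_SIZE
#define	LOG2_ID_PAGE_SIZE	28	/* 256M, the default named above */
#endif

/* Assumed derivation: the shift determines both the size and the mask. */
#define	IA64_ID_PAGE_SHIFT	(LOG2_ID_PAGE_SIZE)
#define	IA64_ID_PAGE_SIZE	(UINT64_C(1) << IA64_ID_PAGE_SHIFT)
#define	IA64_ID_PAGE_MASK	(IA64_ID_PAGE_SIZE - 1)

/* Hypothetical helper: round an address down to its direct-mapped page. */
static uint64_t
id_page_base(uint64_t pa)
{
	return (pa & ~IA64_ID_PAGE_MASK);
}
```

Under this sketch, building with LOG2_ID_PAGE_SIZE=27 would give the 128M size mentioned for LINT.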
119869	08-Sep-2003	alc	Introduce a new pmap function, pmap_extract_and_hold(). This function atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe. Reviewed by: tegge
119861	07-Sep-2003	alc	MFamd64/i386 Add necessary page locking to pmap_mincore().
119787	05-Sep-2003	marcel	Fix a place where I forgot to change the code that checks whether we return to kernel or userland. This triggered a panic in a KSE application when TDF_USTATCLOCK was set in the case userland was interrupted, but we never called ast() on our way out. As such, we called ast() at some other time. Unfortunately, TDF_USTATCLOCK handling assumes running in the interrupt thread. This was not the case anymore. To avoid making the same mistake later, interrupt() now returns to its caller whether we interrupted userland or not. This avoids that we have to duplicate the check in assembly, where it's bound to fall off the scope. Now we simply check the return value and call ast() if appropriate. Run into this: davidxu
119649	01-Sep-2003	marcel	Use pmap_steal_memory() for the msgbuf instead of trying to squeeze it in the last chunk (phys_avail block). The last chunk very often is not larger than one or two pages, resulting in a msgbuf that's too small to hold a complete verbose boot. Note that pmap_steal_memory() will bzero the memory it "allocates". Consequently, ia64 will never preserve previous msgbufs. This is not a noticeable difference in practice. If the msgbuf could be reused, it was invariably too small to have anything preserved anyway.
119624	01-Sep-2003	marcel	Use direct mapped KVA for the sf_buf allocator, as made possible by the previous commit. While here, fix a typo, reformat comments and fix a long line. Tested with: ftpd
119563	29-Aug-2003	alc	Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid ephemeral mappings in the kernel's address space. On an SMP, the ephemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck. Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.
119337	23-Aug-2003	marcel	Remove unused inclusion of opt_acpi.h
119159	20-Aug-2003	marcel	Undo the mistake made in revision 1.77 of trap.c and which was the ultimate trigger for the follow-up fixes in revisions 1.78, 1.80, 1.81 and 1.82 of trap.c. I was simply too pre-occupied with the gateway page and how it blurs kernel space with user space and vice versa that I couldn't see that it was all a load of bollocks. It's not the IP address that matters, it's the privilege level that counts. We never run in user space with lifted permissions and we sure can not run in kernel space without it. Sure, the gateway page is the exception, but not if you look at the privilege level. It's user space if you run with user permissions and kernel space otherwise. So, we're back to looking at the privilege level like it should be. There's no other way. Pointy hat: marcel
119015	17-Aug-2003	gordon	Fixup the ELF branding information to point to the new home of rtld.
119004	16-Aug-2003	marcel	In vm_thread_swap{in|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.
118990	16-Aug-2003	marcel	Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to cpu.h. This affects db_command.c and kern_shutdown.c. ia64: move all MD prototypes from cpu.h to md_var.h. This affects madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory(). It's not used (vm_machdep.c). alpha: the MD prototypes have been left in cpu.h with a comment that they should be there. Moving them is left for later. It was expected that the impact would be significant enough to be done in a separate commit. powerpc: MD prototypes left in cpu.h. Comment added. Suggested by: bde Tested with: make universe (pc98 incomplete)
118981	16-Aug-2003	marcel	Fix a range check bug. Don't left-shift the integer argument 'data'. Sign extension happens after the shift, not before so that boundary cases like 0x40000000 will not be caught properly. Instead, right shift ndirty. It is guaranteed to be a multiple of 8. While here, do some manual code motion and code commoning. Range check bug pointed out by: iedowse
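The range check problem described in r118981 can be reproduced in isolation. The sketch below is an assumption-laden model, not the committed code: the function names are hypothetical, and the 32-bit wraparound is done in unsigned arithmetic so that the example stays well defined in C.

```c
#include <stdint.h>

/*
 * Buggy form: the byte offset (data * 8) is computed in 32 bits, so the
 * high bits are lost before the value is widened for the comparison.
 */
static int
in_range_buggy(int32_t data, uint64_t ndirty)
{
	/* Unsigned shift models the 32-bit wraparound without overflow. */
	int32_t off = (int32_t)((uint32_t)data << 3);

	return ((uint64_t)(int64_t)off < ndirty);
}

/*
 * Fixed form: leave 'data' alone and right-shift ndirty instead.  ndirty
 * is a multiple of 8, so no information is lost.  Sign extension turns a
 * negative 'data' into a huge unsigned value, which is rejected.
 */
static int
in_range_fixed(int32_t data, uint64_t ndirty)
{
	return ((uint64_t)(int64_t)data < (ndirty >> 3));
}
```

With ndirty = 64 (eight registers), in_range_buggy(0x40000000, 64) accepts the index because the shifted value wraps to 0, while in_range_fixed() rejects it.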
118935	15-Aug-2003	marcel	Fix the generation of coredumps. We did not take the dirty registers that were on the kernel stack into account. For now we write them out to the register stack of the process before creating the dump. This however is not the final solution. The problem is that we may invalidate the coredump by overwriting vital information due to an invalid backing store pointer. Instead we need to write the dirty registers to an unused region of VM which will result in a separate segment in the coredump. For now we can at least get to all the registers from a coredump.
118933	15-Aug-2003	marcel	Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and PT_SETKSTACK. These requests allow the tracing process to access the dirty registers of the traced process that are on the kernel stack. Note that there's currently no way to access the rnat register for those dirty registers that are not (yet) covered by a nat collection point. The interface for this is still being slept on. Also note that implied by these requests is the division of work: The tracing process has to keep track of where registers are spilled and is responsible to figure out where the NaT bit of the stacked registers are at any time during the execution of the traced process. The kernel provides the interfaces but will not abstract the fact that the register stack can be split. This model does not follow the approach taken in Linux where PT_PEEK and PT_POKE deals with this automagically.
118853	13-Aug-2003	marcel	Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the gateway page, which means that improper memory accesses to the gateway page while in user mode would panic the kernel. Use VM_MAX_ADDRESS instead. It ends before the gateway page. The difference between VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.
118851	13-Aug-2003	marcel	Put an instruction group break between the move to ar.rnat and the move to ar.rsc. The RSE must be in enforced lazy mode when writing to RSE modifiable registers. In this case we restore the RSE NaT collection register ar.rnat. I have seen 2 general exception faults on pluto1 now that indicate that the move to ar.rsc has already happened prior to the move to ar.rnat, meaning that the RSE is not in enforced lazy mode anymore. The ia64 dependency and instruction ordering rules seem to allow having both registers written to in the same instruction group, provided ar.rsc is written to later than ar.rnat (based on the ordering semantics). It appears that we may be pushing our luck. For now, put them in separate cycles (by means of the instruction group break). If we ever get a general exception fault on the move to ar.rnat again, we have definite proof that something else is fishy.
118848	12-Aug-2003	imp	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon
118818	12-Aug-2003	marcel	Extend identifycpu(): o Differentiate between CPU family and CPU model. There are multiple Itanium 2 models and it's nice to differentiate between them. o Separately export the CPU family and CPU model with sysctl. o Merced is the only model in the Itanium family. o Add Madison to the Itanium 2 family. We already knew about McKinley. o Print the CPU family between parentheses, like we do with the i386 CPU class. My prototype now identifies itself as: CPU: Merced (800.03-Mhz Itanium) pluto1 and pluto2 will eventually identify themselves as: CPU: McKinley (900.00-Mhz Itanium 2)
118811	12-Aug-2003	marcel	Cleanup prototypes in cpu.h, including fswintrberr and any references to it. Sort the remaining prototypes in cpu.h. No functional change.
118739	10-Aug-2003	marcel	o move cpu_reset() from vm_machdep.c to machdep.c. o reorder cpu_boot(), cpu_halt() and identifycpu(). No functional change.
118717	10-Aug-2003	marcel	Now that we can ignore up to 8KB of dirty registers, remove the RSE magic from exec_setregs(). In set_mcontext() we now also don't have to worry that we entered the kernel with more than 512 bytes of dirty registers on the kernel stack. Note that we cannot make any assumptions anymore WRT NaT collection points in exec_setregs(), so we have to deal with them now.
118640	08-Aug-2003	marcel	MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().
118607	07-Aug-2003	jhb	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)
118590	07-Aug-2003	marcel	Better define the flags in the mcontext_t and properly set the flags when we create contexts. The meaning of the flags are documented in <machine/ucontext.h>. I only list them here to help browsing the commit logs: _MC_FLAGS_ASYNC_CONTEXT _MC_FLAGS_HIGHFP_VALID _MC_FLAGS_KSE_SET_MBOX _MC_FLAGS_RETURN_VALID _MC_FLAGS_SCRATCH_VALID Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
118588	07-Aug-2003	marcel	o Fix cut-n-paste whitespace corruption in previous commit o For trap-based upcalls the argument (the kse_mailbox) to the UTS must be written onto the kernel stack, not the user stack. While here, deal with the fact that we may be at a NaT collection point.
118566	06-Aug-2003	marcel	In cpu_set_upcall_kse(), create the upcall according to the entry path into the kernel. Normally it's due to a syscall, but one can also be created as the result of a clock interrupt (for example). This now even more looks like exec_setregs(). While here, add an assert that we don't expect more than 8KB of dirty registers on the kernel stack.
118563	06-Aug-2003	marcel	o In revision 1.45 of exception.S we changed exception_restore to unconditionally restore ar.k7 (kernel memory stack) and ar.k6 (kernel register stack). I don't know what I was smoking then, but if you unconditionally restore ar.k6, you also want to compute its value unconditionally. By having the computation predicated and dependent on whether we return to user mode, we would end up writing junk (= invalid value for ar.bspstore) if we would return to kernel mode. But the whole point of the unconditional restoration was that there is a grey area where we still need to have ar.k6 restored. If we restore with a junk value, we would end up wedging the machine on the next interrupt. So, unconditionally calculate the value we unconditionally write to ar.k6. o The previous braino was found while making the following change: We used to clear the lower 9 bits of the value we write to ar.k6. The meaning being that we know that the kernel register stack is at least 512 byte aligned and simply clearing the lower 9 bits allows us to return to a context of which we don't have dirty registers on the kernel stack, even though the context that entered the kernel does have dirty registers on the kernel stack. By masking-off the lower bits, we correctly obtain the base of the register stack without having to worry that we didn't actually reach the base while unwinding it. The change is to mask off the lower 13 bits, knowing that the kernel register stack is always 8KB aligned. The advantage is that we don't have to worry anymore if there's more than 512 bytes of dirty registers on the kernel stack. A situation that frequently occurs. In exec_setregs() in machdep.c:1.147 or older, we had to deal with that situation by copying the active portion of the register stack down in multiples of 512 bytes. Now that we mask off the lower 13 bits we don't have to do that at all.
Contemporary IPF processors have a register file that can hold up to 96 stacked registers (=784 bytes [incl. 2 NaT collections]). With no indication that register files grow beyond a couple of hundred registers, we should not have to worry about it anymore... and yes, 640KB is enough for everybody :-) This change helps setcontext(2) and cpu_set_upcall_kse() in that they can return to completely different contexts without having to mess with the kernel stack. Of course exec_setregs() doesn't need to do that anymore as well.
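The numbers in r118563 can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the example base address is arbitrary, and the worst-case NaT collection count assumes one collection slot after every 63 registers.

```c
#include <stdint.h>

/* Old behaviour: clear the lower 9 bits (512-byte window). */
static uint64_t
rse_base_mask9(uint64_t bsp)
{
	return (bsp & ~UINT64_C(0x1ff));
}

/* New behaviour: clear the lower 13 bits (8KB-aligned kernel stack). */
static uint64_t
rse_base_mask13(uint64_t bsp)
{
	return (bsp & ~UINT64_C(0x1fff));
}

/*
 * Worst-case backing store footprint of nregs stacked registers: eight
 * bytes per register plus eight bytes for each NaT collection the run of
 * registers can span.
 */
static uint64_t
rse_worst_case_bytes(unsigned nregs)
{
	unsigned ncoll = (nregs + 62) / 63;

	return ((uint64_t)(nregs + ncoll) * 8);
}
```

For 96 stacked registers, rse_worst_case_bytes() gives 784, matching the figure above. With 784 bytes of dirty registers above an 8KB-aligned base, the 9-bit mask lands 512 bytes past the base, while the 13-bit mask recovers the base itself.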
118503	05-Aug-2003	marcel	o Put the syscall return registers in the context. Not only do we need this for swapcontext(), KSE upcalls initiated from ast() also need to save them so that we properly return the syscall results after having had a context switch. Note that we don't use r11 in the kernel. However, the runtime specification has defined r8-r11 as return registers, so we put r11 in the context as well. I think deischen@ was trying to tell me that we should save the return registers before. I just wasn't ready for it :-) o The EPC syscall code has 2 return registers and 2 frame markers to save. The first (rp/pfs) belongs to the syscall stub itself. The second (iip/cfm) belongs to the caller of the syscall stub. We want to put the second in the context (note that iip and cfm relate to interrupts. They are only being misused by the syscall code, but are not part of a regular context). This way, when the context is switched to again, we return to the caller of setcontext(2) as one would expect. o Deal with dirty registers on the kernel stack. The getcontext() syscall will flush the RSE, so we don't expect any dirty registers in that case. However, in thread_userret() we also need to save the context in certain cases. When that happens, we are sure that there are dirty registers on the kernel stack. This implementation simply copies the registers, one at a time, from the kernel stack to the user stack. NAT collections are not dealt with. Hence we don't preserve NaT bits. A better solution needs to be found at some later time. We also don't deal with this in all cases in set_mcontext. No temporary solution is implemented because it's not a showstopper. The problem is that we need to ignore the dirty registers and we automatically do that for at most 62 registers. When there are more than 62 dirty registers we have a memory "leak". This commit is fundamental for KSE support.
118450	04-Aug-2003	marcel	Fix logic bug in the previous commit. Any region less than 5 is a user space region. Hence, we need to test if 5 is greater than the region; not greater equal. This bug caused us to call ast() while interrupting kernel mode.
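A minimal C sketch of the region test described in r118450, not the actual kernel code. It assumes the region number is the top three bits of the 64-bit address and that regions 5, 6 and 7 are the kernel VA regions (as stated in r115294). The names va_region() and ip_is_user() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the virtual region number is bits 63:61 of the address. */
#define REGION_SHIFT        61
/* Regions 5, 6 and 7 are kernel VA regions; anything below is user space. */
#define FIRST_KERNEL_REGION 5u

/* Hypothetical helper: extract the region number from an address. */
static unsigned
va_region(uint64_t va)
{
	return ((unsigned)(va >> REGION_SHIFT));
}

/*
 * Hypothetical helper: true if the interrupted IP lies in a user region.
 * The test is "5 > region"; using ">=" would wrongly classify region 5
 * (kernel) as user space, which is the bug the commit describes.
 */
static bool
ip_is_user(uint64_t ip)
{
	unsigned region = va_region(ip);

	assert(region <= 7);
	return (FIRST_KERNEL_REGION > region);
}
```

With this test, ast() would only be considered when ip_is_user() holds for the interrupted instruction pointer.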
118443	04-Aug-2003	jhb	- Since td_critnest is now initialized in MI code, it doesn't have to be set in cpu_critical_fork_exit() anymore. - As far as I can tell, cpu_thread_link() has never been used, not even when it was originally added, so remove it.
118414	04-Aug-2003	marcel	Cleanup the clock code. This includes: o Remove alpha specific timer code (mc146818A) and compiled-out calibration of said timer. o Remove i386 inherited timer code (i8253) and related acquire and release functions. o Move sysbeep() from clock.c to machdep.c and have it return ENODEV. Console beeps should be implemented using ACPI or if no such device is described, using the sound driver. o Move the sysctls related to adjkerntz, disable_rtc_set and wall_cmos_clock from machdep.c to clock.c, where the variables are. o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't bother faking a stathz that's 1/8 of that. Keep it simple: hz defaults to HZ and stathz equals hz. This is also how it's done for sparc64. o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj) to calculate ITC skew and corrections. On average, we adjust the ITC match register once every ~1500 interrupts for a duration of 2 consecutive interrupts. This is to correct the non-deterministic behaviour of the ITC interrupt (there's a delay between the match and the raising of the interrupt). o Add 4 debugging sysctls to monitor clock behaviour. Those are debug.clock_adjust_edges, debug.clock_adjust_excess, debug.clock_adjust_lost and debug.clock_adjust_ticks. The first counts the individual adjustment cycles (when the skew first crosses the threshold), the second counts the number of times the adjustment was excessive (any non-zero value is to be considered a bug), the third counts lost clock interrupts and the last counts the number of interrupts for which we applied an adjustment (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the average duration of an individual adjustment -- should be ~2). While here, remove some nearby (trivial) left-overs from alpha and other cleanups.
118402	04-Aug-2003	marcel	Fix handling of external interrupts: we weren't calling ast() when interrupting user mode. The net effect of this bug is that a clock interrupt does not cause rescheduling and processes are not preempted. It only takes a "while (1);" to render the machine useless. This bug was introduced by the context changes and EPC syscall code. Handling of ASTs was moved to C for clarity and ease of maintenance, but was not added for the external interrupt case. This needs to be revisited. We now have calls to do_ast() in trap(), break_syscall() and ivt_External_Interrupt(). A single call in exception_restore covers these 3 places without duplication. This is where we handled ASTs prior to the overhaul, except that the meat has been moved to do_ast(), a C function. This was the goal to begin with. Pointy hat: marcel
118296	01-Aug-2003	marcel	Write the preserved registers to (and read them from) struct reg and struct fpreg.
118244	31-Jul-2003	bmilekic	Make sure that when the PV ENTRY zone is created in pmap, that it's created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In the i386 case in particular, the pmap code would hook a special page allocation routine that allocated from kernel_map and not kmem_map, and so when/if the pageout daemon drained the zones, it could actually push out slabs from the PV ENTRY zone but call UMA's default page_free, which resulted in pages allocated from kernel_map being freed to kmem_map; bad. kmem_free() ignores the return value of the vm_map_delete and just returns. I'm not sure what the exact repercussions could be, but it doesn't look good. In the PAE case on i386, we also set-up a zone in pmap, so be conservative for now and make that zone also ZONE_NOFREE and ZONE_VM. Do this for the pmap zones for the other archs too, although in some cases it may not be entirely necessary. We'd rather be safe than sorry at this point. Perhaps all UMA_ZONE_VM zones should by default be also UMA_ZONE_NOFREE? May fix some of silby's crashes on the PV ENTRY zone.
118239	31-Jul-2003	peter	Deal with 'options KSTACK_PAGES' being a global option.
118238	31-Jul-2003	peter	Cosmetic: fix some disorder of #include "opt_...." files
118237	31-Jul-2003	peter	Remove leftover relic of pmap_new_thread() etc.
118081	27-Jul-2003	mux	- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed memory in bus_dmamem_alloc(). This is possible now that contigmalloc() supports the M_ZERO flag. - Remove the locking of Giant around calls to contigmalloc() since contigmalloc() now grabs Giant itself.
118024	25-Jul-2003	alc	MFi386 revision 1.416 Add vm object locking to pmap_prefault(). Note: powerpc and sparc64 do not implement this function.
117999	25-Jul-2003	marcel	Remove __aligned(16) from the definition of struct _ia64_fpreg. It's a non-standard construct. Instead, redefine struct _ia64_fpreg as a union and put a long double in it. On ia64 and for LP64, this is defined by the ABI to have 16-byte alignment. For ILP32 a long double has 4-byte alignment, but we don't support ILP32. Note that the in-memory image of a long double does not match the in-memory image of spilled FP registers. This means that one cannot use the fpr_flt field to interpret the bits. For this reason we continue to use an aggregate type.
117993	25-Jul-2003	marcel	Move ia64_pa_access() from machdep.c to mem.c and declare it static. It's only used in mem.c and cannot accidentally be used elsewhere this way.
117984	25-Jul-2003	marcel	Disable the single-step trap on a debug related trap, including of course the single-step trap itself.
117608	15-Jul-2003	marcel	Rename thread_siginfo to cpu_thread_siginfo.
117499	13-Jul-2003	marcel	Enable the high FP registers when we call the FPSWA handler and disable them again afterwards. This fixes a disabled FP fault while in the FPSWA handler. While here, merge the FP fault and FP trap handling code to reduce code duplication. Where code was different, it was not sure it should be. Trigger case: ports/math/atlas
117467	12-Jul-2003	marcel	Add logic to trace across/over a trapframe. We have ABI markers in our unwind information for functions that are entry points into the kernel. When stepping to the next frame, the unwinder will let us know when such a marker was encountered. We use this to stop the current unwind session, query the trapframe and restart a new unwind session based on the new trapframe. The implementation is a bit sloppy, but at this time there are bigger fish to fry.
117437	11-Jul-2003	marcel	Add a body directive before the first instruction in epc_syscall(). This results in a zero length prologue and a body that covers the whole function. This is more correct.
117436	11-Jul-2003	marcel	Remove a gratuitous align directive after the endp directive for IVT entries.
117267	05-Jul-2003	marcel	Don't call malloc() and free() while in the debugger and unwinding to get a stacktrace. This does not work even with M_NOWAIT when we have WITNESS and is generally a bad idea (pointed out by bde@). We allocate an 8K heap for use by the unwinder when ddb is active. A stack trace roughly takes up half of that in any case, so we have some room for complex unwind situations. We don't want to waste too much space though. Due to the nature of unwinding, we don't worry too much about fragmentation or performance of unwinding while in the debugger. For now we have our own heap management, but we may be able to leverage from existing code at some later time. While here: o Make sure we actually free the unwind environment after unwinding. This fixes a memory leak. o Replace Doug's license with mine in unwind.c and unwind.h. Both files don't have much, if any, of Doug's code left since the EPC syscall overhaul and the import of the unwinder. o Remove dead code. o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.
117206	03-Jul-2003	alc	Background: pmap_object_init_pt() premaps the pages of a object in order to avoid the overhead of later page faults. In general, it implements two cases: one for vnode-backed objects and one for device-backed objects. Only the device-backed case is really machine-dependent, belonging in the pmap. This commit moves the vnode-backed case into the (relatively) new function vm_map_pmap_enter(). On amd64 and i386, this commit only amounts to code rearrangement. On alpha and ia64, the new machine independent (MI) implementation of the vnode case is smaller and more efficient than their pmap-based implementations. (The MI implementation takes advantage of the fact that objects in -CURRENT are ordered collections of pages.) On sparc64, pmap_object_init_pt() hadn't (yet) been implemented.
117161	02-Jul-2003	ru	The .s files were repo-copied to .S files. Approved by: marcel Repocopied by: joe
117142	02-Jul-2003	marcel	The use of SYSINIT requires the inclusion of <sys/kernel.h>
117139	01-Jul-2003	mux	Make this even closer to other busdma backends.
117133	01-Jul-2003	mux	Sync bounce pages support with the alpha backend. More precisely: o use a mutex to protect the bounce pages structure. o use a SYSINIT function to initialize the bounce pages structures and thus avoid a race condition in alloc_bounce_pages(). o add support for the BUS_DMA_NOWAIT flag in bus_dmamap_load(). o remove obsolete splhigh()/splx() calls. o remove printf() about incorrect locking in busdma_swi() and sync busdma_swi() with the one of the alpha backend. o use __FBSDID.
117129	01-Jul-2003	mux	Honor the boundary of the busdma tag when allocating bounce pages. This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and was never fixed in other busdma backends using bounce pages.
117126	01-Jul-2003	scottl	Mega busdma API commit. Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg. Lockfunc allows a driver to provide a function for managing its locking semantics while using busdma. At the moment, this is used for the asynchronous busdma_swi and callback mechanism. Two lockfunc implementations are provided: busdma_lock_mutex() performs standard mutex operations on the mutex that is specified from lockfuncarg. dflt_lock() is a panic implementation and is defaulted to when NULL, NULL are passed to bus_dma_tag_create(). The only time that NULL, NULL should ever be used is when the driver ensures that bus_dmamap_load() will not be deferred. Drivers that do not provide their own locking can pass busdma_lock_mutex,&Giant args in order to preserve the former behaviour. sparc64 and powerpc do not provide real busdma_swi functions, so this is largely a noop on those platforms. The busdma_swi on ia64 is not properly locked yet, so warnings will be emitted on this platform when busdma callback deferrals happen. If anyone gets panics or warnings from dflt_lock() being called, please let me know right away. Reviewed by: tmm, gibbs
117045	29-Jun-2003	alc	- Export pmap_enter_quick() to the MI VM. This will permit the implementation of a largely MI pmap_object_init_pt() for vnode-backed objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64 and powerpc. - Correct a mismatch between pmap_object_init_pt()'s prototype and its various implementations. (I plan to keep pmap_object_init_pt() as the MD hook for device-backed objects on i386 and amd64.) - Correct an error in ia64's pmap_enter_quick() and adjust its interface to match the other versions. Discussed with: marcel
117022	29-Jun-2003	alc	- Remove the calls to pmap_install() from pmap_object_init_pt(); they are redundant. Discussed with: marcel - MFi386: Add vm object locking to pmap_object_init_pt().
116971	28-Jun-2003	marcel	Implement cpu_set_upcall_kse(). Elementary testing shows that this function behaves correctly in principle, but is not expected to be 100% complete. In any case, with this commit we have KSE ported enough to start runtime testing with threaded applications and fix whatever bugs or omissions we encounter. Yay!
116958	28-Jun-2003	davidxu	Add a machine-dependent function thread_siginfo. SA signal code will use the function to construct a siginfo structure and use the result to export to userland. Reviewed by: julian
116907	27-Jun-2003	scottl	Do the first and mostly mechanical step of adding mutex support to the bus_dma async callback scheme. Note that sparc64 does not seem to do async callbacks. Note that ia64 callbacks might not be MPSAFE at the moment. Note that powerpc doesn't seem to do async callbacks due to the implementation being incomplete. Reviewed by: mostly silence on arch@
116510	18-Jun-2003	alc	Fix a performance bug in all of the various implementations of uma_small_alloc(): They always zeroed the page regardless of what the caller requested.
116361	15-Jun-2003	davidxu	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
116355	14-Jun-2003	alc	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().
116328	14-Jun-2003	alc	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
116227	12-Jun-2003	marcel	Make sure pcpu->pc_pcb is pointing to a 16-byte aligned address. The PCB contains FP registers, whose alignment must be 16 bytes at least. Since the PCB pointed to by pc_pcb is immediately after the PCPU itself, round-up the size of the PCPU to a multiple of 16 bytes. The PCPU is page aligned. This fixes a misalignment trap caused by stopping a CPU in a SMP kernel, such as is done when entering the debugger. Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>
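The round-up in r116227 can be illustrated with a small C sketch. This is not the kernel's code: round_up_16() and pcb_is_aligned() are hypothetical names, and the 4096-byte page size used in the sanity check is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The PCB holds FP registers, which need at least 16-byte alignment. */
#define PCB_ALIGN 16u

/* Hypothetical helper: round a size up to the next multiple of 16. */
static size_t
round_up_16(size_t size)
{
	return ((size + (PCB_ALIGN - 1)) & ~(size_t)(PCB_ALIGN - 1));
}

/*
 * Hypothetical check: the PCB starts right after the (rounded) PCPU.
 * If the PCPU base is page aligned and its size is a multiple of 16,
 * the PCB address is 16-byte aligned as well.
 */
static int
pcb_is_aligned(size_t pcpu_base, size_t pcpu_size)
{
	assert(pcpu_base % 4096 == 0);	/* assumed page size for the sketch */
	return ((pcpu_base + round_up_16(pcpu_size)) % PCB_ALIGN == 0);
}
```

Without the rounding, any PCPU size that is not a multiple of 16 would place the PCB, and therefore its FP register save area, at a misaligned address.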
116188	11-Jun-2003	peter	GC unused cpu_wait() function
115937	07-Jun-2003	marcel	pmap_find_vhpt() has been observed to return a NULL pointer when the caller assumes this to not happen by means of performing an indirection without checking the return value. Add KASSERTs to force a kernel with INVARIANTS to panic. This is a short-term measure. The pmap code is scheduled to be overhauled.
115936	07-Jun-2003	marcel	If we get a fault in the gateway page, which would happen if we try to deliver a signal and the RSE backing store has been exhausted or the backing store pointer has been clobbered, we need to make sure we call userret() and do_ast() when we exit from trap(). Not adjusting the local variable 'user' in this case will prevent the faulty process from being terminated and we end up in an infinite fault repetition. Faulty process provided by: bento
115916	06-Jun-2003	marcel	Use TRAPF_USERMODE() to replace an equivalent check in trap(). While here, amend the related comment.
115859	04-Jun-2003	marcel	Fix the dreaded double counting that was present on alpha as well and got fixed two weeks after the ia64 version was copied from the alpha version (see rev 1.32 of sys/alpha/alpha/mem.c). As such, we were missing the same continue as on alpha. While here, add a default case for the device minor switch and do some general style(9) cleanups. WARNING: this file still has bugs. When reading from region 6 or region 7, we don't validate the physical address. One can trivially cause a machine check by trying to read from address 0xFFFFFFFFFFFFFFF0 or something that uses the unimplemented physical address bits. Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>
115858	04-Jun-2003	marcel	Change the second (and last) argument of cpu_set_upcall(). Previously we were passing in a void* representing the PCB of the parent thread. Now we pass a pointer to the parent thread itself. The prime reason for this change is to allow cpu_set_upcall() to copy (parts of) the trapframe instead of having it done in MI code in each caller of cpu_set_upcall(). Copying the trapframe cannot always be done with a simple bcopy() or may not always be optimal that way. On ia64 specifically the trapframe contains information that is specific to an entry into the kernel and can only be used by the corresponding exit from the kernel. A trapframe copied verbatim from another frame is in most cases useless without some additional normalization. Note that this change removes the assignment to td->td_frame in some implementations of cpu_set_upcall(). The assignment is redundant. A previous call to cpu_thread_setup() already did the exact same assignment. An added benefit of removing the redundant assignment is that we can now change td_pcb without nasty side-effects. This change officially marks the ability on ia64 for 1:1 threading. Not tested on: amd64, powerpc Compile & boot tested on: alpha, sparc64 Functionally tested on: i386, ia64
115652	01-Jun-2003	marcel	Improve set_mcontext: o Don't copy psr verbatim from the user supplied context. Only allow userland to change the processor settings that are part of the user mask.
115651	01-Jun-2003	marcel	Improve on cpu_set_upcall: o Use pcb and tf for the new pcb and the new trapframe and use pcb0 for the old (current) pcb. The mix of pcb, pcb2 and tf was slightly confusing. o Don't define td->td_frame here. It has already been set previously by cpu_thread_setup. Add a KASSERT to make sure pcb and tf are both non-NULL. o Make sure the number of dirty registers is 0 for the new thread. There are no user registers on the backing store because we haven't entered userland yet.
115605	01-Jun-2003	marcel	Implement cpu_thread_setup(). This is mostly the same as on i386, except for the fact that trapframes have a size recorded in it that we set here too. We need this for proper thread setup. Pointed out by: mtm
115573	31-May-2003	marcel	Now that we have the signal trampolines in the gateway page and the gateway page is considered kernel space, we can panic when we should only SIGSEGV. Hence, add the additional constraint that for page faults we also require running with kernel privileges. The gateway page is the only kernel code running with user privileges, so this is a correct way to exclude the gateway page from kernel land. We do not currently exclude the gateway page for other faults as it is not always the right way to do it. Further tuning will happen on a case by case basis.
115570	31-May-2003	marcel	Implement cpu_set_upcall(). Required by libthr and used by thr_create(2). This implementation is so far only compile tested. But since this is also the last of the functions required to support libthr, we're now functionally complete (for some weird definition of functionally; and complete). Runtime testing can commence.
115566	31-May-2003	marcel	Implement set_mcontext() and get_mcontext(). Just as for sendsig() and sigreturn(), we cheat and assume the preserved registers are still on-chip and unmodified. This is actually the case, but more by accident than by design. We need to use unwinding eventually or explicitly compile the kernel in a way that the compiler steers clear from using the preserved registers completely.
115563	31-May-2003	marcel	Some ia32 related finetuning for the EPC syscall path: o The SDM states that flushing the RSE in the cycle prior to the call to ia32 code yields the best performance. We don't really care too much about performance here, but we do the same anyway. I'm being paranoid and conservative here. o Only initialize the ia32 state registers, not the registers used as scratch by the ia32 engine. This saves a couple of loads from the trapframe, but also helps debugging: we don't clobber useful debugging data (engineering hints :-) o Make sure all general registers constituting ia32 state have been initialized. If there's nothing useful to be loaded from the trapframe, clear the register. This avoids accidentally leaking NaT bits. o Make sure we set ar.k6 prior to clobbering ar.bspstore and also set ar.k7 prior to setting sp. This fixes a race seen for ia64 native code as well (and previously fixed too).
115558	31-May-2003	marcel	Make sure we have all the dirty registers in user frames on the backing store before we discard them. It is possible that we enter the kernel (due to an execve in this case) with a lot of dirty user registers and that the RSE has only partially spilled them (to make room for new frames). We cannot move the backing store pointer down (to discard user registers) when not all of the user registers are on the backing store. So, we flush the register stack IFF this happens. Unconditionally doing the flush is too costly, because the condition in which we need to flush is very rare. This change appears to fix the SIGSEGV that sometimes happen for newly executed processes and so far also appears to fix the last of the corruption. It is possible, although not likely, that this change prevents some other bug from happening, even though it is itself not a fix. Hence the uncertainty. We'll know in a couple of months I guess :-)
115378	29-May-2003	marcel	Move the sysctls of the misalignment handler to where they belong and use OID_AUTO instead of fixed IDs. Approved by: re@ (blanket)
115376	29-May-2003	marcel	Fix what I think is a cut-n-paste bug: use OID_AUTO for the print_usertrap sysctl instead of CPU_UNALIGNED_PRINT. The latter is used already. Approved by: re@ (blanket)
115344	27-May-2003	marcel	A flushrs must be the first in an instruction group. Approved by: re@ (blanket)
115343	27-May-2003	scottl	Bring back bus_dmasync_op_t. It is now a typedef to an int, though the BUS_DMASYNC_ definitions remain as before. This does not change the ABI, and reverts the API to be a bit more compatible and flexible. This has survived a full 'make universe'. Approved by: re (bmah)
115342	27-May-2003	marcel	Have the unwinder allocate memory with M_NOWAIT. The unwinder is used by DDB and we cannot know in advance whether it's safe to sleep. It often enough isn't. We may want to pre-allocate space to cover the most common cases without having to use malloc at all, but that requires some analysis. We leave that for later. Approved by: re@ (blanket)
115341	27-May-2003	marcel	Fix fu{byte\|word} and su{byte\|word}: o If the address was not within user space we jumped to fusufault where we would clear pcb_onfault and return 0. There are two bugs here: 1. We never got to the point where we assigned the address of pcb_onfault to r15, which means that we would clobber some random memory location, including I/O space or ROM. 2. We're supposed to return -1 on error. o Make sure we have proper memory ordering for setting pcb_onfault, doing the memory access to user space and clearing pcb_onfault. For the fu* family of functions this means that we need a mf instruction, because we don't have acquire semantics on stores and release semantics on loads (hence st;ld cannot be ordered without intermediate mf). While here, implement casuptr() so that we are a (small) step closer to supporting libthr and deobfuscate the non-implementation of {f\|s}uswintr. Approved by: re@ (blanket)
115339	26-May-2003	marcel	Revision 1.99 of this file changed the allocation request from VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of this in commit log as it was considered harmless. Guess what: it does harm. WITNESS showed that we can not safely grab the page queue lock in vm_page_alloc() in all cases as we may have to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to circumvent this. We panic if vm_page_alloc returns 0. I'm not entirely happy about this, but we have bigger fish to fry. Approved by: re@ (blanket)
115298	25-May-2003	marcel	Now that we define user mode as any IP address that isn't in the kernel's VA regions, we cannot limit the use of break-based syscalls to user mode only. The signal trampolines are in the gateway page, which is mapped into the process address space in region 5 and thus is kernel space. We don't special case the gateway page here. Allow break-based syscalls from anywhere in the kernel VA space. Approved by: re@ (blanket)
115296	24-May-2003	marcel	Fix a source of instability specific to an EPC userland. We return to userland with interrupts disabled until we restore PSR. However, it has been observed that interrupts do actually happen before they are enabled again. This is a bit surprising and I don't know yet what's going on exactly. Nevertheless, the code was not crafted carefully enough to allow interrupts to happen and we could clobber the kernel stack of another thread when interrupts did happen. This is what happens: we restore the (memory) stack pointer (sp) and the register stack base prior to restoring ar.k6 and ar.k7. This is not a problem if interrupts don't happen between setting sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen. Since sp/ar.bspstore already point to the userland stacks, we need to switch to the kernel stack in interrupt. However, ar.k6 and ar.k7 have not been set, which means that we were switching to some unrelated kstack and happily clobbered the trapframe present there if the thread to which the kstack belonged was in kernel mode or otherwise we could have our trapframe clobbered if that other thread enters the kernel. Nasty either way. We now carefully restore ar.k6 prior to restoring ar.bspstore and likewise for ar.k7 and sp. All we need is the guarantee that an interrupt does not clobber ar.k6 or ar.k7 before we're back in userland. That has been achieved by restoring ar.k6/ar.k7 unconditionally (see exception.s) While here, remove the disabling of interrupts on EPC entry. It was added as a way to "resolve" the crashes until it was understood what was going on. I think I achieved the latter, so we can remove the patch. Note that setting up a trapframe with interrupts enabled has its own share of corner cases, but it's better to properly fix those than to keep a mostly wrong patch around because we're afraid to remove it... Approved by: re@ (blanket)
115294	24-May-2003	marcel	Consistently use the same metric to differentiate between kernel mode and user mode. We need to take into account that the EPC syscall path introduces a grey area in which one can argue either way, including a third: neither. We now use the region in which the IP address lies. Regions 5, 6 and 7 are kernel VA regions and if the IP lies in any of those regions we assume we're in kernel mode. Hence, we can be in kernel mode even if we're not on the kernel stack and/or have user privileges. There're gremlins living in the twilight zone :-) For the EPC syscall path this particularly means that the process leaves user mode the moment it calls into the gateway page. This makes the most sense because from a process' point of view the call represents a request to the kernel for some service and that service has been performed if the call returns. With the metric we picked, this also means that we're back in user mode IFF the call returns. Approved by: re@ (blanket)
115291	24-May-2003	marcel	Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack) when returning from an interrupt. Both registers are used on interrupt to switch to the right kernel stack, but other than that they are not used. This means we only have to make sure they contain proper values while in user mode. As such, we conditionally restored these registers based on whether we returned to userland or not. A nice property of conditionally restoring ar.k6 and ar.k7 is that it introduces two invariants: ar.k6 always points to the bottom of the kernel stack and ar.k7 always points to the top of the kernel stack (immediately below the PCB we have there). However, the EPC syscall path introduces an irregularity: there's no "thin red line" between user and kernel. There's a grey area that's a couple of instructions wide. Any interruption in that grey area is bound to see an inconsistent state. One such state is that we're in kernel space for all practical purposes, but we still need to have ar.k6 and ar.k7 restored as if we're in userland. Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing a valuable invariant. Both registers now hold the extent of the usable portion of the kernel stack at any interrupt nesting, which when in userland means the bottom and the top of the kstack.
115276	24-May-2003	marcel	Fix an alpha inheritance bug: On alpha, PAL is involved in context management and after wiring the CPU (in alpha_init()) a context switch was performed to tell PAL about the context. This was bogusly brought over to ia64 where it introduced bugs, because we restored the context from a mostly uninitialized PCB. The cleanup constitutes: o Remove the unused arguments from ia64_init(). o Don't return from ia64_init(), but instead call mi_startup() directly. This reduces the amount of muckery in assembly and also allows for the next bullet: o Save our current context prior to calling mi_startup(). The reason for this is that many threads are created from thread0 by cloning the PCB. By saving our context in the PCB, we have something sane to clone. It also ensures that a cloned thread that does not alter the context in any way will return to the saved context, where we're ready for the eventuality with a nice, user unfriendly panic(). The cleanup fixes at least the following bugs: o Entering mi_startup() with the RSE in enforced lazy mode. o Re-execution of ia64_init() in certain "lab" conditions. While here, add proper unwind directives to __start() so that the unwind knows it has reached the bottom of the (call) stack. Approved by: re@ (blanket)
115274	23-May-2003	marcel	Fix a (new) source of instability: When interrupting a kernel context, we don't need to switch stacks (memory nor register). As such, we were also not restoring the register stack pointer (ar.bspstore). This, however, fails to be valid in 1 situation: when we interrupt a register stack switch as is being done in restorectx(). The problem is that restorectx() needs to have ar.bsp == ar.bspstore before it can assign the new value to ar.bspstore. This is achieved by doing a loadrs prior to assigning to ar.bspstore. If we take an interrupt in between the loadrs and the assignment and we don't make sure we restore the ar.bspstore prior to returning from the interrupt, we switch stacks with possibly non-zero dirty registers, which means that the new frame pointer (ar.bsp) will be invalid. So, instead of jumping over the restoration of the register frame pointer and related registers, we conditionalize it based on whether we return to kernel context or user context. A future performance tweak is possible by only restoring ar.bspstore when returning to kernel mode and when the RSE is in enforced lazy mode. One cannot assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode anyway. While here (well, not quite) don't unconditionally assign to ar.bspstore in exception_save. Only do that when we actually switch stacks. It can only harm us to do it unconditionally. Approved by: re@ (blanket)
115270	23-May-2003	marcel	In swapctx(), put the RSE in enforced lazy mode before we flush the register stack. There's nothing really wrong with flushing before putting the RSE in enforced lazy mode, provided you don't depend on ar.bspstore being equal to ar.bsp when the RSE has been put in enforced lazy mode. The small window between the flush and setting the RSE may be sufficient to have the RSE eagerly increase the dirty region (and hence cause ar.bspstore != ar.bsp) or have an interrupt that may even get the laziest RSE to do something. Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so nothing was and is broken. But the code was non-intuitive and easily confuses. This is a source of future bugs. Note: the advantage of not depending on ar.bspstore is that there's some resilience against an interrupted flushrs. Clobbering is limited to stacked register contents only, not to RSE address clobbering. Approved: re@ (blanket)
115179	20-May-2003	marcel	o Fix a definite bogon: the dirty bity fault, instruction access failt and data access fault install the PTE in question into the VHPT table. However, a post-increment was missing and we wrote the raw PTE data into the pagesize/access key field. This leaves a corrupt VHPT entry. o While here, remove the explicit cache purge. Insertion into the translation implicitly purges any overlapping entries. o Make sure there's a cycle break between the itc and the rfi. o Whitespace fixes.
115178	20-May-2003	marcel	Rename the "IA64 ITC" counter to "ITC" counter. We don't call the "TSC" counter on i386 "I386 TSC". Approved by: re@ (blanket)
115176	20-May-2003	marcel	Prevent corruption of the VHPT collision chain by protecting it with a mutex. The only volatile chain operations are insertion and deletion but since updating an existing PTE also updates the VHPT entry itself, and we have the VHPT mutex in both other cases, we also lock when we update an existing PTE even though no chain operation is involved. Note that we perform the insertion and deletion carefully enough that we don't need to lock traversals. If we need to lock traversals, we also need to lock from the exception handler, which we can't without creating a trapframe. We're now able to withstand a -j8 buildworld. More work is needed to withstand Murphy fields. In other words: we still have a bogon... Approved by: re@ (blanket)
115152	19-May-2003	marcel	Turn pmap_install_pte() into a critical section. We better not get interrupted while writing into the VHPT table. While here, make sure memory accesses are properly ordered. Tag invalidation must happen first so that the hardware VHPT walker will not be able to match this entry while we're updating it and we have to make sure the new tag gets written only after the PTE is completely updated. Approved by: re (blanket)
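The write ordering described in r115152 can be sketched in portable C11 as follows. This is only an illustrative model, not the kernel's code: the structure layout, the `TAG_INVALID` bit position, and the function name `entry_update()` are assumptions, the hardware walker is modelled as if it were another observer of memory, and the critical section around the update is omitted.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed marker meaning "this entry must not match any lookup". */
#define TAG_INVALID (UINT64_C(1) << 63)

/* Hypothetical entry layout: two data words guarded by a tag. */
struct vhpt_entry_sketch {
	uint64_t pte;
	uint64_t itir;
	_Atomic uint64_t tag;
};

/* Update an entry so that a concurrent reader never sees a valid tag
 * paired with a half-written PTE. */
static void
entry_update(struct vhpt_entry_sketch *e, uint64_t pte, uint64_t itir,
    uint64_t tag)
{
	/* 1. Invalidate the tag first so the entry cannot be matched. */
	atomic_store_explicit(&e->tag, TAG_INVALID, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	/* 2. Rewrite the payload while the entry is unmatchable. */
	e->pte = pte;
	e->itir = itir;

	/* 3. Publish the new tag only after the payload is complete. */
	atomic_store_explicit(&e->tag, tag, memory_order_release);
}
```

The release store in step 3 is what keeps the payload writes from being reordered after the tag becomes valid again.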
115149	19-May-2003	marcel	Unconditionally set pcb_current_pmap. WIP versions of the code previously committed cleared pcb_current_pmap prior to changing the region registers, but that was removed before committing. Since we don't normally (at all?) pass a NULL pointer, the bug was mostly harmless. Fix it while I'm here... I'm here because we need to have data serialization after writing to the region registers. Not doing so was likely the cause of the hangs we were experiencing. General exceptions in cpu_switch may also be caused by the lack of serialization. Approved by: re (blanket)
115148	19-May-2003	marcel	pmap_install() needs to be atomic WRT to context switching. Protect switching user regions (region 0-4) with schedlock. Avoid unnecessary recursion on schedlock by moving the core functionality to another function (pmap_switch()) where we assert schedlock is held. Turn pmap_install() into a wrapper that grabs schedlock. This minimizes the number of callsites that need to be changed. Since we already have schedlock in cpu_switch() and cpu_throw(), have them call pmap_switch() directly. These were also the only two calls to pmap_install() outside pmap.c, so make pmap_install() static and remove its prototype from pmap.h Approved by: re (blanket)
115094	17-May-2003	marcel	Remove unused files. cpu_switch() and cpu_throw(), normally in swtch.s, can be found in machdep.c. Approved: re@
115084	16-May-2003	marcel	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc instruction (see sys/ia64/ia64/syscall.s). o Revisit the places where we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secondary objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. Context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. 
Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticeable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be noticed as an improvement if it's noticed at all. Approved by: re@ (jhb)
115063	16-May-2003	marcel	o In pmap_install, don't prevent switching the pmap if we're switching to kernel_pmap. The pmap is not special enough. o Clear the active bit on the pmap we're switching out. o Fix some nearby style(9) bugs. Approved by: re@
115059	16-May-2003	marcel	Indent a comment. This makes 1.100. Still approved by: re@ (blanket)
115058	16-May-2003	marcel	Turn pmap_growkernel() into a critical section. While here, initialize kernel_vm_end in pmap_bootstrap. Don't delay the initialization until we need to grow the kernel VM space. This BTW happens twice before we enter either single- or multi-user mode. Don't adjust kernel_vm_end while growing based on whether the KPT contains a non-NULL entry. We trust kernel_vm_end to be correct and we make sure it's still correct after growing. Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.
115057	16-May-2003	marcel	Revamp the RID allocation code: o Limit the size of the region ID map to 64KB. This gives a bitmap that is large enough to keep track of 2^19 numbers. The minimal map size is 32KB. The reason we limit the map size is that processor models may have implemented a 24-bit region ID, which would give a 2MB bitmap while the maximum number of allocations is always less than PID_MAX*5, which is less than 2^19. o Allocate all region IDs up-front. The slight downside of reserving more RIDs than a process needs (3 for ia64 native and 1 for ia32) is preferable over the call to pmap_ensure_rid() where RIDs are allocated on demand. On SMP systems this may lead to a race condition. o When allocating a region ID, don't use arc4random(). We're not interested in randomness or uniform distribution across the spectrum. We only need uniqueness. Random numbers may easily collide when the number of allocated RIDs is high, creating a possibly unbounded retry rate.
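The bitmap arithmetic and the "uniqueness, not randomness" point in r115057 can be illustrated with a small sequential allocator. This is a sketch under stated assumptions, not the pmap code: the names `rid_alloc()`, `rid_free()`, `rid_map` and `rid_next` are hypothetical, and the locking an SMP kernel would need is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* A 64KB map tracks 64 * 1024 * 8 = 2^19 region IDs. */
#define RID_MAP_BYTES (64u * 1024u)
#define RID_COUNT     (RID_MAP_BYTES * CHAR_BIT)
#define WORD_BITS     64u

static uint64_t rid_map[RID_COUNT / WORD_BITS];
static uint32_t rid_next;	/* hint: where the previous scan stopped */

/* Return a free region ID and mark it used, or -1 if none is left.
 * The scan is bounded by RID_COUNT, unlike retrying random guesses. */
static int32_t
rid_alloc(void)
{
	for (uint32_t i = 0; i < RID_COUNT; i++) {
		uint32_t rid = (rid_next + i) % RID_COUNT;
		uint64_t bit = UINT64_C(1) << (rid % WORD_BITS);

		if ((rid_map[rid / WORD_BITS] & bit) == 0) {
			rid_map[rid / WORD_BITS] |= bit;
			rid_next = (rid + 1) % RID_COUNT;
			return (int32_t)rid;
		}
	}
	return -1;
}

/* Release a previously allocated region ID. */
static void
rid_free(uint32_t rid)
{
	uint64_t bit = UINT64_C(1) << (rid % WORD_BITS);

	assert(rid < RID_COUNT);
	assert((rid_map[rid / WORD_BITS] & bit) != 0);	/* must be in use */
	rid_map[rid / WORD_BITS] &= ~bit;
}
```

By the same arithmetic, a 24-bit region ID would need 2^24 / 8 bytes, which is the 2MB figure quoted in the log message.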
115056	16-May-2003	marcel	Move the conditional definition of KSTACK_MAX_PAGES up ahead where it's more visible. Approved by: re@ (blanket)
115018	15-May-2003	marcel	This file contains elementary context related functions used to save and restore "sets" of registers in various places. The restorectx and swapctx functions are used by cpu_switch() and deal with the special registers, as well as the preserved registers. The callee_saved functions are used to save and restore the preserved registers (integer and floating-point). They are useful for signal delivery and ptrace support. The save_high_fp and restore_high_fp functions are used to "load" and "unload" to and from the CPU as part of lazy context switching. The ia32 specific context functions have been kept with the ia32 code. Approved by: re@ (blanket)
115017	15-May-2003	marcel	This file contains the code that implements the syscall path based on the epc instruction. The epc instruction, given the permissions of the page in which the epc is located, allows the privilege level to be increased with little or no overhead. The previous privilege level is recorded in the current frame marker and is restored by a regular (function) return. Since the epc instruction has to live in a page with non-standard properties, we hardwire a "gateway" page in the address space. The address of the gateway page is exported to userland in ar.k7. This allows us to rewire the page without breaking the ABI. The syscall stubs in libc are regular function calls that slightly differ from the normal runtime. The difference is mostly to simplify the stubs themselves by moving some of the logic to the kernel. The libc stubs call into the gateway page (offset 0), from where the kernel trampolines to the code that sets up a minimal trapframe and arranges to execute from the kernel stack. The way back is basically the same. The kernel returns to the gateway page, whereby privilege is dropped, and jumps back to the syscall stub. Only the special registers are saved in the trapframe. None of the scratch registers are preserved and since the kernel follows the same runtime model, none of the preserved registers are saved. Future enhancements can include the implementation of lightweight syscalls, where kernel functions are performed without setting up a trapframe. Good candidates are the *context syscalls for example. Now that there's a gateway page from which code can be executed in a non-privileged context, we also have the ideal place to put the signal trampolines. By moving the signal trampolines from the user stack to the gateway page, we open up the doors to unexecutable stacks. The gateway page contains signal trampolines for both the "legacy" break-based syscall code and the new and improved epc-based syscall code. Approved: re@ (blanket)
114983	13-May-2003	jhb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
114616	03-May-2003	marcel	Fix c99 victim: the accepted character '0 must now be typed as '0'.
114553	02-May-2003	marcel	Option KADB does not exist. It came from alpha, where it still exists.
114305	30-Apr-2003	jhb	Range check the syscall number before looking it up in the syscallnames[] array. Submitted by: pho
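The range check described in r114305 amounts to validating the index before touching the name table. A minimal sketch, assuming a hypothetical five-entry table and a helper called `syscall_name()` (neither is taken from the kernel source):

```c
#include <stddef.h>

/* Hypothetical, shortened name table for illustration only. */
static const char *const syscallnames[] = {
	"syscall", "exit", "fork", "read", "write"
};

/* Return the name for a syscall number, or a placeholder when the
 * number falls outside the table, instead of reading past its end. */
static const char *
syscall_name(unsigned int code)
{
	size_t n = sizeof(syscallnames) / sizeof(syscallnames[0]);

	return (code < n) ? syscallnames[code] : "(unknown)";
}
```

Taking the code as an unsigned value means a single comparison covers both negative and too-large inputs.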
114208	29-Apr-2003	marcel	Revamp the newbus functions: o do not use the in* and out* functions. These functions are used by legacy drivers and thus must have ia32 compatible behaviour. Hence, they need to have fences. Using these functions for newbus would then pessimize performance. o remove the conditional compilation of PIO and/or MEMIO support. It's a PITA without having any significant benefit. We always support them both. Since there are no I/O ports on ia64 (they are simulated by the chipset by translating memory mapped I/O to predefined uncacheable memory regions) the only difference between PIO and MEMIO is in the address calculation. There should be enough ILP that can be exploited here that making these computations compile-time conditional is not worth it. We now also don't use the read* and write* functions. o Add the missing *_8 variants. They were missing, although not missed. It's for completeness. o Do not add the fences that were present in the low-level support functions here. We're using uncacheable memory, which means that accesses are in program order. Change the barrier implementation to not only do a memory fence, but also an acceptance fence. This should more reliably synchronize drivers with the hardware. The memory fence enforces ordering, but does not imply visibility (ie the access does not necessarily have happened). This is what the acceptance deals with. cpufunc.h cleanup: o Remove the low-level memory mapped I/O support functions. They are not used. Keep the low-level I/O port access functions for legacy drivers and add fences to ensure ia32 compatibility. o Remove the syscons specific functions now that we have moved the proper definitions where they belong. o Replace the ia64_port_address() and ia64_memory_address() functions with macros. There's a bigger chance inline functions get inlined when there aren't function calls and the calculations are simple enough to do it with macros. 
Replace the one reference to ia64_memory address in mp_machdep.c to use the macro.
114029	25-Apr-2003	jhb	- Push down Giant into the sysarch() calls that still need Giant. - Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is invalid.
113998	25-Apr-2003	deischen	Add an argument to get_mcontext() which specified whether the syscall return values should be cleared. The system calls getcontext() and swapcontext() want to return 0 on success but these contexts can be switched to at a later time so the return values need to be cleared in the saved register sets. Other callers of get_mcontext() would normally want the context without clearing the return values. Remove the i386-specific context saving from the KSE code. get_mcontext() is not i386-specific any more. Fix a bad pointer in the alpha get_mcontext() code. The context was being bcopy()'d from &td->tf_frame, but tf_frame is itself a pointer, so the thread was being copied instead. Spotted by jake. Glanced at by: jake Reviewed by: bde (months ago)
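The alpha bug fixed in r113998 is a classic pointer slip: taking the address of a field that is already a pointer. A small sketch with hypothetical structure and function names (`frame_sketch`, `thread_sketch`, `save_frame()`), using `memcpy()` in place of the kernel's `bcopy()`:

```c
#include <string.h>

/* Hypothetical stand-ins for the trapframe and thread structures. */
struct frame_sketch {
	long regs[4];
};

struct thread_sketch {
	long td_pad;
	struct frame_sketch *td_frame;	/* already a pointer */
};

/* Copy the saved frame out of a thread. */
static void
save_frame(const struct thread_sketch *td, struct frame_sketch *out)
{
	/*
	 * Correct: pass the pointer itself as the source.
	 * The buggy form, memcpy(out, &td->td_frame, sizeof(*out)),
	 * would copy the bytes of the pointer field and whatever
	 * follows it in the thread structure, not the frame.
	 */
	memcpy(out, td->td_frame, sizeof(*out));
}
```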
113833	22-Apr-2003	davidxu	Remove the single-threading detection code. This code really should be replaced by thread_user_enter(), but currently we don't want to enable this in trap.
113831	22-Apr-2003	marcel	Don't use the tpa instruction to implement pmap_kextract. The tpa instruction requires that a translation is present in the TC. This may trigger a TLB miss and a subsequent call to vm_fault(). This implementation is deliberately non-inline for debugging and profiling purposes. Partial or full inlining should eventually be done. Valuable insights by: jake
113686	18-Apr-2003	jhb	Use the proc lock to protect p_singlethread and a P_WEXIT test. This fixes a couple of potential KSE panics on non-i386 arch's that weren't holding the proc lock when calling thread_exit().
113347	10-Apr-2003	mux	Change the operation parameter of bus_dmamap_sync() from an enum to an int and redefine the BUS_DMASYNC_* constants as flags. This allows us to specify several operations in one call to bus_dmamap_sync() as in NetBSD.
113255	08-Apr-2003	des	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand-rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>
113181	06-Apr-2003	marcel	Remove the 32KB VHPT section from the kernel image. We don't really use it because we allocate a VHPT based on the size of the physical memory and even if the allocated VHPT is 32KB, we don't use the in- image section for it. Since the VHPT must be naturally aligned, we save 48K on average (due to alignment). Consequently, we start off with the VHPT disabled (it is assumed the VHPT is disabled because the EFI loader runs without memory address translation and thus has no need to setup the VHPT). It's probably a good idea to explicitly disable the VHPT if we make the use of the VHPT optional.
113160	06-Apr-2003	marcel	Also set the access bit in the PTE when we get a data dirty bit fault. This avoids an immediate access bit fault when we serviced the dirty bit fault in case the access bit is unset. This typically happens for newly allocated memory that's being zeroed and thus very common.
113140	05-Apr-2003	marcel	Include <geom/geom_disk.h> and stop including <sys/disk.h>. The former gives us 'struct disk'.
113090	04-Apr-2003	des	Define ovbcopy() as a macro which expands to the equivalent bcopy() call, to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's bcopy() doesn't handle overlap like ours. Remove all implementations of ovbcopy(). Previously, bzero was a function pointer on i386, to save a jmp to bzero_vector. Get rid of this microoptimization as it only confuses things, adds machine-dependent code to an MD header, and doesn't really save all that much. This commit does not add my pagezero() / pagecopy() code.
112946	01-Apr-2003	phk	Use bioq_flush() to drain a bio queue with a specific error code. Retain the mistake of not updating the devstat API for now. Spell bioq_disksort() consistently with the remaining bioq_*(). #include <geom/geom_disk.h> where this is more appropriate.
112898	01-Apr-2003	jeff	- Define a new md function 'casuptr'. This atomically compares and sets a pointer that is in user space. It will be used as the basic primitive for a kernel supported user space lock implementation. - Implement this function in x86's support.s - Provide stubs that return -1 in all other architectures. Implementations will follow along shortly. Reviewed by: jake
112888	31-Mar-2003	jeff	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
112883	31-Mar-2003	jeff	- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.
112882	31-Mar-2003	jeff	- Use sigexit() instead of twiddling the signal mask, catch, ignore, and action bits to allow SIGILL to work as expected. This brings this file in line with other architectures.
112569	25-Mar-2003	jake	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses are larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)
112436	20-Mar-2003	mux	Use atomic operations to increment and decrement the refcount in busdma tags. There are currently no tags shared across different drivers so this isn't needed at the moment, but it will be required when we'll have a proper newbus method to get the parent busdma tag.
112235	14-Mar-2003	mux	Bah, get it right this time and add sys/lock.h before sys/mutex.h.
112215	14-Mar-2003	mux	Oops, add missing includes. Pass me the pointy hat. Reported by: jake
112196	13-Mar-2003	mux	Grab Giant around calls to contigmalloc() and contigfree() so that drivers converted to be MP safe don't have to deal with it.
112195	13-Mar-2003	mux	Memory allocated with contigmalloc() should be freed with contigfree(), not with free().
112051	10-Mar-2003	marcel	Fix two rounds of breakages and cleanup. Remove the sccdebug sysctl while I'm here and garbage collect dead code (ssc_clone). Define d_maxsize as DFLTPHYS for now because that's what it will be if we don't define it.
111979	08-Mar-2003	phk	Centralize the devstat handling for all GEOM disk device drivers in geom_disk.c. As a side effect this makes a lot of #include <sys/devicestat.h> lines not needed and some biofinish() calls can be reduced to biodone() again.
111883	04-Mar-2003	jhb	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
111815	03-Mar-2003	phk	Gigacommit to improve device-driver source compatibility between branches: Initialize struct cdevsw using C99 sparse initialization and remove all initializations to default values. This patch is automatically generated and has been tested by compiling LINT with all the fields in struct cdevsw in reverse order on alpha, sparc64 and i386. Approved by: re(scottl)
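The "C99 sparse initialization" in r111815 refers to designated initializers: members are named explicitly, may appear in any order, and anything left out is zero-initialized. A minimal sketch with a made-up structure (`devsw_sketch` and its members are illustrative, not the real struct cdevsw):

```c
#include <stddef.h>

/* Hypothetical switch table; the real struct cdevsw has more members. */
struct devsw_sketch {
	const char *d_name;
	int (*d_open)(void);
	int (*d_close)(void);
	int d_flags;
};

static int
sketch_open(void)
{
	return 0;
}

/* Only the members that differ from the default are spelled out.
 * Their order here is independent of the declaration order, which is
 * why reordering the structure's fields does not break the driver. */
static const struct devsw_sketch sketch_devsw = {
	.d_open = sketch_open,
	.d_name = "sketch",
};
```

Here `d_close` ends up as a null pointer and `d_flags` as 0 without being mentioned.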
111587	27-Feb-2003	davidxu	Needn't kse.h
111585	27-Feb-2003	julian	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
111524	26-Feb-2003	mux	Correctly set BUS_SPACE_MAXSIZE in all the busdma backends. It was bogusly set to 64 * 1024 or 128 * 1024 because it was bogusly reused in the BUS_DMAMAP_NSEGS definition.
111462	25-Feb-2003	mux	Cleanup of the d_mmap_t interface. - Get rid of the useless atop() / pmap_phys_address() detour. The device mmap handlers must now give back the physical address without atop()'ing it. - Don't borrow the physical address of the mapping in the returned int. Now we properly pass a vm_offset_t * and expect it to be filled by the mmap handler when the mapping was successful. The mmap handler must now return 0 when successful, any other value is considered as an error. Previously, returning -1 was the only way to fail. This change thus accidentally fixes some devices which were bogusly returning errno constants which would have been considered as addresses by the device pager. - Garbage collect the poorly named pmap_phys_address() now that it's no longer used. - Convert all the d_mmap_t consumers to the new API. I'm still not sure whether we need a __FreeBSD_version bump for this, since we didn't guarantee API/ABI stability until 5.1-RELEASE. Discussed with: alc, phk, jake Reviewed by: peter Compile-tested on: LINT (i386), GENERIC (alpha and sparc64) Runtime-tested on: i386
111194	20-Feb-2003	phk	Change the console interface to pass a "struct consdev " instead of a dev_t to the method functions. The dev_t can still be found at struct consdev ->cn_dev. Add a void *cn_arg element to struct consdev which the drivers can use for retrieving their softc.
111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
111036	17-Feb-2003	julian	Fix missed patch in last commit
111032	17-Feb-2003	julian	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listened to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
111030	17-Feb-2003	marcel	Print two new processor features: o Spontaneous deferral (A feature required by dutch railways :-) o 16-byte atomic operations (ld, st, cmpxchg)
111028	17-Feb-2003	jeff	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much of the KSE code. Submitted by: davidxu
111024	17-Feb-2003	jeff	- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into the proc. These counters are only examined through calcru. Submitted by: davidxu Tested on: x86, alpha, UP/SMP
110959	15-Feb-2003	marcel	Fix misuse of Maxmem in the calculation of the VHPT size. Maxmem is already in pages, so we should not convert from bytes to pages. The result of this bug was bad scaling of the VHPT relative to the available memory. Submitted by: Arun Sharma <arun@sharma-home.net>
110784	13-Feb-2003	alc	MFi386 Remove kptobj. Instead, use VM_ALLOC_NOOBJ.
110335	04-Feb-2003	harti	Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first uio segment is empty. In this case no dma segment is created by bus_dmamap_load_buffer, but the calling routine clears the first flag. Under certain combinations of addresses of the first and second mbuf/uio buffer this leads to corrupted DMA segment descriptors. This was already fixed by tmm in sparc64/sparc64/iommu.c. PR: kern/47733 Reviewed by: sam Approved by: jake (mentor)
110296	03-Feb-2003	jake	Split statclock into statclock and profclock, and made the method for driving statclock based on profhz when profiling is enabled MD, since most platforms don't use this anyway. This removes the need for statclock_process, whose only purpose was to subdivide profhz, and gets the profiling clock running outside of sched_lock on platforms that implement suswintr. Also changed the interface for starting and stopping the profiling clock to do just that, instead of changing the rate of statclock, since they can now be separate. Reviewed by: jhb, tmm Tested on: i386, sparc64
110255	03-Feb-2003	marcel	Don't use the 'c' partition for mounting root. A disklabel is very likely not present under the simulator. If multiple partitions are present on the virtual disk, then the 'a' partition would be the most logical choice. Nowadays partitions are GPT based, which would make the assumption of a disklabel even more questionable. Given all the possible scenarios, assuming a raw "device" seems best.
110232	02-Feb-2003	alfred	Consolidate MIN/MAX macros into one place (param.h). Submitted by: Hiten Pandya <hiten@unixdaemons.com>
110229	02-Feb-2003	phk	We don't need sscopen() and sscclose(). Register sscstrategy directly, instead of using a cdevsw{} for the purpose. Tested by: marcel
110227	02-Feb-2003	marcel	Export IA32 from opt_ia32.h to assembly so that we can eliminate saving and restoring ia32 specific registers when switching context and ia32 support has not been compiled-in. The primary reason for this change is that one of the ia32 registers (ar.fcr) is wrongly marked as invalid by the simulator. Now that we avoid using the register when possible, usability is improved. The secondary reason is that it saves us 7 loads and stores. Note that the PCB will continue to have room for these registers, irrespective of the IA32 option. There are no benefits that make it worthwhile.
110211	01-Feb-2003	marcel	Remove special casing for running in the simulator from the kernel and instead add platform, firmware and EFI stubs to the loader. The net effect of this change is that besides a special console and disk driver, the kernel has no knowledge of the simulator. This has the following advantages: o Simulator support is much harder to break, o It's easier to make use of more feature complete simulators. This would only need a change in the simulator specific loader, o Running SMP kernels within the simulator. Note that ski at this time does not simulate IPIs, so there's no way to start APs. The platform, firmware and EFI stubs describe the following hardware: o 4 CPU Itanium, o 128 MB RAM within the 4GB address space, o 64 MB RAM above the 4GB address space. NOTE: The stubs in the skiloader describe a machine that should in parts be defined by the simulator. Things like processor interrupt block and AP wakeup vector cannot be chosen at random because they require interpretation by the simulator. Currently the simulator is ignorant of this. This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS which is ignored by the simulator. Tested with: ski (version 0.943 for linux)
110190	01-Feb-2003	julian	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
110084	30-Jan-2003	phk	Remove D_CANFREE from sscdisk.c. I believe it got here by copy&paste and I see no signs in the source code that BIO_DELETE was dealt with correctly and can only wonder what kind of trouble this may have caused.
109911	27-Jan-2003	julian	Unbreak SMP cases for these architectures. statclock_process() changed arguments. note: it may be worth checking if curkse is needed on these architectures.. (and if so, why?)
109877	26-Jan-2003	davidxu	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread that owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and take those contexts back to userland. Any thread without an upcall structure has to export its context and exit at the user boundary. Any thread running in user mode owns an upcall structure. When it enters the kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in the kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. If the kse mailbox's current thread pointer is NULL, then when a thread is blocked in the kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit; when all upcalls in a ksegrp are removed, the group is automatically shut down. An upcall owner thread also exits when the process is in exiting state. When an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. It represents a virtual cpu. When a thread is running, it always has a KSE associated with it. The scheduler is free to assign a KSE to a thread according to thread priority; if thread priority is changed, the KSE can be moved from one thread to another. When a ksegrp is created, there are always N KSEs created in the group, where N is the number of physical cpus in the current system. This makes it possible that, even if a userland UTS is only single-CPU safe, threads in the kernel can still execute on different cpus in parallel. Userland calls kse_create to add more upcall structures into a ksegrp to increase concurrency in userland itself; the kernel is not restricted by the number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
109799	24-Jan-2003	dfr	Fix pmap_extract so that it doesn't panic if the user types 'cat /proc/pid/map' Submitted by: Arun Sharma <arun.sharma@intel.com>
109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
109615	21-Jan-2003	jeff	- Add a VM_WAIT in the appropriate cases where vm_page_alloc() fails and flags indicate that uma_small_alloc should not. This code should be refactored so that there is not so much cross arch duplication. Reviewed by: jake Spotted by: tmm Tested on: alpha, sparc64 Pointy hat to: jeff and everyone who cut and pasted the bad code. :-)
109605	21-Jan-2003	jake	Resolve relative relocations in klds before trying to parse the module's metadata. This fixes module dependency resolution by the kernel linker on sparc64, where the relocations for the metadata are different than on other architectures; the relative offset is in the addend of an Elf_Rela record instead of the original value of the location being patched. Also fix printf formats in debug code. Submitted by: Hartmut Brandt <brandt@fokus.gmd.de> PR: 46732 Tested on: alpha (obrien), i386, sparc64
109558	20-Jan-2003	phk	We need neither <sys/diskslice.h> nor <sys/disklabel.h> here.
109490	18-Jan-2003	mux	Don't try to free() map in bus_dmamap_destroy() when it's set to &nobounce_dmamap. A similar bug was fixed by wpaul in revision 1.19 of sys/alpha/alpha/busdma_machdep.c.
109342	16-Jan-2003	dillon	Merge all the various copies of vm_fault_quick() into a single portable copy.
109340	15-Jan-2003	dillon	Merge all the various copies of vmapbuf() and vunmapbuf() into a single portable copy. Note that pmap_extract() must be used instead of pmap_kextract(). This is precursor work to a reorganization of vmapbuf() to close remaining user/kernel races (which can lead to a panic).
108759	06-Jan-2003	marcel	Move ia64_sapics and ia64_sapic_count from interrupt.c to sapic.c and declare them extern in interrupt.c. This eliminates the need for ia64_add_sapic(), which is called from sapic.c. While here, reformat ia64_enable() in interrupt.c to improve indentation and add a sysctl (machdep.apic) to dump the I/O APIC entries currently programmed into all I/O APICs. The latter can help analyze interrupt problems. Note that the sysctl is not intended as a userland (software) interface. It may be changed in the future to include counters so that vmstat -i can make use of it. It may also be removed...
108757	06-Jan-2003	peter	Move the itm reload to a single place rather than having two identical copies of the reload. Note that we use the precomputed itm_reload value so that we can avoid a division in the kernel. The ia64 cpu does not have integer divide, so this would have been done by a floating point operation.
108756	06-Jan-2003	marcel	Replace the hardcoding of 255 as the clock interrupt vector with CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already in use as the AP wakeup vector on the HP rx2600. This needs to be made more dynamic. The likelihood of vector 254 being in use is pretty small, but we already have code to assign vectors to IPIs (see sal.c) and it's probably better to have a centralized "vector manager" that hands out vectors based on some input (like priority).
108751	06-Jan-2003	marcel	Manually inline handleclock(). There's only a single caller and handleclock itself is trivial. While here, replace (itc_frequency+hz/2)/hz with itm_reload for consistency. There's now a single place where we determine the ITM reload value.
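The reload computation named in r108757 and r108751 can be sketched as follows. This is a minimal illustration, not the kernel code: the function names `itm_reload_value` and `next_itm` are hypothetical, and only the expression `(itc_frequency+hz/2)/hz` comes from the log entries. The idea is to do the rounded division once and reuse the result on every clock interrupt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of interval-counter ticks between two clock
 * interrupts, rounded to the nearest integer. Adding hz / 2 before the
 * division turns truncation into round-to-nearest.
 */
static uint64_t
itm_reload_value(uint64_t itc_frequency, uint64_t hz)
{
	assert(hz > 0);
	return (itc_frequency + hz / 2) / hz;
}

/*
 * Hypothetical helper: next match value for the timer. Only an addition is
 * needed here because the division was done once, ahead of time.
 */
static uint64_t
next_itm(uint64_t current_itm, uint64_t reload)
{
	return current_itm + reload;
}
```

Computing the value once matters on a CPU without an integer divide instruction, which is the motivation given in r108757.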
108749	06-Jan-2003	marcel	Count interrupts as soon as possible. This makes sure interrupts are counted even when there are no handlers.
108737	05-Jan-2003	marcel	Don't hardcode the address of the local (S)APIC (aka processor interrupt block). We use the previously hardcoded address as a default only, but will otherwise use whatever ACPI tells us. The address can be found in the MADT table header or in the LAPIC override table entry.
108733	05-Jan-2003	marcel	Handle 3-digit interrupt numbers (vectors). While here, change the name of unused entries from "intr XXX" to "#XXX". This makes it easier to debug interrupt problems, because vmstat can be hacked more easily to dump all interrupt entries that are in use and not those that have had interrupts.
108643	04-Jan-2003	alc	Hold the page queues lock around pmap_remove_pte() in pmap_enter(). Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>
107963	17-Dec-2002	marcel	Check that the dump device is large enough. Otherwise we could end up with a dump offset that's smaller than the start of the dump device and either clobber data in preceding partitions or try to write beyond the end of the medium (unsigned wrap). Implement legacy behaviour to never write to the first 64KB as that is where metadata (ie disklabels) may reside.
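The size check described in r107963 can be sketched like this, assuming the dump is placed at the end of the device. The names `dump_offset` and `DUMP_RESERVED` and the parameter layout are assumptions for illustration; the log entry only states the 64KB reservation and the unsigned-wrap hazard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First 64KB of the device is left alone; metadata such as disklabels may live there. */
#define DUMP_RESERVED	(64 * 1024)

/*
 * Hypothetical helper: compute where a dump of dumpsize bytes would start if
 * it is placed at the end of a device that begins at mediaoffset and is
 * mediasize bytes long. Returns false when the device is too small, instead
 * of letting the unsigned subtraction wrap around.
 */
static bool
dump_offset(uint64_t mediaoffset, uint64_t mediasize, uint64_t dumpsize,
    uint64_t *dumplo)
{
	assert(dumplo != NULL);
	if (mediasize < DUMP_RESERVED || dumpsize > mediasize - DUMP_RESERVED)
		return false;
	*dumplo = mediaoffset + mediasize - dumpsize;
	return true;
}
```

Comparing before subtracting is the point of the sketch: without the check, `mediasize - dumpsize` with a too-large `dumpsize` yields a huge unsigned value rather than a negative one.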
107849	14-Dec-2002	alfred	SCARGS removal take II.
107839	13-Dec-2002	alfred	Backout removal SCARGS, the code freeze is only "selectively" over.
107838	13-Dec-2002	alfred	Remove SCARGS. Reviewed by: md5
107719	10-Dec-2002	julian	Unbreak the KSE code. Keep track of zombie threads using the Per-CPU storage during the context switch. Rearrange thread cleanups to avoid problems with Giant. Clean threads when freed or when recycled. Approved by: re (jhb)
107481	02-Dec-2002	alc	MFi386 Hold the page queues lock around vm_page_unhold() in vunmapbuf(). Approved by: re (blanket)
107394	29-Nov-2002	marcel	Better handle sparse physical memory: Don't use the address range as a measure for available memory to scale the VHPT. Instead, use the previously determined Maxmem. Approved by: re (carte blanc)
107206	24-Nov-2002	marcel	MFp4: Add function map_port_space() to map the memory mapped I/O port range as uncacheable virtual memory and call it prior to probing for a console. This removes the dependency on the loader to have done this for us. Note that this change does not include doing the same for APs. Approved by: re (blanket)
107205	24-Nov-2002	marcel	Fix comparison that caused a 1-off bug. This appeared harmless for the kernel itself, but SAL on Itanium2 machines spontaneously rebooted the machine. Approved by: re (blanket) Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>
107180	22-Nov-2002	mux	Under certain circumstances, we were calling kmem_free() from i386 cpu_thread_exit(). This resulted in a panic with WITNESS since we need to hold Giant to call kmem_free(), and we weren't holding it anymore in cpu_thread_exit(). We now do this from a new MD function, cpu_thread_dtor(), called by thread_dtor(). Approved by: re@ Suggested by: jhb
107028	17-Nov-2002	alc	MFi386 r1.369 - Clear the PG_WRITEABLE flag in pmap_page_protect() if write access is being removed. Return immediately if write access is being removed and PG_WRITEABLE is already clear.
106977	16-Nov-2002	deischen	Add getcontext, setcontext, and swapcontext as system calls. Previously these were libc functions but were requested to be made into system calls for atomicity and to coalesce what might be two entrances into the kernel (signal mask setting and floating point trap) into one. A few style nits and comments from bde are also included. Tested on alpha by: gallatin
106965	15-Nov-2002	peter	Do not assume that time_t is an int. Approved by: re (jhb)
106838	13-Nov-2002	alc	Move pmap_collect() out of the machine-dependent code, rename it to reflect its new location, and add page queue and flag locking. Notes: (1) alpha, i386, and ia64 had identical implementations of pmap_collect() in terms of machine-independent interfaces; (2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
106753	11-Nov-2002	alc	- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit() if we're removing write access from the page's PTEs. - Export pmap_remove_all() on alpha, i386, and ia64. (It's already exported on sparc64.)
106697	09-Nov-2002	des	Print real / avail memory in megabytes rather than kilobytes.
106605	07-Nov-2002	tmm	Move the definitions of the hw.physmem, hw.usermem and hw.availpages sysctls to MI code; this reduces code duplication and makes all of them available on sparc64, and the latter two on powerpc. The semantics by the i386 and pc98 hw.availpages is slightly changed: previously, holes between ranges of available pages would be included, while they are excluded now. The new behaviour should be more correct and brings i386 in line with the other architectures. Move physmem to vm/vm_init.c, where this variable is used in MI code.
106503	06-Nov-2002	jmallett	Remove what was a temporary bogus assignment of bits of siginfo_t, as it does not look like the prerequisites to fill it in properly will be in the tree for the upcoming release, but it's mostly done, so there is no need for these to stay around to remind us.
106486	06-Nov-2002	marcel	Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region 7 addresses for use by page tables and kernel stacks. Obtained from: peter
106197	30-Oct-2002	marcel	Don't pass the return address to exception_save in register b0. Use a true scratch register. This change and future re-allocations will eventually result in code that we can unwind to get the preserved registers of the process. This of course means that we cannot trash them while saving the process context. While re-allocating, remove the register aliases. Abstraction is in this case disadvantageous.
106189	30-Oct-2002	marcel	Rewrite cpu_switch(). The most notable change is the fact that we now have f16-f31 as part of the context. The PCB has been reorganized to better match how we save and restore the (preserved) registers. This commit also moves the context restoration to its own function (named pcb_restore), as we did with pcb_save. Only minimal effort has been put in writing optimal assembly. The expectation is that there will be more rounds of changes.
106069	28-Oct-2002	marcel	Remove mf.a from sapic_read() and sapic_write(). We only care about ordering and not acceptance. The removal of mf.a leaves behind the mf that accompanied it.
106066	28-Oct-2002	marcel	Make vmstat -i work: o Properly set the pointer to the counter for each interrupt and update the intrnames table. o Remove Alpha cruft from intrcnt.h. o Create INTRNAME_LEN as the single entity that defines the width of the names in the intrnames table (incl. terminating '\0').
106063	27-Oct-2002	marcel	In ipi_send(), perform a mf instruction prior to initiating the IPI. This guarantees that loads and stores emitted before the fence are made visible before the IPI becomes pended. Remove the mf.a instruction after initiating the IPI. There's no guarantee that the IPI becomes pended prior to subsequent reads or writes. Even if there was a guarantee, it would mostly be without any benefit.
105950	25-Oct-2002	peter	Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
105900	24-Oct-2002	julian	Extract out KSE specific code from machine specific code so that there is only one copy of it. Fix that one copy so that KSEs with no mailbox in a KSE program are not a cause of page faults (this can legitimately happen). Submitted by: (parts) davidxu
105891	24-Oct-2002	jhb	Oops, I missed a few changes in 'device acpica' -> 'device acpi' change. Submitted by: Hiten Pandya <hiten@angelica.unixdaemons.com>
105591	20-Oct-2002	marcel	In cb_dumphdr() we were calling buf_write() with di->priv as the pointer to a dumperinfo instead of di. A brainfart, surely. This bug went unnoticed for all this time because the pointer is only used by buf_write() when it can write a completely filled buffer to the dump device. This depends on the number of memory chunks that needs to be dumped. This has apparently been low enough that it has never happened up until this point.
105500	20-Oct-2002	marcel	Remove the special casing for IP addresses that are within the IVT or the do_syscall() function. We have unwind directives to stop the unwinder.
105499	20-Oct-2002	marcel	Define IVT_ENTRY and IVT_END as special versions of ENTRY and END for defining vectors. As a result, each vector will be a global function with unwind directives to notify the unwinder that we're in an interrupt handler. In the debugger this will show up something like: Debugger(0xe000000000a211d8, 0xe000000000748960) at Debugger+0x31 panic(0xe000000000a36858, 0xe0000000021d32d0, 0xe000000000ae42e8, ... trap(0x14, 0x100000, 0xe0000000021d32d0, 0x0, 0xa0000000002095f0, ... ivt_Data_TLB(0x14, 0x100000, 0xe0000000021d32d0) at ivt_Data_TLB+0x1f0
105470	19-Oct-2002	marcel	Update the unwind information when modules are loaded and unloaded by using the linker hooks. Since these hooks are called for the kernel as well, we don't need to deal with that with a special SYSINIT. The initialization implicitly performed on the first update of the unwind information is made explicit with a SYSINIT. We now don't need the _ia64_unwind_{start|end} symbols.
105469	19-Oct-2002	marcel	Add two hooks to signal module load and module unload to MD code. The primary reason for this is to allow MD code to process machine specific attributes, segments or sections in the ELF file and update machine specific state accordingly. An immediate use of this is in the ia64 port where unwind information is updated to allow debugging and tracing in/across modules. Note that this commit does not add the functionality to the ia64 port. See revision 1.9 of ia64/ia64/elf_machdep.c. Validated on: alpha, i386, ia64
105432	19-Oct-2002	marcel	Make this compile when DDB is not defined by conditionally compiling all references to ksym_start and ksym_end.
105147	15-Oct-2002	marcel	Fix kernel module loading on ia64. Cross-module function calls were improperly relocated due to faulty logic in lookup_fdesc() in elf_machdep.c. The symbol index (symidx) was bogusly used for load modules other than the one the relocation applied to. This resulted in bogus bindings and consequently runtime failures. The fix is to use the symbol index only for the module being relocated and to use the symbol name for look-ups in the modules in the dependent list. As such, we need a function to return the symbol name given the linker file and symbol index.
105079	14-Oct-2002	marcel	Allow kernel dumps to be aborted with ctrl-C.
105011	12-Oct-2002	marcel	o Fix typo in previous commit: s/sc-nsect/sc->nsect/ o Fix printf format error for %d format with long argument.
105010	12-Oct-2002	marcel	Plug two holes where we returned to userland without restoring the predicate registers. Even though the ITLB and DTLB interrupts happen often enough, this bug didn't do much harm. The reason is that the interrupt handlers only modify p1 and since this is a preserved (callee-saved) register it is hardly used in code generated by the compiler. Compilers use scratch registers by default. Changing the interrupt handlers to use p6 (ie a scratch register) proved that the bug was in fact fatal.
105001	12-Oct-2002	marcel	Polish previous commit: o Replace KSTACK_PAGES with pages on panic() in pmap_new_thread(), o Fix style bugs in adjacent code, o Use NULL instead of 0 for pointers, o Save the virtual kstack address if we create an alternate kstack because 1) we can derive the physical (RR7) address from it and 2) we need the virtual address for contigfree() in pmap_dispose_thread(). Thus td_altkstack saves td_md.md_kstackvirt.
105000	12-Oct-2002	marcel	MFp4: Include machine/vmparam.h to pull in definition of IA64_RR_BASE. Obtained from: peter
104939	11-Oct-2002	peter	cut/paste the pmap_new_altkstack stuff from the other platforms. It's no different here. Update the rest of the kstack API's for scottl's changes.
104938	11-Oct-2002	peter	Call uma_zalloc on pvzone with M_NOWAIT, just like i386 and alpha. Otherwise we get hundreds of 'could sleep' during boot.
104908	11-Oct-2002	mike	Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
104486	04-Oct-2002	sam	New bus_dma interfaces for use by crypto device drivers: o bus_dmamap_load_mbuf o bus_dmamap_load_uio Test on i386. Known to compile on alpha and sparc64, but not tested. Otherwise untried.
104438	04-Oct-2002	peter	List the IO SAPIC delivery mode definitions.
104433	04-Oct-2002	peter	Do a bit of rude hackery to get clock interrupts on all CPUs. This is partly based on the Alpha system which duplicates the clock to each cpu, instead of doing a clock roundrobin like on i386. This means we get hz * ncpu clocks per second and so we have to separate clock sampling from actual 'do the work' clock processing. The BSP runs the complete processing, the rest just sample state etc. Using the on-cpu interval timer is not ideal as it will drift. There is more to be done here, we should use an external clock source.
104426	04-Oct-2002	peter	Update stubs for post-kseIII.
104425	04-Oct-2002	peter	Update for post-kseIII
104320	01-Oct-2002	phk	Fix the same misinitialization of pmap_prefault_pageorder as on i386. Suggested by: jake
103714	20-Sep-2002	phk	(This commit touches about 15 disk device drivers in a very consistent and predictable way, and I apologize if I have gotten it wrong anywhere, getting prior review on a patch like this is not feasible, considering the number of people involved and hardware availability etc.) If struct disklabel is the messenger: kill the messenger. Inside struct disk we had a struct disklabel which disk drivers used to communicate certain metrics to the disklayer above (GEOM or the disk mini-layer). This commit changes this communication to use four explicit fields instead. Amongst the benefits is that the fields do not get overwritten by wrong or bogus on-disk disklabels. Once that is clear, <sys/disk.h> which is included in the drivers no longer needs to pull <sys/disklabel.h> and <sys/diskslice.h> in; the few places that need them have gotten explicit #includes for them. The disklabel inside struct disk is now only for internal use in the disk mini-layer, so instead of embedding it, we malloc it as we need it. This concludes (modulo any mistakes) the series of disklabel related commits. I believe it all amounts to a NOP for all the rest of you :-) Sponsored by: DARPA & NAI Labs.
103703	20-Sep-2002	phk	For reasons now lost in historical fog, the bounds_check_with_label() function was put in i386/i386/machdep.c from where it has been cut and pasted to other architectures with only minor corruption. Disklabel is really a MI format in many ways, at least it certainly is when you operate on struct disklabel. Put bounds_check_with_label() back in subr_disklabel.c where it belongs. Sponsored by: DARPA & NAI Labs.
103646	19-Sep-2002	jhb	Implement db_print_backtrace() if DDB is compiled into the kernel. This MD function is just a wrapper around db_stack_trace_cmd() that prints out a backtrace of curthread. Currently, this function is only implemented on i386 and alpha (and the alpha version isn't quite tested yet, will do that in a bit). Other changes: - For i386, fix a bug in the raw frame address case. The eip we extract from the passed in frame address does not match the frame we received. Thus, instead of printing a bogus frame with the wrong eip, go ahead and advance frame down to the same frame as the eip we are using. - For alpha, attempt to add a way of doing a raw trace for alpha. Instead of passing a frame address in 'addr', pass in a pointer to a structure containing PC and KSP and use those to start the backtrace. The alpha db_print_backtrace() uses asm to read in the current PC and KSP values into such a request. Tested on: i386 Requested by: many
103436	17-Sep-2002	peter	Initiate deorbit burn for the i386-only a.out related support. Moves are under way to move the remnants of the a.out toolchain to ports. As the comment in src/Makefile said, this stuff is deprecated and one should not expect this to remain beyond 4.0-REL. It has already lasted WAY beyond that. Notable exceptions: gcc - I have not touched the a.out generation stuff there. ldd/ldconfig - still have some code to interface with a.out rtld. old as/ld/etc - I have not removed these yet, pending their move to ports. some includes - necessary for ldd/ldconfig for now. Tested on: i386 (extensively), alpha
103367	15-Sep-2002	julian	Allocate KSEs and KSEGRPs separately and remove them from the proc structure. Next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
103081	07-Sep-2002	jmallett	Fill out two fields (si_pid, si_uid) in the siginfo structure handed back to userland in the signal handler that were not being filled out before, but should and can be. This part of sendsig could be slightly refactored to use an MI interface, or ideally, sendsig() would have an API change to accept a siginfo_t, which would be filled out by an MI function in the level above sendsig, and said MI function would make a small call into MD code to fill out the MD parts (some of which may be bogus, such as the si_addr stuff in some places). This would eventually make it possible for parts of the kernel sending signals to set up a siginfo with meaningful information. Reviewed by: mux MFC after: 2 weeks
103049	07-Sep-2002	peter	Zap the implementations of the i386-aout specific cpu_coredump function. Most of the non-i386 platforms had rather broken implementations anyway.
102837	02-Sep-2002	alc	o Remove an initialized but unused variable from pmap_remove_all().
102808	01-Sep-2002	jake	Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to sysentvec. Initialized all fields of all sysentvecs, which will allow them to be used instead of constants in more places. Provided stack fixup routines for emulations that previously used the default.
102666	31-Aug-2002	peter	Take a shot at fixing up a whole stack of style and other embarrassing unforced errors that Bruce identified. I have not yet addressed all of his concerns.
102663	31-Aug-2002	peter	Do not use an object for the pte and pv zones on ia64 because it overrides the pmap_allocf() function that we provide above. We still use the limits via other means. Submitted by: jeff
102600	30-Aug-2002	peter	Change hw.physmem and hw.usermem to unsigned long like they used to be in the original hardwired sysctl implementation. The buf size calculator still overflows an integer on machines with large KVA (eg: ia64) where the number of pages does not fit into an int. Use 'long' there. Change Maxmem and physmem and related variables to 'long', mostly for completeness. Machines are not likely to overflow 'int' pages in the near term, but then again, 640K ought to be enough for anybody. This comes for free on 32 bit machines, so why not?
102561	29-Aug-2002	jake	Renamed poorly named setregs to exec_setregs. Moved its prototype to imgact.h with the other exec support functions.
102560	29-Aug-2002	jake	Fixed printf format errors.
102399	25-Aug-2002	alc	o Retire pmap_pageable(). It's an advisory routine that none of our platforms implements.
102161	20-Aug-2002	rwatson	Correct one more errant whitespace nit that crept in during changes in the arguments to vn_rdwr(). Hopefully the last.
101945	15-Aug-2002	rwatson	Correct a minor whitespace nit that sneaked in with my previous commit.
101941	15-Aug-2002	rwatson	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes are largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
101645	10-Aug-2002	alc	o Remove the setting and clearing of the PG_MAPPED flag from the alpha and ia64 pmap. o Remove the PG_MAPPED flag's declaration.
101625	10-Aug-2002	peter	My quad cpu itanium2 box has its cpu's numbered with a lid starting at 192. Masking off bottom 4 bits is not very good here.
101251	03-Aug-2002	peter	Ignore memory above 4GB for now due to unpleasant pci issues.
101199	02-Aug-2002	alc	o Lock page queue accesses by vm_page_deactivate().
100543	23-Jul-2002	arr	- Pass the VM_ALLOC_WIRED flag to vm_page_alloc() in pmap_growkernel() so that we can avoid a call to vm_page_lock_queues(). Approved by: peter
100398	20-Jul-2002	peter	Change the max IRQ from 63 to 255. I realize we have to block some out still for the IPI vectors, but 63 isn't enough. There is an fxp at IRQ 86 on the Itanium2 box I have.
100384	20-Jul-2002	peter	Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).
100269	17-Jul-2002	peter	Fix some typos in 1.68 from over a week ago.
100268	17-Jul-2002	peter	Cap the initial PV and PTE table preallocations. Otherwise we explode on the Itanium2 system I have when we use up all of the initial 256MB direct mapped region before we are ready to dynamically expand it. The machine that I have has 4 cpus and a very big hole in the middle. This makes the bogus '(last_address - first_address) / PAGE_SIZE' calculations especially dangerous and caused many millions of initial PV/PTE's to be preallocated.
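The problem r100268 describes, and r107394 later addresses for the VHPT, can be illustrated with a small sketch. Everything here is hypothetical (the `phys_chunk` type, the function names, the 8KB page size, and the cap policy); it only contrasts sizing by address range with sizing by the memory actually present, and shows a simple cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE	8192u	/* assumed page size for the example */

/* Hypothetical description of one populated physical memory range: [start, end). */
struct phys_chunk {
	uint64_t start;
	uint64_t end;
};

/* Range-based estimate: also counts any hole between the first and last chunk. */
static uint64_t
pages_by_range(const struct phys_chunk *c, size_t n)
{
	assert(n > 0);
	return (c[n - 1].end - c[0].start) / EXAMPLE_PAGE_SIZE;
}

/* Sum-based estimate: counts only pages that are really there. */
static uint64_t
pages_by_sum(const struct phys_chunk *c, size_t n)
{
	uint64_t pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += (c[i].end - c[i].start) / EXAMPLE_PAGE_SIZE;
	return pages;
}

/* Cap an initial preallocation so a bad estimate cannot exhaust early memory. */
static uint64_t
capped_prealloc(uint64_t wanted, uint64_t cap)
{
	return wanted < cap ? wanted : cap;
}
```

With two 256MB chunks separated by a hole of roughly 1TB, the range-based figure comes out about two thousand times larger than the sum-based one, which is the kind of blow-up the log entry reports.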
100267	17-Jul-2002	peter	Be sure to use a logical address for the SAL table. For some reason the physical address is still mapped at this stage of boot on the Itanium1 SDV boxes we have. But Itanium2 does not let us get away with this.
100000	14-Jul-2002	alc	o Lock page queue accesses by vm_page_wire().
99900	13-Jul-2002	mini	Add additional cred_free_thread() calls that I had missed the first time. Pointed out by: jhb
99887	12-Jul-2002	jhb	Set the thread state of the newly chosen to run thread to TDS_RUNNING in choosethread() in MI C code instead of doing it in assembly in all the various cpu_switch() functions. This fixes problems on ia64 and sparc64. Reviewed by: julian, peter, benno Tested on: i386, alpha, sparc64
99571	08-Jul-2002	peter	Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed. We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86. Use Luigi's hack for preserving the (lack of) priority. Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.
99559	07-Jul-2002	peter	Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still. While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
99422	05-Jul-2002	peter	Back out proc part of last commit. UMA manages the thread cache only, and we just have to deal with the kstack when told to. We do not have a UMA-managed cache for the proc struct and its associated upage yet. So, go back to the old lazy mechanism. Note that if UMA destroys pages that used to contain proc structures, we'll lose the corresponding upage forever. (zones never did this - once a page was allocated, it stayed attached to the proc zone forever)
99421	05-Jul-2002	peter	Copy from sparc64/pmap.c rev 1.64 (Retrofit changes from i386/pmap.c rev 1.328-1.331.) but for uarea only. We still have our own broken kstack code here.
99095	29-Jun-2002	julian	Fix reverse ordering of locks. add a comment about locks on some platforms. Submitted by: jhb@freebsd.org
99081	29-Jun-2002	julian	Add KSE stubs to MD parts of ia64 code. Dfr will fill these out when we decide to enable KSEs on ia64 (probably not immediately)
99072	29-Jun-2002	julian	Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
98773	24-Jun-2002	dfr	Add UMA_ZONE_VM flag to the zones which are used for pmap_enter().
98765	24-Jun-2002	jake	Add an MD callout like cpu_exit, but which is called after sched_lock is obtained, when all other scheduling activity is suspended. This is needed on sparc64 to deactivate the vmspace of the exiting process on all cpus. Otherwise if another unrelated process gets the exact same vmspace structure allocated to it (same address), its address space will not be activated properly. This seems to fix some spontaneous signal 11 problems with smp on sparc64.
98727	24-Jun-2002	mini	Remove unused diagnostic function cred_free_thread(). Approved by: alfred
98484	20-Jun-2002	peter	Update an 'XXX what is this?' type comment about suswintr and fuswintr. These are 16 bit short values used only by the profiling code.
98480	20-Jun-2002	peter	Deorbit suibyte(). It was only used for split address space systems for supporting UIO_USERISPACE (ie: it wasn't used).
98474	20-Jun-2002	peter	ia32 %edx return comes from td_retval[1], not td_retval[0] Obtained from: dfr
98473	20-Jun-2002	peter	Use suword32/64 and fuword32/64 like elsewhere instead of inventing suhword/fuhword.
98471	20-Jun-2002	peter	panic rather than fault and explode if we fail to contigmalloc a kernel stack. This is still bad(TM), but at least we have a clue when we get hit when contigmalloc fails.
98470	20-Jun-2002	peter	Use the canonical pmap_{new,dispose,swapin,swapout}_proc() functions, in this case cut/pasted from sparc64 instead of messing with contigmalloc where it is not needed.
98001	07-Jun-2002	jhb	- Fixup / remove obsolete comments. - ktrace no longer requires Giant so do ktrace syscall events before and after acquiring and releasing Giant, respectively. - For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the goto bad hack and instead use the model on ia64 and alpha where we skip the actual syscall invocation if error != 0. This fixes a bug where if the copyin() of the arguments failed for a syscall that was not marked MP safe, we would try to release Giant when we had not acquired it.
97969	06-Jun-2002	marcel	Work around a bug in the Linux version of ski, that's specific to SSC_GET_RTC. This fixes the panic seen shortly after mounting the root file system. Thanks to: "K.Sumitani" <ksumitani@mui.biglobe.ne.jp>
97443	29-May-2002	marcel	Remove the definition of struct mca_guid and use the generic struct uuid defined in <sys/uuid.h>. Use uuid/UUID instead of guid/GUID to emphasize that the identifiers are DCE version 1 identifiers and also to avoid inconsistencies as much as possible.
96973	20-May-2002	marcel	Flesh-out ptrace support. This obviously needs more work.
96961	19-May-2002	marcel	Fix a kernel page fault when accessing user memory. We were combining too many conditions and as such ended up with the kernel map instead of the corresponding process map. While here, remove code to allow access to the stackgap and restyle slightly to improve readability. This fix specifically fixes the procfs failure we're having when reading the process map (cat /proc/curproc/map)
96916	19-May-2002	peter	Catch another C++ comment
96912	19-May-2002	marcel	o Remove namespace pollution from param.h: - Don't include ia64_cpu.h and cpu.h - Guard definitions by _NO_NAMESPACE_POLLUTION - Move definition of KERNBASE to vmparam.h o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h o While here, remove some left-over Alpha references.
96899	19-May-2002	marcel	Cast dumpsize to long long to match printf format.
96755	16-May-2002	trhodes	More s/file system/filesystem/g
96487	13-May-2002	jake	These were repo-copied to dump_machdep.c.
96442	12-May-2002	marcel	o Rename ia64_count_aps to ia64_count_cpus and reimplement the function to return the total number of CPUs and not the highest CPU id. o Define mp_maxid based on the minimum of the actual number of CPUs in the system and MAXCPU. o In cpu_mp_add, when the CPU id of the CPU we're trying to add is larger than mp_maxid, don't add the CPU. Formerly this was based on MAXCPU. Don't count CPUs when we add them. We already know how many CPUs exist. o Replace MAXCPU with mp_maxid when used in loops that iterate over the id space. This avoids a couple of useless iterations. o In cpu_mp_unleash, use the number of CPUs to determine if we need to launch the CPUs. o Remove mp_hardware as it's not used anymore. o Move the IPI vector array from mp_machdep.c to sal.c. We use the array as a centralized place to collect vector assignments. Note that we still assign vectors to SMP specific IPIs in non-SMP configurations. Rename the array from mp_ipi_vector to ipi_vector. o Add IPI_MCA_RENDEZ and IPI_MCA_CMCV. These are used by MCA. Note that IPI_MCA_CMCV is not SMP specific. o Initialize the ipi_vector array so that we place the IPIs in sensible priority classes. The classes are relative to where the AP wake-up vector is located to guarantee that it's the highest priority (external) interrupt. Class assignment is as follows (class: IPIs): x: AP wake-up (normally x=15); x-1: MCA rendezvous; x-2: AST, Rendezvous, stop; x-3: CMCV, test
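The class assignment described in r96442 can be modelled outside the kernel. The Python sketch below is only an illustration: it assumes that a vector's priority class is its upper four bits (16 vectors per class), and the function name `assign_ipi_vectors`, the IPI labels, and the choice to hand out consecutive vectors inside a class are hypothetical, not taken from sal.c.

```python
# Userland model of placing IPIs in priority classes relative to the
# AP wake-up vector. Assumption: priority class == vector >> 4.
VECTORS_PER_CLASS = 16

# Distance below the AP wake-up class, as listed in the commit message.
# The labels themselves are hypothetical.
CLASS_OFFSETS = {
    "MCA_RENDEZ": 1,
    "AST": 2,
    "RENDEZVOUS": 2,
    "STOP": 2,
    "MCA_CMCV": 3,
    "TEST": 3,
}


def priority_class(vector):
    """Return the priority class of a vector (assumed: upper four bits)."""
    return vector >> 4


def assign_ipi_vectors(ap_wakeup_vector):
    """Give every IPI a distinct vector in the class the commit lists for it."""
    x = priority_class(ap_wakeup_vector)
    next_free = {}  # next unused slot within each class
    vectors = {"AP_WAKEUP": ap_wakeup_vector}
    for name, offset in CLASS_OFFSETS.items():
        cls = x - offset
        slot = next_free.get(cls, 0)
        vectors[name] = cls * VECTORS_PER_CLASS + slot
        next_free[cls] = slot + 1
    return vectors


if __name__ == "__main__":
    for name, vec in assign_ipi_vectors(0xFF).items():
        print(f"{name:12s} vector 0x{vec:02x} class {priority_class(vec)}")
```

Because every other IPI sits in a strictly lower class than the wake-up vector, the wake-up interrupt stays the highest-priority external interrupt in this model wherever its vector is placed.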
96146	07-May-2002	marcel	o Add ar.lc to the pcb. o Create pcb_save as the backend for savectx and cpu_switch. o While here, use explicit bundling for pcb_save and optimize for compactness (~87% density). o Not part of the commit is a backend pcb_restore. restorectx() still jumps halfway into cpu_switch().
96061	05-May-2002	marcel	o Include md_var.h o Remove definition of struct ia64_fdesc o Remove prototype of os_boot_rendez o Use the FDESC_FUNC and FDESC_GP abstractions
96059	05-May-2002	marcel	Remove definition of struct ia64_fdesc. It's been moved to md_var.h
96029	04-May-2002	dfr	Use region 7 addresses for the slabs in the PV and PT zones so that we don't confuse the zone allocator by translating region 5 addresses to region 7 addresses (which is unavoidable for PTEs).
96019	04-May-2002	marcel	Make sure we don't index the pm_rid array out of bounds in pmap_ensure_rid(). This can happen because the function is called for both user and kernel addresses, while the rid array only has room for user addresses. This bug got exposed by rev 1.58 of ia64/ia64/pmap.c and rev 1.8 of ia64/include/pmap.h.
95920	02-May-2002	marcel	In pmap_pinit0, remove duplicate initialization.
95919	02-May-2002	marcel	PCPU(current_pmap) is initialized in pmap_bootstrap. No need to do it again.
95893	01-May-2002	marcel	Save the MCA info specific to the AP as part of the AP launch.
95892	01-May-2002	marcel	Make ia64_mca_save_state MP safe. Protect access to the info block, updating the sysctl tree and clearing the SAL state by a spin lock.
95863	01-May-2002	peter	Connect up kern_envp before we use it for getenv() and console probing. It is a bit late after that when we have no consoles. :-] Also, fix a comment nit and print a warning about missing metadata.
95814	30-Apr-2002	phk	Don't export timecounter structures under debug. with sysctl, they contain no truly interesting data anymore.
95768	30-Apr-2002	marcel	Add ar.lc and ar.ec to the trapframe. These are not saved for syscalls, only for exceptions. While adding this to exception_save and exception_restore, it was hard to find a good place to put the instructions. The code sequence was sufficiently arbitrarily ordered that the density was low (roughly 67%). No explicit bundling was used. Thus, I rewrote the functions to optimize for density (close to 80% now), and added explicit bundles and nop instructions. The immediate operand on the nop instruction has been incremented with each instance, to make debugging a bit easier when looking at recurring patterns. Redundant stops have been removed as much as possible. Future optimizations can focus more on performance. A well-placed lfetch can make all the difference here! Also, the FRAME_Fxx defines in frame.h were mostly bogus. FRAME_F10 to FRAME_F15 were copied from FRAME_F9 and still had the same index. We don't use them yet, so nothing was broken.
95762	30-Apr-2002	marcel	Make this work for ski again. Don't call ia64_mca_init() when we're in the simulator.
95761	30-Apr-2002	marcel	Include md_var.h. It has the prototype of ia64_running_in_simulator().
95710	29-Apr-2002	peter	Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)
95519	26-Apr-2002	marcel	Initialize MCA in cpu_startup() so that it's ready before we wake-up the application processors. This allows us to collect unconsumed AP specific error records as part of the wake-up.
95518	26-Apr-2002	marcel	MCA specific code has been moved to a separate file. It is expected to grow enough to be in the way here.
95517	26-Apr-2002	marcel	Machine Check Architecture (MCA) support code. Error records are collected at boot and made available through sysctl(8). At the moment, the following MIB names are created: hw.mca.count - The number of error records collected. hw.mca.first - The lowest sequence number present. hw.mca.last - The highest sequence number present. hw.mca.<X> - The error record with sequence number <X>. Using sysctl(8) allows us to easily detect and analyze the records, which is very helpful during development of MCA but can also be used in production as a way to collect machine health statistics.
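The MIB naming scheme in r95517 can be pictured with a small userland model. This Python sketch is an assumption-laden illustration, not the kernel code: `build_mca_tree` is a hypothetical helper, records are represented as a dict from sequence number to raw bytes, and what hw.mca.first and hw.mca.last hold when no records exist is not stated in the commit, so the model simply omits them in that case.

```python
def build_mca_tree(records):
    """Map error records (sequence number -> raw bytes) to MIB names.

    Hypothetical model of the names listed in the commit message.
    """
    tree = {"hw.mca.count": len(records)}
    if records:
        tree["hw.mca.first"] = min(records)  # lowest sequence number present
        tree["hw.mca.last"] = max(records)   # highest sequence number present
    for seq, data in records.items():
        tree[f"hw.mca.{seq}"] = data         # one node per record
    return tree


if __name__ == "__main__":
    tree = build_mca_tree({3: b"\x01\x02", 4: b"\x03", 7: b"\x04"})
    for name in sorted(tree):
        print(name, tree[name])
```

Note that sequence numbers need not be contiguous in this model, so a consumer would enumerate the nodes rather than assume every number between first and last exists.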
95458	25-Apr-2002	marcel	The official name for McKinley is: Itanium 2
95410	25-Apr-2002	marcel	Don't use the symbol name to lookup the symbol value when we can use the symbol index defined by the relocation. The elf_lookup() support function is to be used by elf_reloc() when symbol lookups need to be done. The elf_lookup() function operates on the symbol index and will do a symbol name based lookup when such is required, otherwise it uses the symbol index directly. This solves the problem seen on ia64 where the symbol hash table does not contain local symbols and a symbol name based lookup would fail for those symbols. Don't pass the symbol name to elf_reloc(), as it isn't used any more.
95245	22-Apr-2002	marcel	Add ia64_sal_init_state(). This function will initialize the machine check handling. In its current form, it only determines the largest amount of state information it can get from SAL and allocates a region 7 memory block for it. The next steps involve: o get and log any unconsumed (NVM stored) error records across reboots, o register an OS_MCA handler and enable machine checks.
95231	21-Apr-2002	marcel	Fix WAW dependency violation on r17 (line 198) that only exists for the SMP case. While on the subject, remove unnecessary stops. I don't know if this resolves the memory corruption I'm seeing, but it does have the potential. We'll see...
95229	21-Apr-2002	marcel	Implement elf_reloc(). The RT specification says that we can expect both Elf_Rel and Elf_Rela types of relocation, so handle them both even though we only have Elf_Rela ATM. We don't handle 32-bit and big-endian variants yet. Support for that is not trivial enough to implement it without any evidence that we ever need it in the near future. For the FPTR relocations, we currently use the fptr_storage used by _reloc() in locore.s. This is in no way a real solution, but for now provides the service we need to get the basics going. A static recursive function lookup_fdesc() is used to find the address of a function in a way that keeps track of the load module so that we can get the correct GP value if we need to construct an OPD (ie there's no OPD yet for the function). For simplicity, we create an OPD for the IPLT relocations as well and simply fill the user provided function descriptor from the OPD. Since the official descriptors are unique, this has no bad side effects. Note that we ignore the addend for FPTR relocations, but use the addend for IPLT relocations as an offset to the function address. This commit allows us to load and relocate modules and modules appear to work correctly, although we probably need to make sure that we set GP correctly in all cases when we have inter-module calls. This especially applies to assembly coded functions that have cross module calls.
95202	21-Apr-2002	dfr	Setup the child's return values correctly when forking an IA-32 process.
95191	21-Apr-2002	marcel	Improve self-relocation and fix ABI misinterpretation. The changes here mostly mirror the changes made in boot/efi/libefi/arch/ia64/start.S rev 1.5 Significant difference: We don't handle the IPLT relocation here. For barebones KLD support, we make the fptr_storage global.
95025	19-Apr-2002	marcel	Remove the bootinfo kludge. We get the address of the bootinfo block from the loader.
95019	19-Apr-2002	alc	o Remove vm_map_growstack() from ia64's trap_pfault(). o Remove the acquisition and release of Giant from ia64's trap_pfault(). (vm_fault() still acquires it.)
94936	17-Apr-2002	mux	Rework the kernel environment subsystem. We now convert the static environment needed at boot time to a dynamic subsystem when VM is up. The dynamic kernel environment is protected by an sx lock. This adds some new functions to manipulate the kernel environment : freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be called after every getenv() when you have finished using the string. testenv() only tests if an environment variable is present, and doesn't require a freeenv() call. setenv() and unsetenv() are self explanatory. The kenv(2) syscall exports these new functionalities to userland, mainly for kenv(1). Reviewed by: peter
94819	16-Apr-2002	alc	Remove code that updates vm->vm_ssize. This duplicates work already performed by vm_map_growstack().
94779	15-Apr-2002	peter	Fix an "oops!" that turned out to be mostly harmless (but gave a warning). I did this right on the sparc64. Store the direct mapped addresses in the correct variables. Submitted by: jake
94777	15-Apr-2002	peter	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibility.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
94642	14-Apr-2002	marcel	Dotting the i-s: o Use chunk instead of region when we talk about a memory range. Region can be confused with region register and we already call it chunk in machdep.c o Update the twiddle every 16MB
94639	14-Apr-2002	peter	Allow a kernel to be compiled with both SKI and acpica and still work on real hardware. (SKI used to break the sapic probes)
94628	13-Apr-2002	alc	Add comment that sigreturn() is MPSAFE.
94496	12-Apr-2002	dfr	Initialise ar.cflg, which contains the IA-32 registers cr0 and cr4. Since all IA-32 processes use the same values for cr0 and cr4, we initialise them at system startup.
94495	12-Apr-2002	dfr	Print extra information in printtrap() if the interrupted state was for an IA-32 process. Don't sign extend arguments in ia32_syscall - it's not normally going to be useful (e.g. pointers need to be zero extended).
94481	12-Apr-2002	peter	Really fix uniprocessor on IA64. Note to self: do not use variables before they are initialized. I had correctly figured out that the UP problem was the pcpu current_pmap thing, but didn't fix it right last time.
94378	10-Apr-2002	dfr	Save and restore the IA-32 state in cpu_switch(). Probably should only do this if the thread has been executing IA-32 code.
94377	10-Apr-2002	dfr	Add suhword() and fuhword() for accessing 32-bit values ("half words") in userland. All these functions should be renamed to be explicit about the size of value being read or written.
94376	10-Apr-2002	dfr	Add exception and syscall support for executing IA-32 binaries.
94365	10-Apr-2002	dfr	Call ast() from the syscall exit path as well as for full exception restores.
94364	10-Apr-2002	dfr	Initialise PCPU_GET(current_pmap) in pmap_bootstrap - cpu_switch needs to be sure that it is always correct and this was not true for the first call to cpu_switch. When thread0 resumed later, it ended up calling pmap_install with a null pmap, which is bad.
94275	09-Apr-2002	phk	GC various bits and pieces of USERCONFIG from all over the place.
94270	09-Apr-2002	dfr	Don't call make_dev from ssccnattach - it's far too early to work properly.
93997	06-Apr-2002	marcel	Add prototype for bootpc_init when BOOTP is defined.
93933	06-Apr-2002	marcel	Fix a braino in the alignment of the segment contents after dumping the program headers. As a result of this, dumplo was advanced too much causing the end of the dump and most notably the trailing dump header to be written beyond the end of the dump medium.
93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
93794	04-Apr-2002	brian	Back out the previous commit. In the i386 case, options BOOTP requires options NFS_ROOT as well as options NFSCLIENT. With both the NFS options, a bootpc_init() prototype is brought in by nfsclient/nfsdiskless.h. In the ia64 case, it just doesn't work and my change just pushes it further away from working. Suggested to be wrong by: bde
93793	04-Apr-2002	bde	Moved signal handling and rescheduling from userret() to ast() so that they aren't in the usual path of execution for syscalls and traps. The main complication for this is that we have to set flags to control ast() everywhere that changes the signal mask. Avoid locking in userret() in most of the remaining cases. Submitted by: luoqi (first part only, long ago, reorganized by me) Reminded by: dillon
93785	04-Apr-2002	brian	Pre-declare bootpc_init() so that options BOOTP doesn't break the build in ia64 and i386 due to -Werror.
93761	04-Apr-2002	alc	o Kill the MD grow_stack(). Call the MI vm_map_growstack() in its place. o Eliminate the use of useracc() and grow_stack() from sendsig(). Reviewed by: peter
93717	03-Apr-2002	marcel	Make the kernel dump header endianness invariant by always dumping in dump byte order (=network byte order). Swap blocksize and dumptime to avoid extraneous padding on 64-bit architectures. Use CTASSERT instead of runtime checks to make sure the header is 512 bytes large. Various style(9) fixes. Reviewed by: phk, bde, mike
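Two of the points in r93717 (a fixed dump byte order, and field ordering that avoids padding) can be illustrated in userland. The Python sketch below is a model under stated assumptions: only `blocksize` and `dumptime` are named in the commit, the other field names and all field widths are hypothetical, and `natural_layout` is an illustrative helper that mimics natural alignment rather than any real compiler.

```python
import struct


def natural_layout(fields):
    """Return (offsets, total size) when each scalar aligns to its own size."""
    offset = 0
    offsets = {}
    max_align = 1
    for name, size in fields:
        offset = (offset + size - 1) // size * size  # pad up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical fragment of a header: a 32-bit field between two 64-bit ones
# forces 4 bytes of padding; moving it after them does not.
before = [("dumplength", 8), ("blocksize", 4), ("dumptime", 8), ("parity", 4)]
after = [("dumplength", 8), ("dumptime", 8), ("blocksize", 4), ("parity", 4)]


def pack_dump_order(dumplength, dumptime, blocksize, parity):
    """Serialize in network (big-endian) byte order, independent of the host."""
    return struct.pack(">QQII", dumplength, dumptime, blocksize, parity)


if __name__ == "__main__":
    print("size before swap:", natural_layout(before)[1])
    print("size after swap: ", natural_layout(after)[1])
    print(pack_dump_order(0x1000, 0, 512, 0).hex())
```

The explicit `>` prefix plays the role of the "dump byte order" in this model: the same bytes are produced on little-endian and big-endian hosts, so a reader never has to guess the writer's endianness.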
93713	03-Apr-2002	marcel	o GC dumplo o Replace the string lit. "ia64" with MACHINE
93712	03-Apr-2002	marcel	Use a twiddle to show that we're busy dumping. The initial code emitted the total number of pages it still had to dump prior to dumping a block of up to 16 pages. For a 128MB region this would result in 8M number of printf()s. Barf! The problem in general is that memory typically has one really big region and a number of "scattered" smaller regions. Some may even be just a few pages. The twiddle works best for now, but it doesn't really give a good progress indication for the large regions. Those are the cases where you definitely want good PI to avoid having the user turn into a twiddle :-)
93702	02-Apr-2002	jhb	- Move the MI mutexes sched_lock and Giant from being declared in the various machdep.c's to being declared in kern_mutex.c. - Add a new function mutex_init() used to perform early initialization needed for mutexes such as setting up thread0's contested lock list and initializing MI mutexes. Change the various MD startup routines to call this function instead of duplicating all the code themselves. Tested on: alpha, i386
93647	02-Apr-2002	marcel	Initial implementation of the ia64 kernel dumper. The dumper constructs an ELF image, consisting of the ELF header, for each memory region a program header, followed by the memory contents for each region. It does blocked I/O for the headers as they are typically smaller than DEV_BSIZE.
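The image layout described in r93647 (ELF header, one program header per memory region, then the region contents) can be sketched as a simple offset calculation. In this Python model the 64-byte and 56-byte sizes are those of the standard ELF-64 file and program headers; aligning the first segment to a 512-byte block and the helper name `dump_layout` are assumptions made for illustration, not details confirmed by the commit.

```python
ELF64_EHDR_SIZE = 64   # size of an ELF-64 file header
ELF64_PHDR_SIZE = 56   # size of an ELF-64 program header
BLOCK_SIZE = 512       # assumed device block size


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def dump_layout(region_sizes):
    """Return ([(file_offset, size), ...], total_size) for the dump image."""
    headers_end = ELF64_EHDR_SIZE + ELF64_PHDR_SIZE * len(region_sizes)
    offset = round_up(headers_end, BLOCK_SIZE)  # contents start block-aligned
    layout = []
    for size in region_sizes:
        layout.append((offset, size))
        offset += size
    return layout, offset


if __name__ == "__main__":
    # One large region and two small fragments, sizes chosen arbitrarily.
    layout, total = dump_layout([128 * 1024 * 1024, 16 * 8192, 8192])
    for index, (offset, size) in enumerate(layout):
        print(f"region {index}: offset {offset} size {size}")
    print("image size:", total)
```

With three regions the headers occupy 232 bytes, well under one block, which is consistent with the commit's remark that the headers are typically smaller than DEV_BSIZE and therefore need blocked I/O.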
93627	02-Apr-2002	marcel	o GC totalphysmem and resvmem. o Rephrase comment describing that the memory region can contain the kernel.
93607	01-Apr-2002	dillon	Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter() and cpu_critical_exit() and moves associated critical prototypes into their own header file, <arch>/<arch>/critical.h, which is only included by the three MI source files that need it. Backout and re-apply improperly committed syntactical cleanups made to files that were still under active development. Backout improperly committed program structure changes that moved localized declarations to the top of two procedures. Partially re-apply one of the program structure changes to move 'mask' into an intermediate block rather than in three separate sub-blocks to make the code more readable. Re-integrate bug fixes that Jake made to the sparc64 code. Note: In general, developers should not gratuitously move declarations out of sub-blocks. They are where they are for reasons of structure, grouping, readability, compiler-localizability, and to avoid developer-introduced bugs similar to several found in recent years in the VFS and VM code. Reviewed by: jake
93593	01-Apr-2002	jhb	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
93467	31-Mar-2002	phk	Centralize the "bootdev" and "dumpdev" variables. They are still pretty bogus all things considered, but at least now they don't camouflage as being MD variables.
93458	30-Mar-2002	marcel	Transition to a model where the loader passes the address of the bootinfo block in register r8. In locore.s we save the address in the global variable 'pa_bootinfo'. In machdep.c we compare this value against the hardwired address, but don't depend on its validity yet (ie: we still expect the bootinfo block to be at the hardwired address). After a small amount of time, we'll flip the switch and depend on the loader to pass us the address. From that moment on the loader is free to put it anywhere it likes, provided the machine itself likes it as well. Add some verbosity to aid in the transition. We emit a message if the loader didn't pass the address and we also emit a message if there's no bootinfo block at the hardwired address. While in locore.s, reduce the number of redundant serialization instructions. A srlz.i is a proper superset of a srlz.d and thus is a valid replacement. Also slightly reorder the movl instructions to improve bundle density.
93389	29-Mar-2002	jake	Remove abuse of intr_disable/restore in MI code by moving the loop in ast() back into the calling MD code. The MD code must ensure no races between checking the astpending flag and returning to usermode. Submitted by: peter (ia64 bits) Tested on: alpha (peter, jeff), i386, ia64 (peter), sparc64
93312	28-Mar-2002	obrien	style(9) Approved by: jake
93273	27-Mar-2002	jeff	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb
93264	27-Mar-2002	dillon	Compromise for critical()/cpu_critical() recommit. Cleanup the interrupt disablement assumptions in kern_fork.c by adding another API call, cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it from MI to MD. Temporarily move cpu_critical() from <arch>/include/cpufunc.h to <arch>/<arch>/critical.c (stage-2 will clean this up). Implement interrupt deferral for i386 that allows interrupts to remain enabled inside critical sections. This also fixes an IPI interlock bug, and requires uses of icu_lock to be enclosed in a true interrupt disablement. This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized, and will move cpu_critical() into its own header file(s) + other things. This commit may break non-i386 architectures in trivial ways. This should be temporary. Reviewed by: core Approved by: core
93256	27-Mar-2002	marcel	o Revert previous commit in asm.h. There's no need to undefine __FBSDID first, because it should not be defined at all, o Remove inclusion of cdefs.h in locore.s. Pointed out by: peter
92870	21-Mar-2002	dfr	Change critical_t to register_t for intr_disable/restore.
92865	21-Mar-2002	peter	In UP mode, the primary cpu's per-cpu current_pmap was not initialized - this was only done as a side effect of calling cpu_mp_start(). I haven't actually tested that this fixes UP kernels, but it feels about right.
92851	21-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API. Approved by: peter
92843	20-Mar-2002	alfred	Remove __P. Reviewed by: peter
92824	20-Mar-2002	jhb	Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined. Instead of caching the ucred reference, just go ahead and eat the decrement and increment of the refcount. Now that Giant is pushed down into crfree(), we no longer have to get Giant in the common case. In the case when we are actually free'ing the ucred, we would normally free it on the next kernel entry, so the cost there is not new, just in a different place. This also removes td_cache_ucred from struct thread. This is still only done #ifdef DIAGNOSTIC. Tested on: i386, alpha
92805	20-Mar-2002	dfr	Change intr_enable to intr_restore for consistency with sparc64.
92782	20-Mar-2002	dfr	Replace calls to cpu_critical_enter/exit with appropriate calls to either explicitly disable interrupts or use a real critical section, as appropriate.
92696	19-Mar-2002	peter	#if 0 some unused variables (only in #if 0 code)
92677	19-Mar-2002	peter	My ia64 box for some reason likes to fragment the beginning/end of memory a bit before handing it over to the OS. I occasionally have 11 segments with several 8K or so fragments depending on nvram settings and what I have done under loader(8) before booting. This needs to be revisited.
92676	19-Mar-2002	peter	Fix some unused variables.
92675	19-Mar-2002	peter	Move a couple of prototypes together instead of being incompletely scattered around.
92674	19-Mar-2002	peter	__func__ is a const char *, not a "string" that can be concatenated.
92673	19-Mar-2002	peter	Fix a pointer/int warning
92672	19-Mar-2002	peter	#ifdef SMP some variables that are only used elsewhere under #ifdef SMP also.
92671	19-Mar-2002	peter	Work around an apparent compiler bug with gcc-3.1, although it might be a language feature that I do not know about. gcc is complaining about a left shift >= sizeof type, even when shifting a (cast) 64 bit type left by 43 bits.
92669	19-Mar-2002	peter	#if 0 out some unused code.
92668	19-Mar-2002	peter	Add some #includes after things got broken with the last round of MI include file (<sys/smp.h> I think) tweaks.
92667	19-Mar-2002	peter	Turn off the ia64 ITC timecounter when SMP is present since it has the same problem as the TSC on the x86 - ie: it is not synchronized. #if 0 out some unused functions, ia64 doesn't calibrate clocks yet.
92654	19-Mar-2002	jeff	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
92552	18-Mar-2002	dfr	Fix spelling.
92321	15-Mar-2002	dfr	* Stop other cpus when one cpu enters DDB and restart them after it leaves. * Add a sync.i instruction to the code which writes out breakpoints to ensure that the breakpoint is seen by all cpus in the coherence domain.
92318	15-Mar-2002	dfr	* Remove a breakpoint() I accidentally left in for debugging :-(. * Make cpu_mp_probe() work before the VM system is available and initialise mp_maxid accordingly.
92287	14-Mar-2002	dfr	Tweak the AP startup code somewhat. With all the other recent changes, this now works pretty well for two processors at least. Submitted by: marcel, mostly.
92286	14-Mar-2002	dfr	* Initialise pcb_pmap for new threads. * Add support for forking new threads from &thread0 as well as curthread.
92285	14-Mar-2002	dfr	* Save and restore PCPU_GET(current_pmap) in pcb_pmap so that we don't lose if a process is preempted while pmap is temporarily switched to another pmap. * For SMP, drop the high-fp state when a thread is switched away from so that if another cpu resumes that thread, it doesn't have to play games with IPI to get ahold of the correct register values.
92281	14-Mar-2002	dfr	Add pcpu.pc_current_pmap and pcb.pcb_pmap.
92268	14-Mar-2002	dfr	* Add some KTR messages for IPIs. * Don't call ast() from interrupt() - if we switch, then we will miss writing cr.eoi which will prevent the current cpu from receiving interrupts until the current thread is resumed. The call to ast() happens magically in exception_restore where it is safe. * Add DDB 'show irq' command to examine interrupt hardware state.
92267	14-Mar-2002	dfr	Add debug code to print SAPIC registers.
92263	14-Mar-2002	dfr	* Use a mutex to protect the RID allocator. * Use ptc.g instead of ptc.l so that TLB shootdowns are broadcast to the coherence domain. * Use smp_rendezvous for pmap_invalidate_all to ensure it happens on all cpus. * Dike out a DIAGNOSTIC printf which didn't compile. * Protect the internals of pmap_install with cpu_critical_enter/exit.
92262	14-Mar-2002	dfr	Move the call to pmap_bootstrap to after the initialisation of thread0. This allows us to use mutexes in pmap safely. Also initialise fpcurthread for cpu0 so that ia64_fpstate_check doesn't barf during boot.
92249	14-Mar-2002	dfr	Don't restore r13 when returning to kernel mode. We may have migrated to a different cpu since the exception_save and r13 needs to point at the current cpu's pcpu structure.
92123	12-Mar-2002	peter	Fix a warning (make ucontext_t *ucp a const)
92122	12-Mar-2002	peter	Stop concatenating __func__ with strings
92105	11-Mar-2002	jhb	Fix a misspelling of mine: s/optomization/optimization/. Noticed by: bmilekic
92020	10-Mar-2002	dfr	Add an implementation of cpu_throw() and make restorectx() simply branch to the tail of cpu_switch.
92019	10-Mar-2002	dfr	Don't try to print the arguments if the value of bsp is outside the kernel - it's asking for trouble.
92010	10-Mar-2002	dfr	Use the right value for the region length in parse_spill_mask.
91779	07-Mar-2002	jake	Include machine/smp.h.
91669	05-Mar-2002	marcel	Call ast() only when we're handling a user trap.
91635	04-Mar-2002	dfr	Add emulation support for PAL_VM_SUMMARY.
91598	03-Mar-2002	dfr	* Include <sys/ucontext.h> so that this compiles again. * Move the section which manipulates ia64_pal_base to after cninit() so that we don't risk printing anything before we have a console. * Don't call ia64_probe_sapics() for a SKI build. This should really be dependent on ACPICA being present or something.
91504	28-Feb-2002	arr	- Move a comment from being on the same line as a #ifdef to the line following it. This should have gone in the previous commit, but misviewed Bruce's patch. Requested by: bde
91475	28-Feb-2002	arr	- Fix panic() message and a couple style nits that snuck in from the recent diagnostics commit (rev. 1.84).
91406	27-Feb-2002	jhb	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
91403	27-Feb-2002	silby	Fix a horribly suboptimal algorithm in the vm_daemon. In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times. Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change. Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began. Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals. PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week
91090	22-Feb-2002	julian	Add some DIAGNOSTIC code. While in userland, keep the thread's ucred reference in a shadow field so that the usual place to store it is NULL. If DIAGNOSTIC is not set, the thread ucred is kept valid until the next kernel entry, at which time it is checked against the process cred and possibly corrected. Produces a BIG speedup in kernels with INVARIANTS set. (A previous commit corrected it for the non INVARIANTS case already) Reviewed by: dillon@freebsd.org
91066	22-Feb-2002	phk	Convert p->p_runtime and PCPU(switchtime) to bintime format.
90892	19-Feb-2002	julian	Duplicate the changes to i386 to keep creds over the user boundary.
90361	07-Feb-2002	julian	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there, but remove all the code that assumes that in preparation for the next commit, which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice
90344	07-Feb-2002	phk	GC the PC_SWITCH* symbols which are not used in assembly anymore.
90065	01-Feb-2002	bde	Compile osigreturn() unconditionally since it will always be needed on some arches and the syscall table is machine-independent. It was (bogusly) conditional on COMPAT_43, so this usually makes no difference. ia64: in addition: - replace the bogus cloned comment before osigreturn() by a correct one. osigreturn() is just a stub for ia64. - fix the formatting of cloned comment before sigreturn(). - fix the return code. use nosys() instead of returning ENOSYS to get the same semantics as if the syscall is not in the syscall table. Generating SIGSYS is actually correct here. - fix style bugs. powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment. sparc64: copy the cleaned up ia64 stub, since there was no stub before.
89493	18-Jan-2002	marcel	Add a definition of ddb_regs. ddb_regs is declared as extern in db_machdep.h to fix the link failure (multiple definitions) caused by disabling the emission of common symbols. As a result, there were no definitions at all. While here, remove useless declarations.
89492	18-Jan-2002	marcel	Remove the definition of bootverbose. This fixes the link failure caused by disabling the emission of common symbols.
88903	05-Jan-2002	peter	Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).
88900	05-Jan-2002	jhb	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively, when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha
88727	30-Dec-2001	marcel	Revert previous definition of cpu_throw(). Non-MP configurations were broken as well.
88695	30-Dec-2001	marcel	Better implement SMP support: o Do not use a special struct to keep track of CPUs we found; instead, use struct pcpu. This handles all the magic WRT thread creation (yay!). o Respect MAXCPU. o Use the vhpt_base and vhpt_size values to initialize the AP. o Style fixes. Note that this commit temporarily breaks SMP configurations. Previously APs didn't do anything, but they now enter the scheduler. They hold sched_lock for more than 5 secs though and cause a panic. That's what I call progress :-)
88693	30-Dec-2001	marcel	o Reimplement map_pal_code to work with a global variable ia64_pal_base instead of scanning the EFI tables. This way AP startup code can more easily use the function. o Initialize ia64_pal_base in ia64_init(). When the PAL code doesn't need explicit mapping or no PAL code has been found, ia64_pal_base will be 0. o Remove some unused global variables. o Also in ia64_init(), allocate only 1 page for struct pcpu and remove some Alpha leftovers. o Initialize pc_pcb in cpu_pcpu_init().
88692	30-Dec-2001	marcel	Make vhpt_base and vhpt_size globals so that they can be used by the AP startup code.
88689	30-Dec-2001	marcel	o Remove temporary implementation of cpu_throw in vm_machdep.c and instead make it an alternate entry-point of cpu_switch() in swtch.s o Add SMP support to cpu_switch().
88687	30-Dec-2001	marcel	Draft implementation of IPI handling.
88686	30-Dec-2001	marcel	Add PC_IDLETHREAD. We need it in cpu_switch.
88685	30-Dec-2001	marcel	Add missing predicate in interruption_Data_TLB. Without this predicate we never used the VHPT entry we found. While here, normalize the compares.
88245	20-Dec-2001	peter	Replace a bunch of: for (pv = TAILQ_FIRST(&m->md.pv_list); pv; pv = TAILQ_NEXT(pv, pv_list)) { with: TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
88088	18-Dec-2001	jhb	Modify the critical section API as follows: - The MD functions critical_enter/exit are renamed to start with a cpu_ prefix. - MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly. - Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments. - mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit(). Tested on: i386, alpha
87702	11-Dec-2001	jhb	Overhaul the per-CPU support a bit: - The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list. Tested on: alpha, i386 Reviewed by: peter, jake
87546	09-Dec-2001	dillon	Allow maxusers to be specified as 0 in the kernel config, which will cause the system to auto-size to between 32 and 512 depending on the amount of memory. MFC after: 1 week
87119	30-Nov-2001	dfr	* Don't use critical_enter/critical_exit when accessing the VHPT - it's pointless and would be inadequate for SMP systems. We will rely on the VM system's locks to serialise this for now. * Change pmap_remove() so that if the range being removed is larger than the number of pages mapped by the pmap, we iterate over the currently mapped pages instead of over the virtual address range. This should make a difference when removing large virtual address ranges from an address space.
86951	27-Nov-2001	dfr	Minor tweaks to the TLB handling code - avoid movl instructions and add itc.x instructions to attempt to avoid the little flurry of TLB exceptions for handling access, dirty etc.
86593	19-Nov-2001	peter	s/code/ucode/ (last minute typo)
86592	19-Nov-2001	peter	Initial cut at calling the EFI-provided FPSWA (Floating Point Software Assist) driver to handle the "messy" floating point cases which cause traps to the kernel for handling.
86443	16-Nov-2001	peter	Oops, I accidentally merged a whitespace error from the original commit. (whitespace at end of line in rev 1.264 pmap.c). Fix them all.
86442	16-Nov-2001	peter	Merge rev 1.264 from i386/pmap.c (tegge via alfred): Protect against an infinite loop when prefaulting pages. This can happen when the vm system maps past the end of an object or tries to map a zero length object, the pmap layer misses the fact that offsets wrap into negative numbers and we get stuck.
86441	16-Nov-2001	peter	Merge rev 1.202 from i386/pmap.c (back in 1998 by John Dyson): Make flushing dirty pages work correctly on filesystems that unexpectedly do not complete writes even with sync I/O requests. This should help the behavior of mmaped files when using softupdates (and perhaps in other circumstances also.)
86440	16-Nov-2001	peter	Merge rev 1.293 of i386/pmap.c - skip PG_UNMANAGED in pmap_collect()
86438	16-Nov-2001	peter	Converge with i386/pmap.c - don't refer to curproc, use curthread.
86435	16-Nov-2001	peter	As part of a general cleanup and reconvergence of related pmap code, start tidying up some loose ends. The DEBUG_VA stuff has long since passed its use-by date. It wasn't used on ia64 but got cut/pasted there.
86294	12-Nov-2001	peter	Implement eficlock_set() to set hardware clock.
86291	12-Nov-2001	marcel	o os_boot_rendez is responsible for clearing the IRR bit by reading cr.ivr, as well as writing to cr.eoi. o use global variables to pass information to os_boot_rendez so that it doesn't have to jump through hoops to find it out. This avoids traps on the AP without it even being initialized. This fixes SMP configurations. o Move the probing of the MADT to the end of cpu_startup, instead of at the start of cpu_mp_probe. We need to probe the MADT for non-SMP configurations as well. This fixes uniprocessor configurations. o Serialize AP wake-up by waiting for the AP. We need to do this since we use global variables for the AP to use. As a side-effect, we can use printf() more easily to see what's going on.
86290	12-Nov-2001	marcel	Invoke trap() for the alt. ITLB and alt. DTLB interrupts when the region is not 6 or 7. This changes the behaviour from inserting a bogus region 6 mapping to a kernel panic.
86286	12-Nov-2001	peter	Remove #if 0'ed code that was replaced by vm_ksubmap_init() and GC'ed on other platforms.
86239	10-Nov-2001	marcel	Avoid using the .align directive to skip to the next vector offset. It doesn't help us catch overflowing vector entries at compile time. Instead use the .org directive. The last entry in the IVT doesn't strictly need to be limited to 256 bytes, but doing so allows the VHPT to be placed immediately following the IVT without wasting any space due to alignment.
86213	09-Nov-2001	dfr	* Make sure we increment pm_stats.resident_count in pmap_enter_quick * Re-organise RID allocation so that we don't accidentally give a RID to two different processes. Also randomise the order to try to reduce collisions in VHPT and TLB. Don't allocate RIDs for regions which are unused. * Allocate space for VHPT based on the size of physical memory. More tuning is needed here. * Add sysctl instrumentation for VHPT - see sysctl vm.stats.vhpt * Fix a bug in pmap_prefault() which prevented it from actually adding pages to the pmap. * Remove ancient dead debugging code. * Add DDB commands for examining translation registers and region registers. The first change fixes the 'free/cache page %p was dirty' panic which I have been seeing when the system is put under moderate load. It also fixes the negative RSS values in ps which have been confusing me for a while. With this set of changes the ia64 port is reliable enough to build its own kernels, even with a 20-way parallel build. Next stop buildworld.
86212	09-Nov-2001	dfr	Raise SIGILL for General Exceptions - it's closer to the correct meaning.
86211	09-Nov-2001	dfr	Reserve more space for phys_avail. Really need to be more careful about overflowing phys_avail.
86210	09-Nov-2001	dfr	Teach DDB about branch registers.
86204	09-Nov-2001	marcel	Implement os_boot_rendez. Application processors are initialized and brought to a point where kernel specific initializations can be done. That will be the next step...
86069	05-Nov-2001	marcel	Don't pass os_boot_rendez directly to SAL_SET_VECTORS, because it's actually the address of the function descriptor. The fdesc has both the address of the function and its corresponding gp value. Now that we have a gp value, use it instead of passing 0.
85930	03-Nov-2001	dillon	Implement i386/i386/pmap.c 1.292 for alpha, ia64 (avoid free page exhaustion / kernel panic for certain madvise() scenarios)
85866	02-Nov-2001	dfr	Call ast() from exception_restore when we are restoring to user mode.
85865	02-Nov-2001	dfr	Use static storage for the unwind state so that we can still get backtraces when the VM system is hosed.
85856	02-Nov-2001	dfr	Remember to actually free the pv_entry in pmap_remove_entry().
85852	02-Nov-2001	peter	argh! cut/paste typo. :-( (committed on a different machine to what I was testing it on)
85850	02-Nov-2001	peter	"Fix" a problem that got copied from alpha to ia64 and broke there. When we truncate the msgbuf size because the last chunk is too small, correctly terminate the phys_avail[] array - the VM system tests the end for zero, not the start. This leads the VM startup to attempt to recreate a duplicate set of pages for all physical memory. XXX the msgbuf handling is suspiciously different on i386 vs alpha/ia64...
85785	31-Oct-2001	dfr	Experiment with rewriting the syscall() wrapper using explicit bundling and trying to reduce stalls from reading certain high latency registers. This should be faster than the old syscall code. It's certainly a lot smaller.
85777	31-Oct-2001	dfr	Add TF_AR_FPSR, the offset of ar.fpsr in a trapframe.
85766	31-Oct-2001	dfr	Print the bundle template name on the first slot of the bundle.
85685	29-Oct-2001	dfr	* Factor out common code for manipulating the RSE backing store. * Implement a fairly simplistic parser for unwinding stack frames. * Use unwind records for DDB's 'trace' command. Also add support for tracing past exceptions to the context which generated the exception. The stack unwind code requires a toolchain based on binutils-2.11.2 or later and gcc-3.0.1 or later.
85684	29-Oct-2001	dfr	Make the various bits of SMP code conditional on SMP so that I can still build non-SMP kernels.
85682	29-Oct-2001	dfr	Various fixes to make stack traces using the unwind tables work properly.
85681	29-Oct-2001	dfr	Fix disassembly of 'add a=b,c,1' and make the disassembly of the various break and nops consistent.
85674	29-Oct-2001	marcel	o Send a test IPI from the BSP to itself at the same time APs are woken up. o Make IPIs synchronous by default. If we want asynchronous IPIs, we may want to make the memory fence controllable.
85670	29-Oct-2001	marcel	Make the clock vector 255 instead of 240. On Lion boxes, 240 is the AP wake-up vector. We probably want a more dynamic approach to assigning vectors in the future...
85656	29-Oct-2001	marcel	o Do not parse the MADT as a side-effect in AcpiOsGetRootPointer, do it as a side-effect of probing for MP hardware. This allows us to scan for local SAPICs early (especially before MBUF initialization). o Fix the Local SAPIC structure so that it matches the Local SAPIC table entry. Now that the Local SAPIC info is the same as the Local APIC info, stop dumping the Local APIC entries. o For every Local SAPIC entry in the MADT that's not disabled, let the SMP code know about it. They represent actual CPUs. o Register the OS_BOOT_RENDEZ entry point and provide a (bogus) implementation for the entry point. o Provide a mapping for internal IPI numbers to ExtINT vectors. o In a MP system, announce the CPUs and start them by sending IPI_AP_WAKEUP to each of them. Not that it makes a difference at this time :-) o Miscellaneous style fixes and other adjustments.
85525	26-Oct-2001	jhb	Add a per-thread ucred reference for syscalls and synchronous traps from userland. The per thread ucred reference is immutable and thus needs no locks to be read. However, until all the proc locking associated with writes to p_ucred are completed, it is still not safe to use the per-thread reference. Tested on: x86 (SMP), alpha, sparc64
85439	24-Oct-2001	dfr	* Clear the TLB on boot. * If a pte for a location given to pmap_enter_quick is valid, just give up - don't panic, even if the mapping is different.
85438	24-Oct-2001	dfr	If we get an unhandled page fault in kernel mode, either panic (if pcb_onfault is not set) or arrange to restart at the location in pcb_onfault. This ought to help the stability of a system under moderate load. It certainly stops DDB from hanging the kernel when it tries to access a non-present page.
85401	24-Oct-2001	marcel	Remove call to cninit_finish. This is part of the multiple low-level console support.
85383	23-Oct-2001	peter	Fix RAW dependency violation when compiled with gcc-3 Warning: Use of 'br.ret.sptk.many' violates RAW dependency 'PSR.tb' (data)
85360	23-Oct-2001	peter	Turn off the single-user override. We've been running multi-user for some time. Having a machine boot unattended is useful. :-)
85330	22-Oct-2001	dfr	In the signal trampoline, flush the register stack before calling sigreturn. This appears to fix the last set of problems with csh.
85304	22-Oct-2001	obrien	Setup for a 200MB FS -- 209715200/512= 409600 sectors. (DFR's latest ia64-root-*.tar.gz leaves only 7.7M avail when created by dd if=/dev/zero of=ia64-root.fs bs=1024k count=200)
85297	21-Oct-2001	des	Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to proc_* in the process; procfs_machdep.c is no longer needed. Run-tested on i386, build-tested on Alpha, untested on other platforms.
85293	21-Oct-2001	des	{set,fill}_{,fp,db}regs() fixup: - Add dummy {set,fill}_dbregs() on architectures that don't have them. - KSEfy the powerpc versions (struct proc -> struct thread). - Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms. These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which one yet. Run-tested on i386, build-tested on Alpha, untested on other platforms.
85286	21-Oct-2001	dfr	Add some more names for bits of trapframe.
85285	21-Oct-2001	dfr	We need to save a bit more information in the partial syscall trapframe in case we need to take a signal.
85284	21-Oct-2001	dfr	Set ar.fpsr to something sane before trying to handle a trap - the user might have trashed it.
85283	21-Oct-2001	dfr	Use ia64_set_fpsr() instead of __asm to set ar.fpsr.
85276	21-Oct-2001	marcel	Implement the IPI send functions. No mapping between IPI message Id and interrupt vector has been made yet.
85211	20-Oct-2001	marcel	Save the AP wake-up vector from the SAL descriptor under SMP. Note that the descriptor is optional. Add a comment to indicate that we want to register the OS_BOOT_RENDEZ here as well.
85210	20-Oct-2001	marcel	Make this compile under option SMP.
85199	19-Oct-2001	dfr	Make a start at an unaligned trap handler. Only integer loads and stores are handled so far.
85195	19-Oct-2001	dfr	Translate various userland traps into SIGBUS (instead of just panicking).
85185	19-Oct-2001	jhb	Remove unneeded sys/mutex.h includes.
85142	19-Oct-2001	dfr	Rework pmap so that it separates the PTE structure from the pv_entry structure. This makes it possible to pre-allocate PTEs for the kernel, which is necessary for a reliable implementation of pmap_kenter(). This also avoids wasting space (about 48 bytes per page) for kernel mappings and user mappings of memory-mapped devices. This also fixes a bug with the previous version where the implementation required the pv_entry structure to be physically contiguous but did not enforce this (the structure size was not a power of two). This meant that the pv_entry free list was quickly corrupted as soon as the system was even mildly loaded.
85109	18-Oct-2001	dfr	Shift the code which packs and unpacks instruction bundles out of DDB since it is useful for various emulation duties (e.g. unaligned trap handling).
85088	18-Oct-2001	marcel	Fix typos in previous commit: o s/sys_narg/sy_narg/ o s/SYS_MPSAFE/SYF_MPSAFE/
85082	17-Oct-2001	jhb	- Small cleanups to the Giant handling in trap(). - Only release Giant in trap() if we locked it, otherwise we could release Giant in a kernel trap if we didn't get it for a page fault and the previous frame had grabbed the lock. - Only get Giant for !MP safe syscalls.
85024	16-Oct-2001	dfr	Size the number of pv_entries we use to bootstrap the pv_entry allocator based on the size of physical memory. This should eliminate the tweaking needed for larger memory configurations.
84966	15-Oct-2001	marcel	When compiling with SKI support, create the fake memory regions when either the memory descriptor in the bootinfo is NULL or the descriptor count is 0.
84876	13-Oct-2001	dfr	Only the first eight arguments can possibly be in stacked registers.
84840	12-Oct-2001	dfr	Pass the correct trapframe pointer to fork_exit - sp is trapframe-16.
84839	12-Oct-2001	dfr	If the faulting instruction is a cmpxchg, then isr.w and isr.r will both be set. We need to check isr.w before isr.r so that we can correctly handle a cmpxchg to a copy-on-write page. This fixes the hang-after-fork problem for dynamically linked programs.
84798	11-Oct-2001	dfr	* Change the calling convention for execve so that it conforms to normal C calling conventions. This allows crt1.c to be written nearly without any inline assembler. * Initialise cpu_model[] so that the hw.model sysctl works properly.
84732	09-Oct-2001	dfr	Clarify a comment. Requested by: jhb
84714	09-Oct-2001	dfr	Don't include isavar.h - we don't need it.
84677	08-Oct-2001	dfr	Make printtrap() more informative.
84637	07-Oct-2001	des	Dissociate ptrace from procfs. Until now, the ptrace syscall was implemented as a wrapper that called various functions in procfs depending on which ptrace operation was requested. Most of these functions were themselves wrappers around procfs_{read,write}_{,db,fp}regs(), with only some extra error checks, which weren't necessary in the ptrace case anyway. This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c (renaming it to proc_rwmem() in the process), and implements ptrace() directly in terms of procfs_{read,write}_{,db,fp}regs() instead of having it fake up a struct uio and then call procfs_do{,db,fp}regs(). It also moves the prototypes for procfs_{read,write}_{,db,fp}regs() and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files except procfs_machdep.c as "optional procfs" instead of "standard".
84632	07-Oct-2001	dfr	* Use srlz.i to serialise changes to psr.ic * Don't enable psr.i at the same time as psr.dt and psr.ic These changes improve stability considerably.
84621	07-Oct-2001	dfr	Remove bogus include.
84592	06-Oct-2001	dfr	Move console probes until after we set boothowto so that 'boot -h' works.
84557	05-Oct-2001	dfr	Add BOOTP support.
84556	05-Oct-2001	dfr	Fix some dependency violations (don't know why gas didn't catch this).
84555	05-Oct-2001	dfr	Use physical addresses, not virtual addresses when calling PHYS_TO_VM_PAGE.
84554	05-Oct-2001	dfr	Eliminate some alpha craziness.
84553	05-Oct-2001	dfr	In in_cksumdata, len must be a signed type.
84543	05-Oct-2001	dfr	Low-level code for programming the I/O SAPIC.
84541	05-Oct-2001	dfr	Wire up most of the interrupt handling infrastructure. Not sure it works right yet but it's enough for the ATA probe to work. The SCSI probes which follow are broken though.
84540	05-Oct-2001	dfr	Fix typo which meant that we never actually found the ACPI 2.0 table.
84535	05-Oct-2001	dfr	Disable interrupts when we are in DDB.
84476	04-Oct-2001	dfr	* Don't pretend the object passed to clockattach is a device - it isn't. * Declare itc_frequency properly.
84475	04-Oct-2001	dfr	Use EFI (or some reasonable simulation) to read the RTC.
84474	04-Oct-2001	dfr	Fake the EFI runtime call GetTime.
84381	02-Oct-2001	mjacob	Fix problem where a user buffer outside of the area being tested will be corrupted. PR: 29194 Obtained from: Tor.Egge@fast.no MFC after: 2 weeks
84128	29-Sep-2001	dfr	Various changes to use the firmware on a real machine.
84127	29-Sep-2001	dfr	* Read parameters for ptc.e instruction from PAL Code. * Add pmap_unmapdev().
84126	29-Sep-2001	dfr	Fake PAL Code for SKI.
84124	29-Sep-2001	dfr	Start hooking up devices.
84121	29-Sep-2001	dfr	Add code to initialise firmware resources (and to fake them if we are running in simulation).
84120	29-Sep-2001	dfr	Add shims for calling PAL Code in physical mode.
84118	29-Sep-2001	dfr	Add some more definitions.
84117	29-Sep-2001	dfr	Call cpu_boot from cpu_reset.
84116	29-Sep-2001	dfr	Give up on the backtrace if the calculated pc isn't in region 7.
84115	29-Sep-2001	dfr	Use PAGE_SHIFT instead of a hardcoded constant for log2(PAGE_SIZE).
84114	29-Sep-2001	dfr	* Preserve ar.rsc in ia64_change_mode. * Convert sp to/from physical in ia64_change_mode. * Add a shim for calling EFI procedures in virtual mode.
84113	29-Sep-2001	dfr	Change END(locorestart) to END(__start).
83983	26-Sep-2001	rwatson	o Modify the access control checks for the ia64 /dev/mem (and friends) to use securelevel_gt() instead of direct variable checks. Obtained from: TrustedBSD Project
83964	26-Sep-2001	dfr	Tidy up and fix a runtime warning.
83913	24-Sep-2001	dfr	Use b6 instead of b1 - b1 is supposed to be preserved and b6 is scratch.
83912	24-Sep-2001	dfr	Make the Alternate {I,D} TLB vector code actually work for virtual addresses greater than 256M (the page size for region 6 and 7).
83908	24-Sep-2001	dfr	Don't try to access external files from SKI unless we are actually running in SKI.
83907	24-Sep-2001	dfr	Increase the number of bootstrap PVs.
83906	24-Sep-2001	dfr	Include <machine/pte.h> instead of <machine/pmap.h>
83905	24-Sep-2001	dfr	We need different call stubs for static and stacked calling conventions.
83895	24-Sep-2001	dfr	Fix a few comment typos from the last commit.
83893	24-Sep-2001	dfr	Add some code which can be used to change to/from physical mode when calling various firmware functions.
83837	22-Sep-2001	dfr	Don't activate the ssc console unless we are running in SKI.
83834	22-Sep-2001	dfr	* Turn off memory descriptor debugging - it's served its purpose. * Don't get confused when memory regions don't lie on page boundaries - remember our page size is typically larger than the firmware's page size. * Add a function ia64_running_in_simulator() which is intended to detect whether the kernel is running in SKI or on real hardware.
83833	22-Sep-2001	dfr	Remove a redundant stop.
83767	21-Sep-2001	dfr	Fix a warning and make sure we flush the cache after writing an instruction bundle otherwise the CPU won't see the changed bundle.
83734	20-Sep-2001	dfr	If two @fptr relocations refer to the same symbol, use the same fptr structure to resolve them. This is necessary to allow code to compare function pointers.
83733	20-Sep-2001	dfr	Don't clear the single-step bit after a trap - leave it up to the debugger. The code was broken anyway - it cleared every bit except the single-step bit (oops).
83732	20-Sep-2001	dfr	The second instruction in an MLX bundle is slot one, not slot two, even though the actual opcode is stored in the value in slot two.
83727	20-Sep-2001	dfr	Tidy.
83718	20-Sep-2001	dfr	Don't include NFS headers. I have no idea why they were here in the first place - NFS has no assembler in it.
83691	20-Sep-2001	peter	Replicate a change from alpha/genassym.c to other arches. This should fix nfs-related build breakage.
83651	18-Sep-2001	peter	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
83611	18-Sep-2001	dfr	Flesh out identifycpu().
83522	15-Sep-2001	dfr	Rearrange so we search for I/O port space as early as possible (i.e. before console probing). Also fix a confusion between EFI's page size which is fixed at 4096 and our own page size which is variable at compile time.
83520	15-Sep-2001	dfr	Avoid the region used for thread0's trapframe when setting up the stack for ia64_init. If we use this area for ia64_init's stack, it ends up containing garbage which causes cpu_fork to die horribly later.
83512	15-Sep-2001	dfr	Use the MI console code to initialise the console.
83509	15-Sep-2001	dfr	* Use Intel's EFI headers instead of home-grown ones. * Use the bootinfo's memory map if present instead of hard-coding SKI's memory map. * Record the location of the I/O Port Space if present in the memory map.
83506	15-Sep-2001	dfr	Fill out some gaps in ia64 DDB support. This involves generalising DDB's breakpoint handling slightly to cope with the fact that ia64 instructions are not located on byte boundaries.
83407	13-Sep-2001	dfr	* Enable dynamically linked kernel. This involves adding a self-relocator to locore to process the @fptr relocations in the dynamic executable. * Don't initialise the timer until after we install the timecounter to avoid a race between timecounter initialisation and hardclock. * Tidy up bootinfo somewhat including adding sanity checks for when the kernel is loaded without a recognisable bootinfo.
83366	12-Sep-2001	julian	KSE Milestone 2. Note: ALL MODULES MUST BE RECOMPILED. Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
83359	12-Sep-2001	marcel	o Fix struct ssc_time and enable the SSC call to get the RTC. o Print a message that the TODR is not set in sscclock_set.
83301	10-Sep-2001	dfr	* Make a start on a realistic definition for bootinfo. * Switch to proc0's stack and backing store before calling ia64_init so that we don't rely on the loader's stack at all. * Change kernel entry point name from locorestart to __start.
83276	10-Sep-2001	peter	Rip some well duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area. KSE touched cpu_wait() which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with fpu state cleanup and things like vm86 cleanup on x86. The rest was identical. XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place. Reviewed by: jake, tmm, dillon
83223	08-Sep-2001	peter	Missing part of dillon's coredump commit. cpu_coredump() was still passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the unlocked core vnode and softupdates nastiness if an a.out binary cored.
83197	07-Sep-2001	dfr	Typo in comment.
83196	07-Sep-2001	dfr	* Track ref/mod information properly when a mapping changes. * Fix a panic in pmap_remove() for a non-current pmap.
83195	07-Sep-2001	dfr	Remove old setjmp/longjmp stubs.
83163	06-Sep-2001	jhb	Call sendsig() with the proc lock held and return with it held.
82938	04-Sep-2001	peter	Nuke #if 0'ed "setredzone()" stub. We never used it, and probably never will. I've implemented an optional redzone as part of the KSE upage breakup.
82867	03-Sep-2001	dfr	Add a working version of setjmp/longjmp. Obtained from: Intel's EFI toolkit.
82849	03-Sep-2001	peter	Don't conflict with sysctl debug.mddebug
82788	02-Sep-2001	peter	Sync with i386 / alpha. Whitespace unindent / style prep for kse.
82785	02-Sep-2001	peter	Merge from i386: various cleanups including moving the map calculations to MI code. This gets ia64 to compile again.
82632	31-Aug-2001	peter	Same as i386/i386/pmap.c: clean up some style. This is irrelevant since it is inside #if 0'ed code, but it would be a shame if this stuff got cut/pasted elsewhere.
82393	27-Aug-2001	peter	Enable hardwiring of things like tunables from embedded environments that do not start from loader(8).
82025	21-Aug-2001	peter	Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this stuff.
81493	10-Aug-2001	jhb	- Close races with signals and other AST's being triggered while we are in the process of exiting the kernel. The ast() function now loops as long as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with preemption disabled so that any further AST's that arrive via an interrupt will be delayed until the low-level MD code returns to user mode. - Use u_int's to store the tick counts for profiling purposes so that we do not need sched_lock just to read p_sticks. This also closes a problem where the call to addupc_task() could screw up the arithmetic due to non-atomic reads of p_sticks. - Axe need_proftick(), aston(), astoff(), astpending(), need_resched(), clear_resched(), and resched_wanted() in favor of direct bit operations on p_sflag. - Fix up locking with sched_lock some. In addupc_intr(), use sched_lock to ensure pr_addr and pr_ticks are updated atomically with setting PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing PS_OWEUPC. We also do not grab the lock just to test a flag. - Simplify the handling of Giant in ast() slightly. Reviewed by: bde (mostly)
81265	08-Aug-2001	peter	Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they are a really nasty interface that should have been killed long ago when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they operate on (struct user) will not be around much longer since it is part-per-process and part-per-thread in a post-KSE world. gdb does not actually use this except for the obscure 'info udot' command which does a hexdump of as much of the child's 'struct user' as it can get. It carries its own #defines so it doesn't break compiles.
81255	07-Aug-2001	jhb	Grab Giant around page faults. ia64 boots again in the simulator now.
81198	06-Aug-2001	dfr	Make this compile again.
81197	06-Aug-2001	dfr	Remove usage of nonexistent vm_mtx.
80729	31-Jul-2001	jhb	GC some obsolete alpha code.
80431	27-Jul-2001	peter	Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel for this, since it is easy to run into with large systems with lots of shared mmap space. Obtained from: yahoo
80421	26-Jul-2001	peter	Call the early tunable setup functions as soon as kern_envp is available. Some things depend on hz being set not long after this.
80399	26-Jul-2001	bmilekic	- Do not handle the per-CPU containers in mbuf code as though the cpuids were indices in a dense array. The cpuids are a sparse set and treat them as such, setting up containers only for CPUs activated during mb_init(). - Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse map, in accordance with the above. This allows us to properly boot with certain CPUs disactivated. However, if we later decide to re-activate said CPUs, we will barf until we decide to implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU containers to get configured on their activation. Reported by: mjacob Partially (sys/ diffs) Submitted by: mjacob
79573	11-Jul-2001	bsd	Add 'hwatch' and 'dhwatch' ddb commands analogous to 'watch' and 'dwatch'. The new commands install hardware watchpoints if supported by the architecture and if there are enough registers to cover the desired memory area. No objection by: audit@, hackers@ MFC after: 2 weeks
79418	08-Jul-2001	julian	A set of changes to reduce the number of include files the kernel takes from /usr/include. I cannot check them on alpha.. (will try beast) Briefly looked at by: Warner Losh <imp@harmony.village.org>
79265	05-Jul-2001	dillon	Move vm_page_zero_idle() from machine-dependent sections to a machine-independent source file, vm/vm_zeroidle.c. It was exactly the same for all platforms and updating them all was getting annoying.
79263	04-Jul-2001	dillon	Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system. vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and acquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)
79224	04-Jul-2001	dillon	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
79123	03-Jul-2001	jhb	Allow Giant to be recursed when a process terminates.
78983	29-Jun-2001	jhb	Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.
78962	29-Jun-2001	jhb	Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md. Reviewed by: jake (in principle)
78888	27-Jun-2001	jhb	Catch up to mbuf allocator changes from last September so this compiles again.
78887	27-Jun-2001	jhb	Make this compile again. Broken since June 1.
78762	25-Jun-2001	jhb	Fix cut-n-paste brain-o. Pointy-hat to: me
78636	22-Jun-2001	jhb	- Grab the proc lock around CURSIG and postsig(). Don't release the proc lock until after grabbing the sched_lock to avoid CURSIG racing with psignal. - Don't grab Giant for addupc_task() as it isn't needed. Reported by: tegge (signal race), bde (addupc_task a while back)
78269	15-Jun-2001	peter	oops. prepare_usermode() died in August 2000 in the MI and x86 code. Issue raised by: scottl
77796	06-Jun-2001	jhb	Don't hold sched_lock across addupc_task(). Reported by: David Taylor <davidt@yadt.co.uk> Submitted by: bde
77507	30-May-2001	jhb	Catch up to the axeing of MFS and fix the ia64 build. Forgotten by: a Danish axe-wielder
77448	30-May-2001	jhb	- Catch up to the VM mutex changes. - Sort includes in a few places.
77031	23-May-2001	ru	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
76770	17-May-2001	jhb	- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT. - Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be toggled after boot. - Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just after the display of the copyright message instead of doing it by hand in three MD places.
76659	16-May-2001	jhb	Lock the procfs functions for doing a single step and reading/writing registers better. Hold sched_lock not only for checking the flag but also while performing the actual operation to ensure the process doesn't get swapped out by another CPU while the operation is being performed.
76650	15-May-2001	jhb	Remove unneeded includes of sys/ipl.h and machine/ipl.h.
76494	11-May-2001	jhb	Simplify the vm fault trap handling code a bit by using if-else instead of duplicating code in the then case and then using a goto to jump around the else case.
76440	10-May-2001	jhb	- Split out the support for per-CPU data from the SMP code. UP kernels have per-CPU data and gdb on the i386 at least needs access to it. - Clean up includes in kern_idle.c and subr_smp.c. Reviewed by: jake
76411	09-May-2001	jhb	Add include of sys/mutex.h and resort include of sys/lock.h.
76410	09-May-2001	jhb	Add needed sys/lock.h include.
76322	06-May-2001	phk	Actually biofinish(struct bio *, struct devstat *, int error) is more general than the bioerror(). Most of this patch is generated by scripts.
76078	27-Apr-2001	jhb	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delivered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwarded_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind
75914	24-Apr-2001	dfr	When switching backing store during signal delivery, do the switch before creating the register frame for calling the handler. Also discard that frame before switching back to the old backing store after the handler returns.
75913	24-Apr-2001	dfr	Align stack pointer and backing store pointer to 16 byte boundary when delivering signals.
75912	24-Apr-2001	dfr	Don't trash the user's pr on syscalls.
75701	19-Apr-2001	dfr	Don't unwrap the function descriptor used as the callout argument to fork_exit(). The MI version of fork_exit() needs a real function descriptor, not a simple function pointer.
75700	19-Apr-2001	dfr	Don't take the Giant mutex for clock interrupts.
75668	18-Apr-2001	dfr	Don't panic when we try to modify the kernel pmap.
75667	18-Apr-2001	dfr	Print an approximation of the function arguments in the stack trace.
75666	18-Apr-2001	dfr	Implement a simple stack trace for DDB. This will have to be redone if/when we change to a more modern toolchain.
75665	18-Apr-2001	dfr	Record the right value for tf_ndirty for kernel interruptions so that we can examine the interrupted register stack frame in DDB.
75421	11-Apr-2001	jhb	Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just "redundant noise" and to match the IPI constant namespace (IPI_*). Requested by: bde
75002	29-Mar-2001	obrien	Reduce the emasculation of bounds_check_with_label() by one line, so we propagate a bio error condition to the caller and above.
74927	28-Mar-2001	jhb	Convert the allproc and proctree locks from lockmgr locks to sx locks.
74912	28-Mar-2001	jhb	Rework the witness code to work with sx locks as well as mutexes. - Introduce lock classes and lock objects. Each lock class specifies a name and set of flags (or properties) shared by all locks of a given type. Currently there are three lock classes: spin mutexes, sleep mutexes, and sx locks. A lock object specifies properties of an additional lock along with a lock name and all of the extra stuff needed to make witness work with a given lock. This abstract lock stuff is defined in sys/lock.h. The lockmgr constants, types, and prototypes have been moved to sys/lockmgr.h. For temporary backwards compatibility, sys/lock.h includes sys/lockmgr.h. - Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin locks held. By making this per-cpu, we do not have to jump through magic hoops to deal with sched_lock changing ownership during context switches. - Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with proc->p_sleeplocks, which is a list of held sleep locks including sleep mutexes and sx locks. - Add helper macros for logging lock events via the KTR_LOCK KTR logging level so that the log messages are consistent. - Add some new flags that can be passed to mtx_init(): - MTX_NOWITNESS - specifies that this lock should be ignored by witness. This is used for the mutex that blocks a sx lock for example. - MTX_QUIET - this is not new, but you can pass this to mtx_init() now and no events will be logged for this lock, so that one doesn't have to change all the individual mtx_lock/unlock() operations. - All lock objects maintain an initialized flag. Use this flag to export a mtx_initialized() macro that can be safely called from drivers. Also, we no longer walk the all_mtx list if MUTEX_DEBUG is defined as witness performs the corresponding checks using the initialized flag. - The lock order reversal messages have been improved to output slightly more accurate file and line numbers.
74903	28-Mar-2001	jhb	Switch from save/disable/restore_intr() to critical_enter/exit().
74902	28-Mar-2001	jhb	Catch up to the mtx_saveintr -> mtx_savecrit change.
74810	26-Mar-2001	phk	Send the remains (such as I have located) of "block major numbers" to the bit-bucket.
74733	24-Mar-2001	jhb	- Define and use MAXCPU like the alpha and i386 instead of NCPUS. - Sort the sys/mutex.h include in mp_machdep.c into a closer to correct location.
74732	24-Mar-2001	jhb	Stick a prototype for handleclock() in machine/clock.h and include it in interrupt.c to quiet a warning.
74031	09-Mar-2001	dfr	Allow the config file to specify a root filesystem filename.
74030	09-Mar-2001	dfr	Adjust a comment slightly.
73936	07-Mar-2001	jhb	Unrevert the pmap_map() changes. They weren't broken on x86. Sense beaten into me by: peter
73931	07-Mar-2001	jhb	- Release Giant a bit earlier on syscall exit. - Don't try to grab Giant before postsig() in userret() as it is no longer needed. - Don't grab Giant before psignal() in ast() but get the proc lock instead.
73929	07-Mar-2001	jhb	Grab the process lock while calling psignal and before calling psignal.
73922	07-Mar-2001	jhb	Use the proc lock to protect p_pptr when waking up our parent in cpu_exit() and remove the mpfixme() message that is now fixed.
73903	07-Mar-2001	jhb	Back out the pmap_map() change for now, it isn't completely stable on the i386.
73867	06-Mar-2001	jhb	Don't psignal() a process from forward_hardclock() but set the appropriate pending flag in p_sflag instead.
73862	06-Mar-2001	jhb	- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete. Submitted by: dfr Debugging help: peter Tested by: me
72991	24-Feb-2001	jhb	sched_swi -> swi_sched
72990	24-Feb-2001	jhb	Don't include machine/mutex.h and relocate sys/mutex.h's include to be closer to alphabetical order and identical to that of the alpha.
72907	22-Feb-2001	jhb	Axe pcb_schednest as it is no longer used.
72906	22-Feb-2001	jhb	Rename switch_trampoline() to fork_trampoline() on the alpha and ia64. Suggested by: dfr
72905	22-Feb-2001	jhb	Don't set the sched_lock nesting level for new processes as it is no longer used.
72904	22-Feb-2001	jhb	Catch comments up to child_return() -> fork_return() as well.
72903	22-Feb-2001	jhb	Synch up with the other architectures: - Remove unneeded spl()'s around mi_switch() in userret(). - Don't hold sched_lock across addupc_task(). - Remove the MD function child_return() now that the MI function fork_return() is used instead. - Use TRAPF_USERMODE() instead of dinking with the trapframe directly to check for ast's in kernel mode. - Check astpending(curproc) and resched_wanted() in ast() and return if neither is true. - Use astoff() rather than setting the non-existent per-cpu variable astpending to 0 to clear an ast.
72899	22-Feb-2001	jhb	Use the MI fork_return() fork trampoline callout function for child processes instead of the MD child_return().
72898	22-Feb-2001	jhb	- Don't dink with sched_lock in cpu_switch() since mi_switch() does this for us. - Change the switch_trampoline() to call fork_exit() passing in the required arguments instead of calling the fork trampoline callout function directly. Warning: this hasn't been tested. Looked over by: dfr
72893	22-Feb-2001	jhb	Axe the astpending per-cpu variable.
72884	22-Feb-2001	jhb	Catch up to new MI astpending and need_resched handling.
72376	12-Feb-2001	jake	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propagated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propagation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propagate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propagate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at appropriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields.
It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
72358	11-Feb-2001	markm	RIP <machine/lock.h>. Some things needed bits of <i386/include/lock.h> - cy.c now has its own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK() has been moved to <i386/include/apic.h> (AKA <machine/apic.h>). Reviewed by: jhb
72226	09-Feb-2001	jhb	Move the initialization of the proc lock for proc0 very early into the MD startup code.
72221	09-Feb-2001	jhb	Remove bogus #if 0'd code that dinked with the saved interrupt state in sched_lock.
72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
72011	04-Feb-2001	peter	Clean up some leftovers from the root mount cleanup that was done some time ago. FFS_ROOT and CD9660_ROOT are obsolete.
71984	04-Feb-2001	peter	All the world is not an i386. Merge rev 1.438 of i386/i386/machdep.c. Make buffer_map a system map.
71880	31-Jan-2001	peter	Remove count for NSIO. The only places it was used were incorrect. (alpha-gdbstub.c got sync'ed up a bit with the i386 version)
71803	29-Jan-2001	dfr	Flesh out EFI support somewhat.
71785	29-Jan-2001	peter	Send "#if NISA > 0" to the bit-bucket and replace it with an option. These were compile-time "is the isa code present?" tests and not 'how many isa busses' tests.
71729	28-Jan-2001	marcel	Improve kernel bootstrapping: o Use objdump instead of gensetdefs(1) to build the linker sets. o Allow overriding of nm and objdump in resp. genassym.sh and gensetdefs.pl for non-native toolchains. Reviewed by: arch Perl improvements: Jos Backus <josb@cncdsl.com>, benno
71684	26-Jan-2001	dfr	Initialise proc0.p_heldmtx and proc0.p_contested and call mtx_enter(&Giant, MTX_DEF) after Giant is initialised. Reviewed by: jhb
71596	24-Jan-2001	dfr	Change cpuno to cpuid.
71595	24-Jan-2001	dfr	Fix typo.
71576	24-Jan-2001	jasone	Convert all simplelocks to mutexes and remove the simplelock implementations.
71553	24-Jan-2001	jhb	- Proc locking. - Update userret() to take a struct trapframe * as a second argument. - Axe have_giant and use mtx_owned(&Giant) where appropriate.
71552	24-Jan-2001	jhb	- Proc locking. - P_FOO -> PS_FOO.
71551	24-Jan-2001	jhb	- Proc locking. - Bring across forwarded_statclock() fixes from i386 and alpha.
71350	21-Jan-2001	des	First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone
71337	21-Jan-2001	jake	Make intr_nesting_level per-process, rather than per-cpu. Setup interrupt threads to run with it always >= 1, so that malloc can detect M_WAITOK from "interrupt" context. This is also necessary in order to context switch from sched_ithd() directly. Reviewed By: peter
71320	21-Jan-2001	jasone	Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization. This change is necessary in order to avoid some circular bootstrapping dependencies.
71241	19-Jan-2001	peter	Remove the now-empty ipl_funcs.c file on all platforms.
71240	19-Jan-2001	peter	Remove the static splXXX functions and replace them by static __inline stubs. Remove the xxx_imask variables which have been all but gone for a while.
71228	19-Jan-2001	bmilekic	Implement MTX_RECURSE flag for mtx_init(). All calls to mtx_init() for mutexes that recurse must now include the MTX_RECURSE bit in the flag argument variable. This change is in preparation for an upcoming (further) mutex API cleanup. The witness code will call panic() if a lock is found to recurse but the MTX_RECURSE bit was not set during the lock's initialization. The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to MTX_RECURSED, which is more appropriate given its meaning. The following locks have been made "recursive," thus far: eventhandler, Giant, callout, sched_lock, possibly some others declared in the architecture-specific code, all of the network card driver locks in pci/, as well as some other locks in dev/ stuff that I've found to be recursive. Reviewed by: jhb
70861	10-Jan-2001	jake	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other than curproc.
70509	30-Dec-2000	dfr	Don't include <stddef.h> for offsetof() - it's also defined in <sys/types.h>
70507	30-Dec-2000	dfr	Fix typo.
70317	23-Dec-2000	jake	Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@
70210	20-Dec-2000	marcel	Resolve RAW dependency violation between tbit and adds.
70034	14-Dec-2000	jhb	Remove the "machine dependent" KTR trace buffer ddb commands. The code was exactly the same on all platforms.
69947	13-Dec-2000	jake	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags passed to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
69881	12-Dec-2000	jake	- Add code to detect if a system call returns with locks other than Giant held and panic if so (conditional on witness). - Change witness_list to return the number of locks held so this is easier. - Add kern/syscalls.c to the kernel build if witness is defined so that the panic message can contain the name of the offending system call. - Add assertions that Giant and sched_lock are not held when returning from a system call, which were missing for alpha and ia64.
69781	08-Dec-2000	dwmalone	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
69586	05-Dec-2000	jake	Remove the last of the MD netisr code. It is now all MI. Remove spending, which was unused now that all software interrupts have their own thread. Make the legacy schednetisr use an atomic op for setting bits in the netisr mask. Reviewed by: jhb
69399	30-Nov-2000	alfred	remove unneeded sys/ucred.h includes
69379	30-Nov-2000	marcel	Don't use p->p_sigstk.ss_flags to keep state of whether the process is on the alternate stack or not. For compatibility with sigstack(2) state is being updated if such is needed. We now determine whether the process is on the alternate stack by looking at its stack pointer. This allows a process to siglongjmp from a signal handler on the alternate stack to the place of the sigsetjmp on the normal stack. When maintaining state, this would have invalidated the state information and caused a subsequent signal to be delivered on the normal stack instead of the alternate stack. PR: 22286
69207	26-Nov-2000	jlemon	Add 'mpsafe' parameter to callout_init() in MD bits. Reminded by: jake
69022	22-Nov-2000	jake	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
68889	19-Nov-2000	jake	- Protect the callout wheel with a separate spin mutex, callout_lock. - Use the mutex in hardclock to ensure no races between it and softclock. - Make softclock be INTR_MPSAFE and provide a flag, CALLOUT_MPSAFE, which specifies that a callout handler does not need giant. There is still no way to set this flag when registering a callout. Reviewed by: -smp@, jlemon
68862	17-Nov-2000	jake	- Split the run queue and sleep queue linkage, so that a process may block on a mutex while on the sleep queue without corrupting it. - Move dropping of Giant to after the acquire of sched_lock. Tested by: John Hay <jhay@icomtek.csir.co.za> jhb
68808	16-Nov-2000	jhb	Don't release and acquire Giant in mi_switch(). Instead, release and acquire Giant as needed in functions that call mi_switch(). The releases need to be done outside of the sched_lock to avoid potential deadlocks from trying to acquire Giant while interrupts are disabled. Submitted by: witness
68762	15-Nov-2000	jhb	Don't perform an mi_switch() when we release Giant during cpu_exit(). We are about to call cpu_switch() anyways. Found by: witness
67708	27-Oct-2000	phk	Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernel code should never include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Partial reviews by: various. Significant brucifications by: bde
67636	26-Oct-2000	dfr	Minor build fixes.
67551	25-Oct-2000	jhb	- Overhaul the software interrupt code to use interrupt threads for each type of software interrupt. Roughly, what used to be a bit in spending now maps to a swi thread. Each thread can have multiple handlers, just like a hardware interrupt thread. - Instead of using a bitmask of pending interrupts, we schedule the specific software interrupt thread to run, so spending, NSWI, and the shandlers array are no longer needed. We can now have an arbitrary number of software interrupt threads. When you register a software interrupt thread via sinthand_add(), you get back a struct intrhand that you pass to sched_swi() when you wish to schedule your swi thread to run. - Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit more intuitive. Also, prefix all the members of struct intrhand with 'ih_'. - Make swi_net() a MI function since there is now no point in it being MD. Submitted by: cp
67522	24-Oct-2000	dfr	* Various fixes to breakage introduced by the atomic and mutex reorgs. * Fixes to the signal delivery code. Not quite right yet. I would have preferred to wait until I have signal delivery actually working but the current kernel in CVS doesn't build.
67358	20-Oct-2000	jhb	- machine/mutex.h -> sys/mutex.h - Catch up to the MI mutex structure due to saveflags,saveipl,savepsr becoming saveintr.
67357	20-Oct-2000	jhb	- machine/mutex.h -> sys/mutex.h - Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.
67352	20-Oct-2000	jhb	- Make the mutex code almost completely machine independent. This greatly reduces the maintenance load for the mutex code. The only MD portions of the mutex code are in machine/mutex.h now, which include the assembly macros for handling mutexes as well as optionally overriding the mutex micro-operations. For example, we use optimized micro-ops on the x86 platform #ifndef I386_CPU. - Change the behavior of the SMP_DEBUG kernel option. In the new code, mtx_assert() only depends on INVARIANTS, allowing other kernel developers to have working mutex assertions without having to include all of the mutex debugging code. The SMP_DEBUG kernel option has been renamed to MUTEX_DEBUG and now just controls extra mutex debugging code. - Abolish the ugly mtx_f hack. Instead, we dynamically allocate separate mtx_debug structures on the fly in mtx_init, except for mutexes that are initiated very early in the boot process. These mutexes are declared using a special MUTEX_DECLARE() macro, and use a new flag MTX_COLD when calling mtx_init. This is still somewhat hackish, but it is less evil than the mtx_f filler struct, and the mtx struct is now the same size with and without mutex debugging code. - Add some micro-micro-operation macros for doing the actual atomic operations on the mutex mtx_lock field to make it easier for other archs to override/optimize mutex ops if needed. These new tiny ops also clean up the code in some places by replacing long atomic operation function calls that spanned 2-3 lines with a short 1-line macro call. - Don't call mi_switch() from mtx_enter_hard() when we block while trying to obtain a sleep mutex. Calling mi_switch() would bogusly release Giant before switching to the next process. Instead, inline most of the code from mi_switch() in the mtx_enter_hard() function. Note that when we finally kill Giant we can back this out and go back to calling mi_switch().
67325	19-Oct-2000	dfr	Don't force bootverbose anymore.
67324	19-Oct-2000	dfr	Decrease the number of ticks between clock interrupts by a factor of ten to place more pressure on the exception handling code.
67323	19-Oct-2000	dfr	* Disable interrupts when restoring a trapframe. * Make sure we reset ar.k6 (used to hold the kernel stack pointer when we are returning to user mode after a syscall).
67247	17-Oct-2000	ps	Implement write combining for crashdumps. This is useful when write caching is disabled on both SCSI and IDE disks where large memory dumps could take up to an hour to complete. Taking an i386 scsi based system with 512MB of ram and timing (in seconds) how long it took to complete a dump, the following results were obtained: Before: WCE=1: 141.820972, WCE=0: 797.265072. After: WCE=1: 15.600111, WCE=0: 65.480465. Obtained from: Yahoo! Reviewed by: peter
67213	16-Oct-2000	dfr	In pmap_remove_pv(), only manipulate the page's list if the pv is managed.
67212	16-Oct-2000	dfr	Do a full exception_restore after an execve syscall to ensure that the new program gets the right values for its arguments etc.
67211	16-Oct-2000	dfr	Clear the register stack frame before using loadrs to invalidate the stacked registers.
67210	16-Oct-2000	dfr	Clear ar.pfs for the child process in cpu_fork - switch_trampoline doesn't want a stack frame.
67201	16-Oct-2000	dfr	Track changes to trapframe.
67199	16-Oct-2000	dfr	* Correct some of my misunderstandings about how best to switch to the kernel backing store. * Implement syscalls via break instructions. * Fix backing store copying in cpu_fork() so that the child gets the right register values. This thing is actually starting to work now. This set of changes takes me up to the second execve (the one which runs the first shell). Next stop single-user mode :-).
67196	16-Oct-2000	dfr	Use the right mask for extracting sof from cr.ifs.
67195	16-Oct-2000	dfr	Remember to re-initialise cr.itm on clock interrupts so that we get more than just one tick.
67194	16-Oct-2000	dfr	Merge a fix from the alpha port - put softintr in the right place in the table.
67193	16-Oct-2000	dfr	Give names to app registers and control registers. Fix a typo handling mov from branch register instructions.
67032	12-Oct-2000	dfr	Implement a rudimentary interrupt handling system which should be good enough for clock interrupts in SKI.
67031	12-Oct-2000	dfr	Turn off a debugging printf.
67020	12-Oct-2000	dfr	* Fix exception handling so that it actually works. We can now handle exceptions from both kernel and user mode. * Fix context switching so that we can switch back to a proc which we switched away from (we were saving the state in the wrong place). * Implement lazy switching of the high-fp state. This needs to be looked at again for SMP to cope with the case of a process migrating from one processor to another while it has the high-fp state. * Make setregs() work properly. I still think this should be called cpu_exec() or something. * Various other minor fixes. With this lot, we can execve() /sbin/init and we get all the way up to its first syscall. At that point, we stop because syscall handling is not done yet.
67018	12-Oct-2000	dfr	Fix this so that it can cope with transfers to/from regions which are not physically contiguous.
67017	12-Oct-2000	dfr	* Allocate kernel stacks with contigmalloc() to make exception handling safe - we can't afford to take a TLB trap when we are writing a trapframe. Possibly revisit this later. * Various fixes to pmap_enter() so that it actually works properly.
67016	12-Oct-2000	dfr	Some minor fixes and simplifications.
66937	10-Oct-2000	dfr	* Add rudimentary DDB support (no kgdb, no backtrace, no single step). * Track recent changes to SWI code. * Allocate RIDs for pmaps (untested). * Implement assembler version of cpu_switch - it's cleaner that way.
66633	04-Oct-2000	dfr	Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).
66486	30-Sep-2000	dfr	Next round of ia64 work, including fixes to context switching, implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these changes, my test kernel reaches the mountroot prompt.
66464	29-Sep-2000	dfr	Ansify and fix warnings.
66463	29-Sep-2000	dfr	Implement dirty and access bit exceptions.
66462	29-Sep-2000	dfr	Bodge the simplelocks in a way which works UP.
66460	29-Sep-2000	dfr	Use write-back instead of write-combining for region 7.
66458	29-Sep-2000	dfr	This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but it's nice to see the FreeBSD copyright message appear in the ia64 simulator.