Cross Reference: /freebsd-10.0-release/sys/kern/imgact

History log of /freebsd-10.0-release/sys/kern/imgact_elf.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255426	09-Sep-2013	jhb	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
# 253953	05-Aug-2013	attilio	Revert r253939: We cannot busy a page before doing pagefaults. Infact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib
# 253939	04-Aug-2013	attilio	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho
# 250145	01-May-2013	trociny	Introduce a constant, ELF_NOTE_ROUNDSIZE, which evidently declare our intention to use 4-byte padding for elf notes. MFC after: 3 weeks
# 249558	16-Apr-2013	trociny	Add a new set of notes to a process core dump to store procstat data. The notes format is a header of sizeof(int), which stores the size of the corresponding data structure to provide some versioning, and data in the format as it is returned by a related sysctl call. The userland tools (procstat(1)) will be taught to extract this data, providing additional info for postmortem analysis. PR: kern/173723 Suggested by: jhb Discussed with: jhb, kib Reviewed by: jhb (initial version), kib MFC after: 1 month
# 249486	14-Apr-2013	trociny	Re-factor coredump routines. For each type of notes an output function is provided, which is used either to calculate the note size or output it to sbuf. On the first pass the notes are registered in a list and the resulting size is found, on the second pass the list is traversed outputing notes to sbuf. For the sbuf a drain routine is provided that writes data to a core file. The main goal of the change is to make coredump to write notes directly to the core file, without preliminary preparing them all in a memory buffer. Storing notes in memory is not a problem for the current, rather small, set of notes we write to the core, but it may becomes an issue when we start to store procstat notes. Reviewed by: jhb (initial version), kib Discussed with: jhb, kib MFC after: 3 weeks
# 249277	08-Apr-2013	attilio	Switch some "low-hanging fruit" to acquire read lock on vmobjects rather than write locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# 249239	07-Apr-2013	trociny	Fill p_flags and p_align fields of the core dump note segement. Reviewed by: kib MFC after: 2 weeks
# 249238	07-Apr-2013	trociny	Use 4-byte padding for core dump notes on both 32 and 64bit archs. Although native word padding (i.e. 8-byte on 64bit arch) looks to be in agreement with standards, other parts of our code and other OSes use 4-byte alignment. This is not expected to change alignment for currently generated core dump notes, as the notes look to consist of structures with sizes multiple of 8 on 64-bit archs. But there are plans to add additional notes, where 4-byte vs 8-byte alignment makes difference. Discussed with: kib Reviewed by: kib MFC after: 2 weeks
# 248256	13-Mar-2013	tijl	- Fix two possible overflows when testing if ELF program headers are on the first page: 1. Cast uint16_t operands in a multiplication to unsigned int because otherwise the implicit promotion to int results in a signed multiplication that can overflow and the behaviour on integer overflow is undefined. 2. Replace (offset + size > PAGE_SIZE) with (size > PAGE_SIZE - offset) because the sum may overflow. - Use the same tests to see if the path to the interpreter is on the first page. There's no overflow here because size is already limited by MAXPATHLEN, but the compiler optimises the new tests better. Also fix an off-by-one error. - Simplify tests to see if an ELF note program header is on the first page. This also fixes an off-by-one error. Reviewed by: kib MFC after: 1 week
# 248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 246636	10-Feb-2013	kib	Remove the ia64-specific code fragment, which effect is more cleanly done by the call to trans_prot() function a line before. Discussed with: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 1 week
# 241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 241025	28-Sep-2012	kib	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
# 238617	19-Jul-2012	kib	Fix several reads beyond the mapped first page of the binary in the ELF parser. Specifically, do not allow note reader and interpreter path comparision in the brandelf code to read past end of the page. This may happen if specially crafter ELF image is activated. Submitted by: Lukasz Wojcik <lukasz.wojcik zoho com> MFC after: 3 days
# 237433	22-Jun-2012	kib	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
# 232828	11-Mar-2012	kib	ELF image can have several PT_NOTE program headers. Look for the ELF brand note in each header, instead of using only first one. Reviewed by: kan Tested by: andrew (arm), flo (sparc64) MFC after: 3 weeks
# 230767	30-Jan-2012	kib	Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported, and some variants of powerpc32. MFC after: 2 months (if ever)
# 230268	17-Jan-2012	alc	Explain why it is safe to unlock the vnode. Requested by: kib
# 230246	16-Jan-2012	alc	Improve abstraction. Eliminate direct access by elf_load_section() to an OBJT_VNODE-specific field of the vm object. The same information can be just as easily obtained from the struct vattr that is in struct image_params if the latter is passed to elf_load_section(). Moreover, by replacing the vmspace and vm object parameters to elf*_load_section() with a struct image_params parameter, we actually reduce the size of the object code. In collaboration with: kib
# 230132	15-Jan-2012	uqs	Convert files to UTF-8
# 226388	15-Oct-2011	kib	Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel
# 226342	13-Oct-2011	marcel	In elf32_trans_prot() and when compiling for amd64 or ia64, add PROT_EXECUTE when PROT_READ is needed. By default i386 allows execution when reading is allowed and JDK 1.4.x depends on that.
# 223825	06-Jul-2011	trasz	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
# 223692	30-Jun-2011	jonathan	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)
# 220373	05-Apr-2011	trasz	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 218195	02-Feb-2011	mdf	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
# 217160	08-Jan-2011	kib	Use the same expression to report stack protection mode for AT_STACKEXEC as the expression used by exec_new_vmspace().
# 217152	08-Jan-2011	kib	In elf image activator, read and apply the stack protection mode from PT_GNU_STACK program header, if present and enabled. Two new sysctls are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow to enable PT_GNU_STACK for ABIs of specified bitsize, if ABI decided to support shared page. Inform rtld about access mode of the stack initial mapping by AT_STACKPROT aux vector. At the moment, the default is disabled, waiting for the usermode support bits.
# 217150	08-Jan-2011	kib	Collect code to translate between vm_prot_t and p_flags into helper functions. MFC after: 1 week
# 215679	22-Nov-2010	attilio	Add the ability for GDB to printout the thread name along with other thread specific informations. In order to do that, and in order to avoid KBI breakage with existing infrastructure the following semantic is implemented: - For live programs, a new member to the PT_LWPINFO is added (pl_tdname) - For cores, a new ELF note is added (NT_THRMISC) that can be used for storing thread specific, miscellaneous, informations. Right now it is just popluated with a thread name. GDB, then, retrieves the correct informations from the corefile via the BFD interface, as it groks the ELF notes and create appropriate pseudo-sections. Sponsored by: Sandvine Incorporated Tested by: gianni Discussed with: dim, kan, kib MFC after: 2 weeks
# 211412	17-Aug-2010	kib	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month
# 207416	30-Apr-2010	alfred	Don't leak core_buf or gzfile if doing a compressed core file and we hit an error condition. Obtained from: Juniper Networks
# 205643	25-Mar-2010	nwhitehorn	Add the ELF relocation base to struct image_params. This will be required to correctly relocate the executable entry point's function descriptor on powerpc64.
# 205641	25-Mar-2010	nwhitehorn	Change the way text_addr and data_addr are computed to use the executable status of segments instead of detecting the main text segment by which segment contains the program entry point. This affects obreak() and is required for correct operation of that function on 64-bit PowerPC systems. The previous behavior was apparently required only for the Alpha, which is no longer supported. Reviewed by: jhb Tested on: amd64, sparc64, powerpc
# 205014	11-Mar-2010	nwhitehorn	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
# 204737	04-Mar-2010	alfred	put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent undefined symbols on kernels without this option. Reported by: Alexander Best
# 204552	02-Mar-2010	alfred	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan
# 198202	18-Oct-2009	kib	If ET_DYN binary has non-zero base address for some reason, honour it and do not relocate the binary to ET_DYN_LOAD_ADDR. This allows for the binary author to influence address map of the process. In particular, when the binary is actually an interpeter, this allows to have almost usual process address map. Communicate the relocation bias of the mapping for interpeter-less ET_DYN binary, that is interperter itself, in AT_BASE aux entry. This way, rtld is able to find its dynamic structure and relocate itself. Note that mapbase in the rtld is still wrong and requires further fixing. Reported and tested by: rwatson Discussed with: kan MFC after: 3 days
# 197934	10-Oct-2009	kib	Map PIE binaries at non-zero base address. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time
# 197932	10-Oct-2009	kib	Do not map segments of zero length. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time
# 197726	03-Oct-2009	bz	Print a warning in case we cannot add more brandinfo because we would overflow the MAX_BRANDS sized array. Reviewed by: kib MFC After: 1 month
# 196653	30-Aug-2009	bz	Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days
# 196512	24-Aug-2009	bz	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days
# 190708	05-Apr-2009	dchagin	Fix KBI breakage by r190520 which affects older linux.ko binaries: 1) Move the new field (brand_note) to the end of the Brandinfo structure. 2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer is valid. 3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old modules won't have the flag set, so the new field brand_note would be ignored. Suggested by: jhb Reviewed by: jhb Approved by: kib (mentor) MFC after: 6 days
# 190264	22-Mar-2009	kib	Fix several issues with parsing the notes for ELF objects. Badly formed ELF note may cause the caclulated pointer to the next note to point both after the note region, that was checked in the code, but also to point before the region, that was not checked [1]. Remember the first note location in note0 and leap out if the note is not between note0 and note_end. In the similar way, badly formed note may cause infinite loop by pointing next note into the same or previous note. Guard against this by limiting amount of loop iterations by arbitrary choosen big number. For clarity, check the calculated note alignment in each iteration. Reported by: Chris Palmer <chris noncombatant org> [1] PR: kern/132886 Reviewed and tested by: dchagin MFC after: 3 days
# 189927	17-Mar-2009	kib	Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and compat32 binaries. Tested by: pho Reviewed by: kan
# 189919	17-Mar-2009	kib	Use the properly sized types for ELF object header and program headers. This fixes osrel fetching from the FreeBSD branding note for the 64bit platforms. Reported by: swell.k gmail com Reviewed by: dchagin Tested by: dchagin, swell.k gmail com
# 189771	13-Mar-2009	dchagin	Implement new way of branding ELF binaries by looking to a ".note.ABI-tag" section. The search order of a brand is changed, now first of all the ".note.ABI-tag" is looked through. Move code which fetch osreldate for ELF binary to check_note() handler. PR: 118473 Approved by: kib (mentor)
# 187686	25-Jan-2009	rwatson	When a statically linked binary is executed (or at least, one without an interpreter definition in its program header), set the auxiliary ELF argument AT_BASE to 0 rather than to the address that we would have mapped the interpreter at if there had been one. The ELF ABI specifications appear to be ambiguous as to the desired behavior in this situation, as they define AT_BASE as the base address of the interpreter, but do not mention what to do if there is none. On Solaris, AT_BASE will be set to the base address of the static binary if there is no interpreter, and on Linux, AT_BASE is set to 0. We go with the Linux semantics as they are of more immediate utility and allow the early runtime environment to know that the kernel has not mapped an interpreter, but because AT_PHDR points at the ELF header for the running binary, it is still possible to retrieve all required mapping information when the process starts should it be required. Either approach would be preferable to our current behavior of passing a pointer to an unmapped region of user memory as AT_BASE. MFC after: 3 weeks
# 186235	17-Dec-2008	peter	Remove sysctl debug.elf_trace and the trace field in auxargs. They go nowhere. It used to be the equivalent of $LD_DEBUG in rtld-elf. Elf_Auxargs is an internal structure.
# 186233	17-Dec-2008	imp	Minor style(9) nit.
# 186225	17-Dec-2008	kib	Remove two remnant uses of AT_DEBUG.
# 183694	08-Oct-2008	kib	If the ABI-overriden interpreter was not loaded, do not set have_interp to TRUE. This allows the code in image activator to try /libexec/ld-elf.so.1 as interpreter when newinterp is not found to execute. Reviewed by: peter MFC after: 2 weeks (together with r175105)
# 179008	15-May-2008	jhb	Go back to using the process command name (p_comm) for the file name and command line arguments stored in the note at the beginning of a core dump instead of the current thread name. Reviewed by: julian
# 177091	12-Mar-2008	jeff	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 175294	13-Jan-2008	attilio	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# 175202	09-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 175105	05-Jan-2008	peter	Fall back to the binary-specified interpreter (ld-elf.so.1) if the ABI override binary isn't found. This could probably be smoother, but it is what I did in p4 change #126891 on 2007/09/27. It should solve the "ld-elf32.so.1"-in-chroot problem.
# 174253	04-Dec-2007	kib	Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note. MFC after: 3 days
# 174252	04-Dec-2007	kib	Check for the program headers alignment of the ELF images before dereferencing. Unaligned access could cause panic on strict alignment architectures. Reviewed by: marcel, marius (also tested on sparc64, thanks !) MFC after: 3 days
# 173601	14-Nov-2007	julian	A bunch more files that should probably print out a thread name instead of a process name.
# 173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 169565	14-May-2007	jhb	Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
# 166073	17-Jan-2007	delphij	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
# 164418	19-Nov-2006	alc	Add vm map and object locking to each_writable_segment(). Noticed by: jhb@ MFC after: 3 weeks
# 154651	21-Jan-2006	alc	Avoid a vm object reference leak in a rarely used code path. An executable contains at most one PT_INTERP program header. Therefore, the loop that searches for it can terminate after it is found rather than iterating over the entire set of program headers. Eliminate an unneeded initialization. Reviewed by: tegge
# 153743	26-Dec-2005	sobomax	Fix breakage introduced in the previous commit.
# 153741	26-Dec-2005	sobomax	Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually allow executing elf dynamic binaries (aka shared libraries). When it is requested to execute ET_DYN elf image check if this flag is on after we know the elf brand allowing execution if so. PR: kern/87615 Submitted by: Marcin Koziej <creep@desk.pl>
# 153698	24-Dec-2005	alc	Maintain the lock on the vnode for most of exec_elfN_imgact(). Specifically, it is required for the I/O that may be performed by elfN_load_section(). Avoid an obscure deadlock in the a.out, elf, and gzip image activators. Add a comment describing why the deadlock does not occur in the common case and how it might occur in less usual circumstances. Eliminate an unused variable from exec_aout_imgact(). In collaboration with: tegge
# 153620	21-Dec-2005	alc	Maintain the vnode lock throughout elfN_load_file() rather than releasing it and reacquiring it in vrele(). Consequently, there is no reason to increase the reference count on the vm object caching the file's pages. Reviewed by: tegge Eliminate unused parameters to elfN_load_file().
# 153585	20-Dec-2005	alc	Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate unnecessary uses of a local variable. Reviewed by: tegge
# 153499	17-Dec-2005	alc	Correct a long-standing problem in elfN_map_insert(): In order to copy a page to user space, the user space mapping must allow write access. In collaboration with: tegge@ MFC after: 3 weeks
# 153487	16-Dec-2005	alc	Style: The second argument to vm_map_find() should be NULL instead of 0.
# 153485	16-Dec-2005	alc	Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors. Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks
# 152436	14-Nov-2005	cognet	Add a new sysctl, kern.elf[32\|64].can_exec_dyn. When set to 1, one can execute a ET_DYN binary (shared object). This does not make much sense, but some linux scripts expect to be able to execute /lib/ld-linux.so.2 (ldd comes to mind). The sysctl defaults to 0. MFC after: 3 days
# 150663	28-Sep-2005	rwatson	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer leads to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde
# 150335	19-Sep-2005	rwatson	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week
# 150164	15-Sep-2005	csjp	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drop in an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, that we have picked up giant. If not panic (if WITNESS compiled into the kernel). This should help us find conditions where vnode operations are in-sufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days
# 147692	30-Jun-2005	peter	Jumbo-commit to enhance 32 bit application support on 64 bit kernels. This is good enough to be able to run a RELENG_4 gdb binary against a RELENG_4 application, along with various other tools (eg: 4.x gcore). We use this at work. ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace, procfs and core dumps. procfs_regs.c: vary the format of proc/XXX/regs depending on the client and target application. procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their sscanf fails. They expect an unsigned long. imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps. sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note that 64 bit consumers can still debug 32 bit targets. IA64 has got stubs for ia32_reg.c. Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't implemented in the 32/64 wrapper yet. We also make a tiny patch to gdb pacify it over conflicting formats of ld-elf.so.1. Approved by: re
# 146598	24-May-2005	cognet	Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as binutils now do the job for us
# 145819	03-May-2005	jeff	- Neither of our image formats require Giant now that the vm and vfs have been locked.
# 144577	03-Apr-2005	alc	Remove GIANT_REQUIRED from elfN_load_section().
# 140992	29-Jan-2005	sobomax	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
# 140782	24-Jan-2005	phk	Don't use VOP_GETVOBJECT, use vp->v_object directly.
# 135687	23-Sep-2004	cognet	On arm, set the default elf brand to FreeBSD, until the binutils do it for us.
# 133464	11-Aug-2004	marcel	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
# 133323	08-Aug-2004	dfr	Make sure that AT_PHDR has a useful value even for static programs.
# 132364	18-Jul-2004	marcel	After maintaining previous behaviour in writing out the core notes, it's time now to break with the past: do not write the PID in the first note. Rationale: 1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate. 2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs. Update comments accordingly.
# 131149	26-Jun-2004	marcel	Allocate TIDs in thread_init() and deallocate them in thread_fini(). The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump it's state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.
# 130101	05-Jun-2004	tjr	Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
# 130100	05-Jun-2004	tjr	Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
# 130053	04-Jun-2004	tjr	Write segments to core dump files in maximally-sized chunks that neither exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block boundary. This fixes dumping segments larger than 2GB. PR: 67546
# 128568	23-Apr-2004	alc	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
# 128029	08-Apr-2004	marcel	Do not assume that the initial thread (i.e. the thread with the ID equal to the process ID) is still present when we dump a core. It already may have been destroyed. In that case we would end up dereferencing a NULL pointer, so specifically test for that as well. Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
# 127802	03-Apr-2004	marcel	Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread in the process. This is required for proper debugging of corefiles created by 1:1 or M:N threaded processes. Add an XXX comment where we should actually call a function that dumps MD specific notes. An example of a MD specific note is the NT_PRXFPREG note for SSE registers. Since BFD creates non-annotated pseudo-sections for the first PRSTATUS and FPREGSET notes (non-annotated in the sense that the name of the section does not contain the pid/tid), make sure those sections describe the initial thread of the process (i.e. the thread which tid equals the pid). This is not strictly necessary, but makes sure that tools that use the non-annotated section names will not change behaviour due to this change. The practical upshot of this all is that one can see the threads in the debugger when looking at a corefile. For 1:1 threading this means that all threads are visible.
# 127172	18-Mar-2004	nectar	Verify more bits of the ELF header: the program header table entry size and the ELF version. Also, avoid a potential integer overflow when determining whether the ELF header fits entirely within the first page. Reviewed by: jdp A panic when attempting to execute an ELF binary with a bogus program header table entry size was Reported by: Christer Öberg <christer.oberg@texonet.com>
# 125454	04-Feb-2004	jhb	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
# 123743	23-Dec-2003	peter	Forced commit; previous commit also included: - eliminate a malloc()/snprintf()/free() in the native exec(2) case and in the easy emulation environments. - Allow the brand emul_path (ie: /compat/xxx) to be NULL rather than needing it to be an empty string that is always referenced.
# 123742	23-Dec-2003	peter	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.
# 120422	24-Sep-2003	peter	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 115524	31-May-2003	marcel	Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on the stack to be changed in a way incompatible with elf32_map_insert() where we used data_buf without initializing it for when the partial mapping resulting in a misaligned image (typical when the page size implied by the image is not the same as the page size in use by the kernel). Since data_buf is passed by reference to vm_map_find(), the compiler cannot warn about it. While here, move all local variables to the top of the function.
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 108696	05-Jan-2003	jake	- Provide backwards compatibility for kern.fallback_elf_brand. - Use the generic elf type macros in imgact_elf.h instead of ifdefing the entire contents of the header.
# 108685	04-Jan-2003	jake	Improve the way that an elf image activator for an alternate word size is included in the kernel. Include imgact_elf.c in conf/files, instead of both imgact_elf32.c and imgact_elf64.c, which will use the default word size for an architecture as defined in machine/elf.h. Architectures that wish to build an additional image activator for an alternate word size can include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which allows it to be dependent on MD options instead of solely on architecture. Glanced at by: peter
# 108148	20-Dec-2002	marcel	Fix multiple registration of the elf_legacy_coredump sysctl variable. The duplication is caused by the fact that imgact_elf.c is included by both imgact_elf32.c and imgact_elf64.c and both are compiled by default on ia64. Consequently, we have two seperate copies of the elf_legacy_coredump variable due to them being declared static, and two entries for the same sysctl in the linker set, both referencing the unique copy of the elf_legacy_coredump variable. Since the second sysctl cannot be registered, one of the elf_legacy_coredump variables can not be tuned (if ordering still holds, it's the ELF64 related one). The only solution is to create two different sysctl variables, just like the elf<32\|64>_trace sysctl variables. This unfortunately is an (user) interface change, but unavoidable. Thus, on ELF32 platforms the sysctl variable is called elf32_legacy_coredump and on ELF64 platforms it is called elf64_legacy_coredump. Platforms that have both ELF formats have both sysctl variables. These variables should probably be retired sooner rather than later.
# 107948	16-Dec-2002	dillon	Change the way ELF coredumps are handled. Instead of unconditionally skipping read-only pages, which can result in valuable non-text-related data not getting dumped, the ELF loader and the dynamic loader now mark read-only text pages NOCORE and the coredump code only checks (primarily) for complete inaccessibility of the page or NOCORE being set. Certain applications which map large amounts of read-only data will produce much larger cores. A new sysctl has been added, debug.elf_legacy_coredump, which will revert to the old behavior. This commit represents collaborative work by all parties involved. The PR contains a program demonstrating the problem. PR: kern/45994 Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org> Reviewed by: jdp, dillon MFC after: 7 days
# 106660	08-Nov-2002	rwatson	Assign value of NULL to imgp->execlabel when imgp is initialized in the ELF code. Missed in earlier merge from the MAC tree. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 106437	04-Nov-2002	rwatson	Remove reference to struct execve_args from struct imgact, which describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes. There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 105755	22-Oct-2002	kan	Handle binaries with arbitrary number PT_LOAD sections, not only ones with one text and one data section. The text and data rlimit checks still needs to be fixed to properly accout for additional sections. Reviewed by: peter (slightly different patch version)
# 105354	17-Oct-2002	robert	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.
# 103767	21-Sep-2002	jake	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
# 103087	08-Sep-2002	peter	Do not blow up when we walk off the end of the brands list. Found by: kris, jake
# 102922	04-Sep-2002	dillon	Alright, fix the problems with the elf loader for the Alpha. It turns out that there is no easy way to discern the difference between a text segment and a data segment through the read-only OR execute attribute in the elf segment header, so revert the algorithm to what it was before. Neither can we account for multiple data load segments in the vmspace structure (at least not without more work), due to assumptions obreak() makes in regards to the data start and data size fields. Retain RLIMIT_VMEM checking by using a local variable to track the total bytes of data being loaded. Reviewed by: peter X-MFC after: ASAP
# 102913	03-Sep-2002	peter	Make the text segment locating heuristics from rev 1.121 more reliable so that it works on the Alpha. This defines the segment that the entry point exists in as 'text' and any others (usually one) as data. Submitted by: tmm Tested on: i386, alpha
# 102857	02-Sep-2002	dillon	Grammer cleanup
# 102836	02-Sep-2002	jake	Moved elf brand identification into a function. Fully identify the brand early in the process of loading an elf file, so that we can identify the sysentvec, and so that we do not continue if we do not have a brand (and thus a sysentvec). Use the values in the sysentvec for the page size and vm ranges unconditionally, since they are all filled in now.
# 102832	02-Sep-2002	jake	Fixed more indentation bugs.
# 102630	30-Aug-2002	dillon	Implement data, text, and vmem limit checking in the elf loader and svr4 compat code. Clean up accounting for multiple segments. Part 1/2. Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications) MFC after: 3 days
# 102424	25-Aug-2002	jake	Fixed most indentation bugs.
# 102423	25-Aug-2002	jake	Fixed placement of operators. Wrapped long lines.
# 102381	24-Aug-2002	jake	Fixed white space around operators, casts and reserved words. Reviewed by: md5
# 102377	24-Aug-2002	jake	return x; -> return (x); return(x); -> return (x); Reviewed by: md5
# 101941	15-Aug-2002	rwatson	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101771	13-Aug-2002	jeff	- Hold the vnode lock throughout execve. - Set VV_TEXT in the top level execve code. - Fixup the image activators to deal with the newly locked vnode.
# 101308	04-Aug-2002	jeff	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 100384	20-Jul-2002	peter	Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).
# 99487	06-Jul-2002	jeff	Clean up execve locking: - Grab the vnode object early in exec when we still have the vnode lock. - Cache the object in the image_params. - Make use of the cached object in imgact_*.c
# 97748	02-Jun-2002	schweikh	Fix typo in the BSD copyright: s/withough/without/ Spotted and suggested by: des MFC after: 3 weeks
# 92723	19-Mar-2002	alfred	Remove __P.
# 91406	27-Feb-2002	jhb	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
# 88021	16-Dec-2001	mp	Remove whitespace at end of line.
# 84783	10-Oct-2001	ps	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
# 83959	26-Sep-2001	dillon	Make uio_yield() a global. Call uio_yield() between chunks in vn_rdwr_inchunks(), allowing other processes to gain an exclusive lock on the vnode. Specifically: directory scanning, to avoid a race to the root directory, and multiple child processes coring simultaniously so they can figure out that some other core'ing child has an exclusive adv lock and just exit instead. This completely fixes performance problems when large programs core. You can have hundreds of copies (forked children) of the same binary core all at once and not notice. MFC after: 3 days
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 83239	09-Sep-2001	dillon	The basis for the recent coredump commit had the wrong attribution. The new attribution is below. Submitted by: peter, ps
# 83222	08-Sep-2001	dillon	This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time. This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache. Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
# 82789	02-Sep-2001	peter	For ia64, set the default elf brand to be FreeBSD. This is temporarily necessary only for as long as we're using a linux toolchain.
# 82477	28-Aug-2001	brian	OR M_WAITOK with M_ZERO in malloc()s args for clarity.
# 81881	18-Aug-2001	mp	Unbreak linux compatibility by providing the correct length of the buffer. Reported by: "Pierre Y. Dampure" <pierre.dampure@westmarsh.com>, "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk> Pointy hat to: mp
# 81799	16-Aug-2001	peter	Don't explicitly null-terminate. The buffer we are copying into is already zeroed, and we explicitly leave the last byte untouched. Submitted by: bde
# 81781	16-Aug-2001	mp	Reduce stack allocation (stack-fast?). elf_load_file() => 352 to 52 bytes exec_elf_imgact() => 1072 to 48 bytes elf_corehdr() => 396 to 8 bytes Reviewed by: julian
# 81757	16-Aug-2001	peter	Use explicit sizes for the prpsinfo command length string so that we dont have any more unexpected changes in core dumps. This gets us back to the original core dump layout from a few days ago.
# 79224	04-Jul-2001	dillon	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
# 77075	23-May-2001	jhb	Lock the VM while twiddling the vmspace.
# 76827	18-May-2001	alfred	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 74927	28-Mar-2001	jhb	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# 74914	28-Mar-2001	jhb	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
# 73509	04-Mar-2001	obrien	Do not set a default ELF syscall ABI fallback. If one runs an un-branded Linux static binary that calls Linux's fcntl the machine will reboot when interupted by the FreeBSD syscall ABI.
# 72999	24-Feb-2001	obrien	MFS: bring the consistent `compat_3_brand' support into -CURRENT (the work was first done in the RELENG_4 branch near a release during a MFC to make the code cleaner and more consistent)
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 71699	26-Jan-2001	jhb	Back out proc locking to protect p_ucred for obtaining additional references along with the actual obtaining of additional references.
# 71497	23-Jan-2001	jhb	Proc locking.
# 69947	12-Dec-2000	jake	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 69022	22-Nov-2000	jake	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 68520	09-Nov-2000	marcel	Make MINSIGSTKSZ machine dependent, and have the sigaltstack syscall compare against a variable sv_minsigstksz in struct sysentvec as to properly take the size of the machine- and ABI dependent struct sigframe into account. The SVR4 and iBCS2 modules continue to have a minsigstksz of 8192 to preserve behavior. The real values (if different) are not known at this time. Other ABI modules use the real values. The native MINSIGSTKSZ is now defined as follows: Arch MINSIGSTKSZ ---- ----------- alpha 4096 i386 2048 ia64 12288 Reviewed by: mjacob Suggested by: bde
# 68356	05-Nov-2000	obrien	ELF kernels should use an ELF sysvec. This allows us to move a.out specific files to those platforms that acutally support a.out.
# 67365	20-Oct-2000	jhb	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# 66615	03-Oct-2000	jasone	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 65770	12-Sep-2000	bp	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
# 65687	10-Sep-2000	dfr	Move the include of <sys/systm.h> so that KTR gets a declaration for snprintf().
# 63784	23-Jul-2000	green	Using an atomic operation here won't help if nobody else uses them (for this). Use the simple_lock() on v_interlock like elsewhere.
# 63769	23-Jul-2000	green	Clarification (forced commit): The immutability flag referred to in the previous revision is actually VTEXT, not VEXEC.
# 63768	23-Jul-2000	green	Solve the problem where it is possible to get the kernel stuck in a loop down in pmap_init_pt(). A subtraction causes the number of pages to become negative, that was assigned to an unsigned variable, and there is a lot of iteration. The bug is due to the ELF image activator not properly checking for its files being the correct size as specified by the ELF header. The solution is to check that the header doesn't ask for part of a file when that part of the file doesn't exist. Make sure to set VEXEC at the proper times to make the executables immutable (remove race conditions). Also, the ELF format specifiies header entries that allow embedding of other executables (hence how ld-elf.so.1 gets loaded, but not the same as loading shared libraries), so those executables need to be set VEXEC, too, so they're immutable. Reviewed by: peter
# 59794	30-Apr-2000	phk	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
# 59342	18-Apr-2000	obrien	Change our ELF binary branding to something more acceptable to the Binutils maintainers. After we established our branding method of writing upto 8 characters of the OS name into the ELF header in the padding; the Binutils maintainers and/or SCO (as USL) decided that instead the ELF header should grow two new fields -- EI_OSABI and EI_ABIVERSION. Each of these are an 8-bit unsigned integer. SCO has assigned official values for the EI_OSABI field. In addition to this, the Binutils maintainers and NetBSD decided that a better ELF branding method was to include ABI information in a ".note" ELF section. With this set of changes, we will now create ELF binaries branded using both "official" methods. Due to the complexity of adding a section to a binary, binaries branded with ``brandelf'' will only brand using the EI_OSABI method. Also due to the complexity of pulling a section out of an ELF file vs. poking around in the ELF header, our image activator only looks at the EI_OSABI header field. Note that a new kernel can still properly load old binaries except for Linux static binaries branded in our old method. * * For a short period of time, ``ld'' will also brand ELF binaries * using our old method. This is so people can still use kernel.old * with a new world. This support will be removed before 5.0-RELEASE, * and may not last anywhere upto the actual release. My expiration * time for this is about 6mo. *
# 57552	28-Feb-2000	ps	Update a comment in elf_coredump to reflect that if you madvise with MADV_NOCORE, its address space is also excluded from a core file. Pointed out by: alc
# 57550	28-Feb-2000	ps	Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2). This This feature allows you to specify if mmap'd data is included in an application's corefile. Change the type of eflags in struct vm_map_entry from u_char to vm_eflags_t (an unsigned int). Reviewed by: dillon,jdp,alfred Approved by: jkh
# 55141	27-Dec-1999	bde	Changed the type used to represent the user stack pointer from `long ' to `register_t '. This fixes bugs like misplacement of argc and argv on the user stack on i386's with 64-bit longs. We still use longs to represent "words" like argc and argv, and assume that they are on the stack (and that there is stack). The suword() and fuword() families should also use register_t.
# 54655	15-Dec-1999	eivind	Introduce NDFREE (and remove VOP_ABORTOP)
# 53503	21-Nov-1999	phk	s/p_cred->pc_ucred/p_ucred/g
# 53446	20-Nov-1999	bp	Vnode was left referenced in the case if ELF image is broken. Reviewed by: Peter Wemm <peter@netplex.com.au>
# 53212	16-Nov-1999	phk	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# 52635	29-Oct-1999	phk	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
# 52128	11-Oct-1999	peter	Trim unused options (or #ifdef for undoc options). Submitted by: phk
# 50717	31-Aug-1999	julian	General cleanup of core-dumping code. Submitted by: Sean Fagan,
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 50415	26-Aug-1999	dima	Don't follow symlinks on coredumps. Reviewed by: dillon && security-officer
# 48718	09-Jul-1999	peter	Fix the previous warning a different way since the emul_path exposure was intentional. Avoid the warning by propagating the const filename through to elf_load_file() instead.
# 48716	09-Jul-1999	peter	Minor tweak - don't cause a warning. I don't know if it was intentional or not, but it would have printed out: /compat/linux/foo/bar.so: interpreter not found If it was, then I've broken it. De-constifying the 'interp' variable or carrying the constness through to elf_load_file() are alternatives.
# 48594	05-Jul-1999	marcel	Also try to load the interpreter without prepending "emul_path". This allows dynamicly linked binaries to run in a chroot'd environment with "emul_path" as the new root. The new behavior of loading interpreters is identical to the principle of overlaying. PR: 10145
# 47258	16-May-1999	alc	Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert, eliminating the need for the pmap_object_init_pt calls in imgact_* and mmap. Reviewed by: David Greenman <dg@root.com>
# 47207	14-May-1999	alc	Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option. It never makes sense to specify MAP_COPY_NEEDED without also specifying MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices. Reviewed by: David Greenman <dg@root.com>
# 46803	09-May-1999	peter	Fix a couple of warnings and some bitrot in comments.
# 44176	20-Feb-1999	jdp	If you merge this into -stable, please increment __FreeBSD_version in "src/sys/sys/param.h". Fix the ELF image activator so that it can handle dynamic linkers which are executables linked at a fixed address. This improves compliance with the ABI spec, and it opens the door to possibly better dynamic linker performance in the future. I've experimented a bit with a fixed-address dynamic linker, and it works fine. But I don't have any measurements yet to determine whether it's worthwhile. Also, remove a few calculations that were never used for anything. I will increment __FreeBSD_version, since this adds a new capability to the kernel that the dynamic linker might some day rely upon.
# 44146	19-Feb-1999	luoqi	Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
# 43750	07-Feb-1999	jdp	Change the load address of the ELF dynamic linker from "2L*MAXDSIZ" to an architecture-specific value defined in <machine/elf.h>. This solves problems on large-memory systems that have a high value for MAXDSIZ. The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace). On the i386 it is hard-wired to 0x08000000, which is the standard SVR4 location for the dynamic linker. On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond the start of the program's data segment. This is the same place a userland mmap(0, ...) call would put it, so it ends up just below all the shared libraries. The rationale behind the calculation is that it allows room for the data segment to grow to its maximum possible size. These changes have been tested on the i386 for several months without problems. They have been tested on the Alpha as well, though not for nearly as long. I would like to merge the changes into 3.1 within a week if no problems have surfaced as a result of them.
# 43748	07-Feb-1999	dillon	Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
# 43687	05-Feb-1999	jdp	Correct an "&" operator which should have been "&&". Submitted by: mjacob
# 43632	05-Feb-1999	newton	Additional note on last rev: The rationale for this is to allow you to run Solaris executables (or executables from any other ELF system) directly off the CD-ROM without having to waste megabytes of disk by copying them to another filesystem just to brand them.
# 43631	05-Feb-1999	newton	Created sysctl kern.fallback_elf_brand. Defaults to "none", which will give the same behaviour produced before today. If sysadmin sets it to a valid ELF brand, ELF image activator will attempt to run unbranded ELF exectutables as if they were branded with that value. Suggested by: Dima Ruban <dima@best.net>
# 43596	04-Feb-1999	newton	Provide elf_brand_inuse() as a method an emulator can use to find out whether it is currently in use (which is kinda useful when it's about to unload itself: Lockups are never very much fun, are they?).
# 43402	29-Jan-1999	dillon	*_execsw static structures cannot be const due to the way they interact with EXEC_SET, DECLARE_MODULE, and module_register. Specifically, module_register. We may eventually be able to make these const, but not now.
# 43301	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 43208	26-Jan-1999	julian	Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
# 41931	19-Dec-1998	julian	Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-) Code to allow Linux Threads to run under FreeBSD. By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.
# 41514	04-Dec-1998	archie	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
# 40648	25-Oct-1998	phk	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
# 40514	18-Oct-1998	peter	Some cleanups and optimizations: - Use the system headers method for Elf32/Elf64 symbol compatability - get rid of the UPRINTF debugging. - check the ELF header for compatability much more completely - optimize the section mapper. Use the same direct VM interfaces that imgact_aout.c and kern_exec.c use. - Check the return codes from the vm_* functions better. Some return KERN_* results, not an errno. - prefault the page tables to reduce startup faults on page tables like a.out does. - reset the segment protection to zero for each loop, otherwise each segment could get progressively more privs. (eg: if the first was read/write/execute, and the second was meant to be read/execute, the bug would make the second r/w/x too. In practice this was not a problem because executables are normally laid out with text first.) - Don't impose arbitary limits. Use the limits on headers imposed by the need to fit them into one page. - Remove unused switch() cases now that the verbose debugging is gone. I've been using an earlier version of this for a month or so. This sped up ELF exec speed a bit for me but I found it hard to get consistant benchmarks when I tested it last (a few weeks ago). I'm still bothered by the page read out of order caused by the transition from data to bss. This which requires either part filling the transition page or clearing the remainder.
# 40435	16-Oct-1998	peter	gulp. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a seperate commit as samples. This all still works as a static a.out kernel using LKM's.
# 40376	15-Oct-1998	dfr	Don't frob the user stack directly, use suword instead. This fixes the elf_freebsd_fixup() panic which many people have noticed on the alpha.
# 40286	13-Oct-1998	dg	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
# 40235	11-Oct-1998	jdp	If an ELF executable has a recognized brand, then believe it. Formerly, the heuristic involving the interpreter path took precedence. Also, print a better error message if the brand is missing or not recognized. If there is no brand at all, give the user a hint that "brandelf" needs to be run.
# 39910	03-Oct-1998	jdp	Fix a bug which caused the dynamic linker pathname in the PT_INTERP program header entry to be ignored if a recognized brand was found.
# 39320	16-Sep-1998	jdp	Restore the core-dumping of all writable segments for ELF executables, minus the NULL pointer dereference in rev. 1.33. Also simplify things somewhat by eliminating one traversal of the VM map entries. Finally, eliminate calls to vm_map_{un,}lock_read() which aren't needed here. I originally took them from procfs_map.c, but here we know we are dealing only with the map of the current process.
# 39313	15-Sep-1998	jdp	Erk. Revert back to 1.31, dumping only data and stack to the core file, until I can solve a panic that has just cropped up.
# 39311	15-Sep-1998	jdp	When choosing segments to write to the core file, don't assume that writable implies readable.
# 39309	15-Sep-1998	jdp	Instead of just the data and stack segments, include all writable segments (except memory-mapped devices) in the ELF core file. This is really nice. You get access to the data areas of all shared libraries, and even to files that are mapped read-write. In the future, it might be good to add a new resource limit in the spirit of RLIMIT_CORE. It would specify the maximum sized writable segment to include in core dumps. Segments larger than that would be omitted. This would be useful for programs that map very large files read/write but that still would like to get usable core dumps.
# 39198	14-Sep-1998	jdp	Viola! The kernel now generates standard ELF core dumps for ELF executables. Currently only data and stack are included in the core dumps. I am looking into adding the other (mmapped) writable segments as well.
# 39154	14-Sep-1998	jdp	Add provisions for variant core dump file formats, depending on the object format of the executable being dumped. This is the first step toward producing ELF core dumps in the proper format. I will commit the code to generate the ELF core dumps Real Soon Now. In the meantime, ELF executables won't dump core at all. That is probably no less useful than dumping a.out-style core dumps as they have done until now. Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)
# 37957	29-Jul-1998	dfr	Default to FreeBSD if no brand detected. This makes life easier when bootstrapping from NetBSD/alpha.
# 37656	15-Jul-1998	bde	Cast u_longs to uintptr_t before casting them to pointers. Don't attempt to even partially support systems with function pointers larger than object pointers.
# 37558	11-Jul-1998	bde	Fixed printf format errors.
# 36765	08-Jun-1998	dfr	Fix a typo which prevented i386 elf from working at all (including Linux emulated elf binaries).
# 36735	07-Jun-1998	dfr	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# 35496	28-Apr-1998	eivind	Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under Linux emulation. This make Allegro Common Lisp 4.3 work under FreeBSD! Submitted by: Fred Gilham <gilham@csl.sri.com> Commented on by: bde, dg, msmith, tg Hoping he got everything right: eivind
# 34928	28-Mar-1998	bde	Removed unused #includes.
# 33983	02-Mar-1998	peter	Update the ELF image activator to use some of the exec resources rather than rolling it's own. This means that it now uses the "safe" exec_map_first_page() to get the ld.so headers rather than risking a panic on a page fault failure (eg: NFS server goes down). Since all the ELF tools go to a lot of trouble to make sure everything lives in the first page for executables, this is a win. I have not seen any ELF executable on any system where all the headers didn't fit in the first page with lots of room to spare. I have been running variations of this code for some time on my pure ELF systems.
# 33181	09-Feb-1998	eivind	Staticize.
# 29649	21-Sep-1997	peter	We were (I think) missing a vrele() on the vnode for the object loaded via PT_INTERP (usually /usr/libexec/ld-elf.so.1).
# 24848	12-Apr-1997	dyson	Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.) Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group. Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes. Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group. Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads address space by an exec, so we now automatically divorce the address space before modifying it.
# 24482	01-Apr-1997	bde	Use OID_AUTO instead of magic number for old sysctl debug.elf_trace. The magic number conflicted with the one for the Lite2 sysctl debug.busyprt. Staticized some variables. Removed unused #includes.
# 24131	23-Mar-1997	bde	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 20821	22-Dec-1996	joerg	Make DFLDSIZ and MAXDSIZ fully-supported options. "Don't forget to do a ``make depend''" :-)
# 19162	24-Oct-1996	sos	Added a missing break, so all static bins would be missed :(
# 18967	16-Oct-1996	sos	Oops forgot to remove a debug printf.
# 18959	16-Oct-1996	sos	Prepare kernel to take advantage of "branded" ELF binaries.
# 18651	03-Oct-1996	peter	Drop an unused param to unmap_pages().
# 17974	31-Aug-1996	bde	Fixed the easy cases of const poisoning in the kernel. Cosmetic.
# 16474	18-Jun-1996	dyson	Clean-up the new VM map procfs code, and also add support for executable format file "etype". It contains a description of the binary type for a process.
# 16322	12-Jun-1996	gpalmer	Clean up -Wunused warnings. Reviewed by: bde
# 15494	01-May-1996	bde	Removed unnecessary #includes from <sys/imgact.h> so that it is self-sufficient and added explicit #includes where required.
# 14584	12-Mar-1996	peter	Remove references to MAP_FILE.. That is now "default" and is only a "#define MAP_FILE 0" that is still there for net-2 source compatability.
# 14473	10-Mar-1996	peter	Tweak the data/bss segment page count. The last version worked with all the test cases I tried, I'm sure this is more correct. Tweak some prototypes.
# 14467	10-Mar-1996	peter	Fix some rounding problems.. In some (fairly rare) situtaions it mapped one page too many, which caused obreak() to fail in vm_map_find() with ENOMEM because of the conflicting page.
# 14456	10-Mar-1996	sos	First attempt at FreeBSD & Linux ELF support. Compile and link a new kernel, that will give native ELF support, and provide the hooks for other ELF interpreters as well. To make native ELF binaries use John Polstras elf-kit-1.0.1.. For the time being also use his ld-elf.so.1 and put it in /usr/libexec. The Linux emulator has been enhanced to also run ELF binaries, it is however in its very first incarnation. Just get some Linux ELF libs (Slackware-3.0) and put them in the prober place (/compat/linux/...). I've ben able to run all the Slackware-3.0 binaries I've tried so far. (No it won't run quake yet :)