History log of /freebsd-current/sys/kern/imgact_elf.c
Revision	Date	Author	Comments
# 364d1b2f	04-Mar-2024	John Baldwin <jhb@FreeBSD.org>	imgact_elf: Add const to the checknote parameter to __elfN(parse_notes) Reviewed by: imp, kib Sponsored by: University of Cambridge, Google, Inc. Differential Revision: https://reviews.freebsd.org/D44215
# 169641f7	04-Mar-2024	Alex Richardson <arichardson@FreeBSD.org>	imgact_elf: Add const to a few struct image_params pointers This makes it more obvious which functions modify fields in this struct. Reviewed by: imp, kib Obtained from: CheriBSD Differential Revision: https://reviews.freebsd.org/D44214
# 29d4f8bf	09-Feb-2024	Konstantin Belousov <kib@FreeBSD.org>	ELF note parser: provide more info on failure Print reasons when parser declined to parse notes, due to mis-alignment, invalid length, or too many notes (the latter typically means that there is a loop). Also increase the loop limit to 4096, which gives enough iterations for notes to fill whole notes' page. Sponsored by: The FreeBSD Foundation MFC after: 3 days
# a67edb56	09-Feb-2024	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf.c: remove sys/cdefs.h include Sponsored by: The FreeBSD Foundation MFC after: 3 days
# eb32c1c7	02-Nov-2023	Andrew Turner <andrew@FreeBSD.org>	sysent: Add sv_protect To allow for architecture specific protections add sv_protect to struct sysent. This can be used to apply these after the executable is loaded into the new address space. Reviewed by: kib Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D42440
# a04633ce	01-Nov-2023	Andrew Turner <andrew@FreeBSD.org>	imgact_elf: Export __elfN(parse_notes) This is useful to check if a note is present and contains an expected value, e.g. to read NT_GNU_PROPERTY_TYPE_0 on arm64 to see if we should enable BTI. Reviewed by: kib, markj Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D42439
# 9d2612fc	01-Nov-2023	Andrew Turner <andrew@FreeBSD.org>	imgact_elf: Move GNU_ABI_VENDOR to a common header Move the definition of GNU_ABI_VENDOR to a common location so it can be used in multiple files. Reviewed by: emaste, kib, imp Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D42442
# 326bf508	26-Oct-2023	Brooks Davis <brooks@FreeBSD.org>	auxv: make AT_BSDFLAGS unsigned AT_BSDFLAGS shouldn't be sign extended on 64-bit systems so use a uint32_t instead of an int. Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D42365
# 1798b44f	24-Oct-2023	Konstantin Belousov <kib@FreeBSD.org>	user stack randomization: only enable by default for 64bit processes All aslr knobs are disabled by default for 32bit processes, except stack. This results in weird stack location, typically making around 1G of user address space hard to use. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D42356
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
# 659a0041	30-May-2023	Jessica Clarke <jrtc27@FreeBSD.org>	imgact: Make et_dyn_addr part of image_params This already gets passed around between various imgact_elf functions, so moving it removes an argument from all those places. A future commit will make use of this for hwpmc, though, to provide the load base for PIEs, which currently isn't available to tools like pmcstat. Reviewed by: kib, markj, jhb Differential Revision: https://reviews.freebsd.org/D39594
# 57578dea	29-May-2023	Dmitry Chagin <dchagin@FreeBSD.org>	Brandinfo: Retire emul_path as unneeded anymore The Brandinfo emul_path was used by the Elf image activator to fixup interpreter file name according to ABI root directory. Since the non-native ABI can now specify its root directory directly to namei() via pwd_altroot() call this facility is not needed anymore. Differential Revision: https://reviews.freebsd.org/D40091 MFC after: 2 month
# ff41239f	12-Sep-2022	Konstantin Belousov <kib@FreeBSD.org>	Add AT_USRSTACK{BASE, LIM} AT vectors, and ELF_BSDF_VMNOOVERCOMMIT flag Reviewed by: brooks, imp (previous version) Discussed with: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36540
# fbafa98a	18-Mar-2022	Ed Maste <emaste@FreeBSD.org>	Disallow invalid PT_GNU_STACK Stack must be at least readable and writable. PR: 242570 Reviewed by: kib, markj MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35867
# 00d17cf3	03-Jun-2022	Konstantin Belousov <kib@FreeBSD.org>	elf_note_prpsinfo: handle more failures from proc_getargv() Resulting sbuf_len() from proc_getargv() might return 0 if user mangled ps_strings enough. Also, sbuf_len() API contract is to return -1 if the buffer overflowed. The latter should not occur because get_ps_strings() checks for catenated length, but check for this subtle detail explicitly as well to be more resilient. The end result is that p_comm is used in these situations. Approved by: so Security: FreeBSD-SA-22:09.elf Reported by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: delphij, markj admbugs: 988 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35391
# f0687f3e	05-Aug-2022	Ed Maste <emaste@FreeBSD.org>	Clarify code comments on ASLR default settings Sponsored by: The FreeBSD Foundation
# 939f0b63	10-May-2022	Kornel Dulęba <kd@FreeBSD.org>	Implement shared page address randomization It used to be mapped at the top of the UVA. If the randomization is enabled any address above .data section will be randomly chosen and a guard page will be inserted in the shared page default location. The shared page is now mapped in exec_map_stack, instead of exec_new_vmspace. The latter function is called before image activator has a chance to parse ASLR related flags. The KERN_PROC_VM_LAYOUT sysctl was extended to provide shared page address. The feature is enabled by default for 64 bit applications on all architectures. It can be toggled kern.elf64.aslr.shared_page sysctl. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35349
# 361971fb	02-Jun-2022	Kornel Dulęba <kd@FreeBSD.org>	Rework how shared page related data is stored Store the shared page address in struct vmspace. Also instead of storing absolute addresses of various shared page segments save their offsets with respect to the shared page address. This will be more useful when the shared page address is randomized. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35393
# 0288d427	30-Jun-2022	John Baldwin <jhb@FreeBSD.org>	Add register sets for NT_THRMISC and NT_PTLWPINFO. For the kernel this is mostly a non-functional change. However, this will be useful for simplifying gcore(1). Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D35666
# bb92cd7b	24-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
# 1babcad6	22-Mar-2022	Mark Johnston <markj@FreeBSD.org>	elf: Avoid dumping uninitialized bytes in PRSTATUS core dump notes elf_prstatus_t contains pad space. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34606
# 6b71405b	10-Mar-2022	John Baldwin <jhb@FreeBSD.org>	Store core dump notes for all valid register sets for FreeBSD processes. In particular, use a generic wrapper around struct regset rather than requiring per-regset helpers. This helper replaces the MI __elfN(note_prstatus) and __elfN(note_fpregset) helpers. It also removes the need to explicitly dump NT_ARM_ADDR_MASK in the arm64 __elfN(dump_thread). Reviewed by: markj, emaste Sponsored by: University of Cambridge, Google, Inc. Differential Revision: https://reviews.freebsd.org/D34446
# 0b25cbc7	03-Mar-2022	John Baldwin <jhb@FreeBSD.org>	Fix the size returned for NT_FPREGSET. Sponsored by: University of Cambridge, Google, Inc.
# 548a2ec4	24-Jan-2022	Andrew Turner <andrew@FreeBSD.org>	Add PT_GETREGSET This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be used to access all the registers from a specified core dump note type. The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other machine-dependent types are expected to be added in the future. The ptrace addr points to a struct iovec pointing at memory to hold the registers along with its length. On success the length in the iovec is updated to tell userspace the actual length the kernel wrote or, if the base address is NULL, the length the kernel would have written. Because the data field is an int the arguments are backwards when compared to the Linux PTRACE_GETREGSET call. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19831
# 1811c1e9	17-Jan-2022	Mark Johnston <markj@FreeBSD.org>	exec: Reimplement stack address randomization The approach taken by the stack gap implementation was to insert a random gap between the top of the fixed stack mapping and the true top of the main process stack. This approach was chosen so as to avoid randomizing the previously fixed address of certain process metadata stored at the top of the stack, but had some shortcomings. In particular, mlockall(2) calls would wire the gap, bloating the process' memory usage, and RLIMIT_STACK included the size of the gap so small (< several MB) limits could not be used. There is little value in storing each process' ps_strings at a fixed location, as only very old programs hard-code this address; consumers were converted decades ago to use a sysctl-based interface for this purpose. Thus, this change re-implements stack address randomization by simply breaking the convention of storing ps_strings at a fixed location, and randomizing the location of the entire stack mapping. This implementation is simpler and avoids the problems mentioned above, while being unlikely to break compatibility anywhere the default ASLR settings are used. The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack, and is re-enabled by default. PR: 260303 Reviewed by: kib Discussed with: emaste, mw MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
# 758d98de	17-Jan-2022	Mark Johnston <markj@FreeBSD.org>	exec: Remove the stack gap implementation ASLR stack randomization will reappear in a forthcoming commit. Rather than inserting a random gap into the stack mapping, the entire stack mapping itself will be randomized in the same way that other mappings are when ASLR is enabled. No functional change intended, as the stack gap implementation is currently disabled by default. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
# 706f4a81	17-Jan-2022	Mark Johnston <markj@FreeBSD.org>	exec: Introduce the PROC_PS_STRINGS() macro Rather than fetching the ps_strings address directly from a process' sysentvec, use this macro. With stack address randomization the ps_strings address is no longer fixed. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
# bfd45121	14-Dec-2021	Mark Johnston <markj@FreeBSD.org>	imgact_elf: Disable the stack gap for now The integration with RLIMIT_STACK is still causing problems for some programs such as lang/sdcc and syzkaller's executor. Until this is resolved by some work currently in progress, disable the stack gap by default. PR: 260303 Reviewed by: kib, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33438
# e499988f	12-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	exec_elf: use intermediate u_long variable to correct mismatched type vm_offset_t * vs. u_long * Sponsored by: The FreeBSD Foundation MFC after: 1 week
# bf839416	08-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf: avoid mapsz overflow Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# 36df8f54	09-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf: check that the alignment of PT_LOAD segment is power of two and stop recalculating alignment for PIE base, which was off by one power of two. Suggested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# 714d6d09	08-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf: exclude invalid alignment requests Only accept at most superpage alignment, or if the arch does not have superpages supported, artificially limit it to PAGE_SIZE * 1024. This is somewhat arbitrary, and e.g. could change what binaries do we accept between native i386 vs. amd64 ia32 with superpages disabled, but I do not believe the difference there is affecting anybody with real (useful) binaries. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# a4007ae1	09-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	rnd_elf: add comment explaining the interface Requested and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# 9cf78c1c	07-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	elf image activator: convert asserts into errors Invalid (artificial) layout of the loadable ELF segments might result in triggering the assertion. This means that the file should not be executed, regardless of the kernel debug mode. Change calling conventions for rnd_elf{32,64} helpers to allow returning an error, and abort activation with ENOEXEC if its invariants are broken. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# b4b20492	09-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	exec_elf: assert that the image vnode is still locked on return Suggested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# 88dd7a0a	08-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	Style Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33359
# eb029587	24-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	Add kern.elf{32,64}.vdso knobs to enable/disable vdso preloading Reviewed by: emaste Discussed with: jrtc27 Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D32960
# 01c77a43	11-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	Pass vdso address to userspace Reviewed by: emaste Discussed with: jrtc27 Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D32960
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# 4082b189	17-Nov-2021	Alex Richardson <arichardson@FreeBSD.org>	elf*_brand_inuse: Change return type to bool. Reviewed by: kib Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33052
# 19621645	17-Nov-2021	Alex Richardson <arichardson@FreeBSD.org>	imgact_elf: Use bool instead of boolean_t. Reviewed by: kib Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33051
# b014e0f1	24-Oct-2021	Marcin Wojtas <mw@FreeBSD.org>	Enable ASLR by default for 64-bit executables Address Space Layout Randomization (ASLR) is an exploit mitigation technique implemented in the majority of modern operating systems. It involves randomly positioning the base address of an executable and the position of libraries, heap, and stack, in a process's address space. Although over the years ASLR proved to not guarantee full OS security on its own, this mechanism can make exploitation more difficult. Tests on the tier 1 64-bit architectures demonstrated that the ASLR is stable and does not result in noticeable performance degradation, therefore it should be safe to enable this mechanism by default. Moreover its effectiveness is increased for PIE (Position Independent Executable) binaries. Thanks to commit 9a227a2fd642 ("Enable PIE by default on 64-bit architectures"), building from src is not necessary to have PIE binaries. It is enough to control usage of ASLR in the OS solely by setting the appropriate sysctls. This patch toggles the kernel settings to use address map randomization for PIE & non-PIE 64-bit binaries. It also disables SBRK, in order to allow utilization of the bss grow region for mappings. The latter has no effect if ASLR is disabled, so apply it to all architectures. As for the drawbacks, a consequence of using the ASLR is more significant VM fragmentation, hence the issues may be encountered in the systems with a limited address space in high memory consumption cases, such as buildworld. As a result, although the tests on 32-bit architectures with ASLR enabled were mostly on par with what was observed on 64-bit ones, the defaults for the former are not changed at this time. Also, for the sake of safety keep the feature disabled for 32-bit executables on 64-bit machines, too. The committed change affects the overall OS operation, so the following should be taken into consideration: * Address space fragmentation. * A changed ABI due to modified layout of address space. * More complicated debugging due to: * Non-reproducible address space layout between runs. * Some debuggers automatically disable ASLR for spawned processes, making target's environment different between debug and non-debug runs. In order to confirm/rule-out the dependency of any encountered issue on ASLR it is strongly advised to re-run the test with the feature disabled - it can be done by setting the following sysctls in the /etc/sysctl.conf file: kern.elf64.aslr.enable=0 kern.elf64.aslr.pie_enable=0 Co-developed by: Dawid Gorecki <dgr@semihalf.com> Reviewed by: emaste, kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential revision: https://reviews.freebsd.org/D27666
# 889b56c8	13-Oct-2021	Dawid Gorecki <dgr@semihalf.com>	setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516
# 796a8e1a	01-Sep-2021	Konstantin Belousov <kib@FreeBSD.org>	procctl(2): Add PROC_WXMAP_CTL/STATUS It allows to override kern.elf{32,64}.allow_wx on per-process basis. In particular, it makes it possible to run binaries without PT_GNU_STACK and without elfctl note while allow_wx = 0. Reviewed by: brooks, emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31779
# b7924341	27-Aug-2021	Andrew Turner <andrew@FreeBSD.org>	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830
# ebf98866	23-Jul-2021	Mark Johnston <markj@FreeBSD.org>	imgact_elf: Avoid redefining suword() Otherwise this interferes with the definition for sanitizer interceptors. MFC after: 1 week Sponsored by: The FreeBSD Foundation
# 5d9f7901	29-Jun-2021	Dmitry Chagin <dchagin@FreeBSD.org>	Eliminate p_elf_machine from struct proc. Instead of p_elf_machine use machine member of the Elf_Brandinfo which is now cached in the struct proc at p_elf_brandinfo member. Note to MFC: D30918, KBI Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30926 MFC after: 2 weeks
# 615f22b2	29-Jun-2021	Dmitry Chagin <dchagin@FreeBSD.org>	Add a link to the Elf_Brandinfo into the struct proc. To allow the ABI to make a decision based on the Brandinfo add a link to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits of Elf_Brandinfo flags is private to the ABI. Note to MFC: it breaks KBI. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30918 MFC after: 2 weeks
# 435754a5	29-Jun-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add infrastructure required for Linux coredump support This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`, and `sv_elf_core_prepare_notes` fields to `struct sysentvec`, and modifies imgact_elf.c to make use of them instead of hardcoding FreeBSD-specific values. It also updates all of the ABI definitions to preserve current behaviour. This makes it possible to implement non-native ELF coredump support without unnecessary code duplication. It will be used for Linux coredumps. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30921
# 61b4c627	22-Jun-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	imgact_elf.c: style, remove unnecessary casts Remove unnecessary type casts and redundant brackets. No functional changes. Suggested By: kib Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30841
# 06250515	21-Jun-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	imgact_elf: compute auxv buffer size instead of using magic value The new buffer is somewhat larger, but there should be no functional changes. Reviewed By: kib, imp Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30821
# 905d192d	26-May-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	Unstaticize parts of coredumping code This makes it possible to call __elfN(size_segments) and __elfN(puthdr) from Linux coredump code. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30455
# 33621dfc	22-May-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	Refactor core dumping code a bit This makes it possible to use core_write(), core_output(), and sbuf_drain_core_output(), in Linux coredump code. Moving them out of imgact_elf.c is necessary because of the weird way it's being built. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30369
# 86ffb3d1	24-Apr-2021	Konstantin Belousov <kib@FreeBSD.org>	ELF coredump: define several useful flags for the coredump operations - SVC_ALL request dumping all map entries, including those marked as non-dumpable - SVC_NOCOMPRESS disallows compressing the dump regardless of the sysctl policy - SVC_PC_COREDUMP is provided for future use by userspace core dump request Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29955
# 5bc3c617	24-Apr-2021	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf: consistently pass flags from coredump down to helper functions Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29955
# 409ab7e1	26-Apr-2021	Mark Johnston <markj@FreeBSD.org>	imgact_elf: Ensure that the return value in parse_notes is initialized parse_notes relies on the caller-supplied callback to initialize "res". Two callbacks are used in practice, brandnote_cb and note_fctl_cb, and the latter fails to initialize res. Fix it. In the worst case, the bug would cause the inner loop of check_note to examine more program headers than necessary, and the note header usually comes last anyway. Reviewed by: kib Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29986
# 41032835	14-Feb-2021	Jason A. Harmening <jah@FreeBSD.org>	Fix divide-by-zero panic when ASLR is enabled and superpages disabled When locating the anonymous memory region for a vm_map with ASLR enabled, we try to keep the slid base address aligned on a superpage boundary to minimize pagetable fragmentation and maximize the potential usage of superpage mappings. We can't (portably) do this if superpages have been disabled by loader tunable and pagesizes[1] is 0, and it would be less beneficial in that case anyway. PR: 253511 Reported by: johannes@jo-t.de MFC after: 1 week Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28678
# 0659df6f	12-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	vm_map_protect: allow to set prot and max_prot in one go. This prevents a situation where other thread modifies map entries permissions between setting max_prot, then relocking, then setting prot, confusing the operation outcome. E.g. you can get an error that is not possible if operation is performed atomic. Also enable setting rwx for max_prot even if map does not allow to set effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117
# 2e1c94aa	08-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE | PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050
# 4daea938	31-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	Lock proctree in around fill_kinfo_proc(). Proctree lock is needed for correct calculation and collection of the job-control related data in kinfo_proc. There was even an XXX comment about it. Satisfy locking and lock ordering requirements by taking proctree lock around pass over each bucket in proc_iterate(), and in sysctl_kern_proc() and note_procstat_proc() for individual process reporting. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
# 673e2dd6	18-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	Add ELF flag to disable ASLR stack gap. Also centralize and unify checks to enable ASLR stack gap in a new helper exec_stackgap(). PR: 239873 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 85078b85	17-Nov-2020	Conrad Meyer <cem@FreeBSD.org>	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037
# f8e8a06d	10-Oct-2020	Conrad Meyer <cem@FreeBSD.org>	random(4) FenestrasX: Push root seed version to arc4random(3) Push the root seed version to userspace through the VDSO page, if the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no functional change. The mechanism can be disabled with debug.fxrng_vdso_enable=0. arc4random(3) obtains a pointer to the root seed version published by the kernel in the shared page at allocation time. Like arc4random(9), it maintains its own per-process copy of the seed version corresponding to the root seed version at the time it last rekeyed. On read requests, the process seed version is compared with the version published in the shared page; if they do not match, arc4random(3) reseeds from the kernel before providing generated output. This change does not implement the FenestrasX concept of PCPU userspace generators seeded from a per-process base generator. That change is left for future discussion/work. Reviewed by: kib (previous version) Approved by: csprng (me -- only touching FXRNG here) Differential Revision: https://reviews.freebsd.org/D22839
# c88285c5	02-Oct-2020	Mark Johnston <markj@FreeBSD.org>	Fix the INVARIANTS build for 32-bit platforms Reported by: Jenkins MFC with: r366368
# f31695cc	02-Oct-2020	Mark Johnston <markj@FreeBSD.org>	Implement sparse core dumps Currently we allocate and map zero-filled anonymous pages when dumping core. This can result in lots of needless disk I/O and page allocations. This change tries to make the core dumper more clever and represent unbacked ranges of virtual memory by holes in the core dump file. Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to clean up and return an error when it would otherwise map a zero-filled page. Then, in the core dumper code, prefault all user pages and handle errors by simply extending the size of the core file. This also fixes a bug related to the fact that vn_io_fault1() does not attempt partial I/O in the face of errors from vm_fault_quick_hold_pages(): if a truncated file is mapped into a user process, an attempt to dump beyond the end of the file results in an error, but this means that valid pages immediately preceding the end of the file might not have been dumped either. The change reduces the core dump size of trivial programs by a factor of ten simply by excluding unaccessed libc.so pages. PR: 249067 Reviewed by: kib Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26590
# fec41f07	02-Oct-2020	Mark Johnston <markj@FreeBSD.org>	Simplify the check for non-dumpable VM object types OBJT_DEFAULT, _SWAP, _VNODE and _PHYS is exactly the set of non-fictitious object types, so just check for OBJ_FICTITIOUS. The check no longer excludes dead objects, but such objects have to be handled regardless. No functional change intended. Reviewed by: alc, dougm, kib Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26589
# 7de1bc13	07-Sep-2020	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf.c: unify check for phdr fitting into the first page. Similar to the userspace rtld check. Reviewed by: dim, emaste (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26339
# 6fed89b1	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	kern: clean up empty lines in .c and .h files
# 0cad2aa2	23-Aug-2020	Konstantin Belousov <kib@FreeBSD.org>	Pass pointers to info parsed from notes, to brandinfo->header_supported filter. Currently, we parse notes for the values of ELF FreeBSD feature flags and osrel. Knowing these values, or knowing that image does not carry the note if pointers are NULL, is useful to decide which ABI variant (brand) we want to activate for the image. Right now this is only a plumbing change Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273
# b24e6ac8	16-Apr-2020	Brooks Davis <brooks@FreeBSD.org>	Convert canary, execpathp, and pagesizes to pointers. Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to r359987 and r359988. Reviewed by: jhb Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24446
# 9df1c38b	15-Apr-2020	Brooks Davis <brooks@FreeBSD.org>	Export argc, argv, envc, envv, and ps_strings in auxargs. This simplifies discovery of these values, potentially with reducing the number of syscalls we need to make at runtime. Longer term, we wish to convert the startup process to pass an auxargs pointer to _start() and use that rather than walking off the end of envv. This is cleaner, more C-friendly, and for systems with strong bounds (e.g. CHERI) necessary. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24407
# 59838c1a	01-Apr-2020	John Baldwin <jhb@FreeBSD.org>	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# 944cf37b	08-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Add AT_BSDFLAGS auxv entry. The intent is to provide bsd-specific flags relevant to interpreter and C runtime. I did not want to reuse AT_FLAGS which is common ELF auxv entry. Use bsdflags to report kernel support for sigfastblock(2). This allows rtld and libthr to safely infer the syscall presence without SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773
# 3ff65f71	30-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	Remove duplicated empty lines from kern/*.c No functional changes.
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# 741dfd86	27-Dec-2019	Justin Hibbits <jhibbits@FreeBSD.org>	Fix the powerpc copyout fixup from r356113 Summary: r356113 used an older patch, which predated the freebsd_copyout_auxargs() addition. Fix this by using a private powerpc_copyout_auxargs() instead, and keep it private to powerpc, not in MI files. Reviewed by: kib, bdragon Differential Revision: https://reviews.freebsd.org/D22935
# 6b457273	26-Dec-2019	Justin Hibbits <jhibbits@FreeBSD.org>	Fix the build from r356113. Types had changed from when the patch was first created, and a final build was not done pre-commit.
# adea0d63	26-Dec-2019	Justin Hibbits <jhibbits@FreeBSD.org>	Eliminate the last MI difference in AT_* definitions (for powerpc). Summary: As a transition aid, implement an alternative elfN_freebsd_fixup which is called for old powerpc binaries. Similarly, add a translation to rtld to convert old values to new ones (as expected by a new rtld). Translation of old<->new values is incomplete, but sufficient to allow an installworld of a new userspace from an old one when a new kernel is running. Test Plan: Someone needs to see how a new kernel/rtld/libc works with an old binary. If it works we can probably ship this. If not we probably need some more compat bits. Submitted by: brooks Reviewed by: jhibbits Differential Revision: https://reviews.freebsd.org/D20799
# d8010b11	09-Dec-2019	John Baldwin <jhb@FreeBSD.org>	Copy out aux args after the argument and environment vectors. Partially revert r354741 and r354754 and go back to allocating a fixed-size chunk of stack space for the auxiliary vector. Keep sv_copyout_auxargs but change it to accept the address at the end of the environment vector as an input stack address and no longer allocate room on the stack. It is now called at the end of copyout_strings after the argv and environment vectors have been copied out. This should fix a regression in r354754 that broke the stack alignment for newer Linux amd64 binaries (and probably broke Linux arm64 as well). Reviewed by: kib Tested on: amd64 (native, linux64 (only linux-base-c7), and i386) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22695
# 31174518	03-Dec-2019	John Baldwin <jhb@FreeBSD.org>	Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501
# 03b0d68c	18-Nov-2019	John Baldwin <jhb@FreeBSD.org>	Check for errors from copyout() and suword*() in sv_copyout_args/strings. Reviewed by: brooks, kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22401
# e3532331	15-Nov-2019	John Baldwin <jhb@FreeBSD.org>	Add a sv_copyout_auxargs() hook in sysentvec. Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv instead of doing it in the sv_fixup hook. In particular, this new hook allows the stack space to be allocated at the same time the auxv values are copied out to userland. This allows us to avoid wasting space for unused auxv entries as well as not having to recalculate where the auxv vector is by walking back up over the argv and environment vectors. Reviewed by: brooks, emaste Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22355
# 2e5f9189	31-Oct-2019	Ed Maste <emaste@FreeBSD.org>	avoid kernel stack data leak in core dump thrmisc note bzero the entire thrmisc struct, not just the padding. Other core dump notes are already done this way. Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reviewed by: markj MFC after: 3 days Sponsored by: The FreeBSD Foundation
# 2288078c	08-Oct-2019	Doug Moore <dougm@FreeBSD.org>	Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. In case the implementation ever changes from using a chain of next pointers, then changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882
# f33533da	21-Sep-2019	Konstantin Belousov <kib@FreeBSD.org>	kern.elf{32,64}.pie_base sysctl: enforce page alignment. Requested by: rstone Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 95aafd69	21-Sep-2019	Konstantin Belousov <kib@FreeBSD.org>	Make non-ASLR pie base tunable. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 1073d17e	07-Sep-2019	Konstantin Belousov <kib@FreeBSD.org>	When loading ELF interpreter, initialize whole nested image_params with zero. Otherwise we could mishandle imgp->textset. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560
# a1549acb	05-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix mis-merge. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# f422bc30	02-Aug-2019	John Baldwin <jhb@FreeBSD.org>	Set ISOPEN in namei flags when opening executable interpreters. These vnodes are explicitly opened via VOP_OPEN via exec_check_permissions identical to the main executable image. Setting ISOPEN allows filesystems to perform suitable checks in VOP_LOOKUP (e.g. close-to-open consistency in the NFS client). Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21129
# fc83c5a7	31-Jul-2019	Konstantin Belousov <kib@FreeBSD.org>	Make randomized stack gap between strings and pointers to argv/envs. This effectively makes the stack base on the csu _start entry randomized. The gap is enabled if ASLR for the ABI is enabled, and then kern.elf{64,32}.aslr.stack_gap specifies the max percentage of the initial stack size that can be wasted for gap. Setting it to zero disables the gap, and max is capped at 50%. Only amd64 for now. Reviewed by: cem, markj Discussed with: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21081
# e020a35f	24-Jul-2019	Mark Johnston <markj@FreeBSD.org>	Remove a redundant offset computation in elf_load_section(). With r344705 the offset is always zero. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
# 5c32e9fc	18-Jun-2019	Alexander Motin <mav@FreeBSD.org>	Optimize kern.geom.conf* sysctls. On large systems those sysctls may generate megabytes of output. Before this change sbuf(9) code was resizing buffer by 4KB each time many times, generating tons of TLB shootdowns. Unfortunately in this case existing sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not applicable, since all the sbuf writes are done in different kernel thread. This change improves situation in two ways: - on first sysctl call, not providing any output buffer, it sets special sbuf drain function, just counting the data and so not needing big buffer; - on second sysctl call it uses as initial buffer size value saved on previous call, so that in most cases there will be no reallocation, unless GEOM topology changed significantly. MFC after: 1 week Sponsored by: iXsystems, Inc.
# f1f81d3b	17-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Grammar fixes for r347690. Submitted by: alc MFC after: 3 days
# 0ddfdc60	16-May-2019	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance from r347148. Requested by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 78022527	05-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
# 2d6b8546	05-May-2019	Konstantin Belousov <kib@FreeBSD.org>	imgact_elf: do not relock the text vnode if possible. We unlock the vnode around malloc(M_WAITOK), to make it possible for pagedaemon to flush vnode pages for us. Instead of doing it unconditionally, first try M_NOWAIT allocation, which typically succeeds. Only on failure, unlock the vnode and retry with M_WAITOK. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19923
# 4033ecc9	11-Apr-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Use shared vnode locks for the ELF interpreter. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19874
# b65ca345	10-Apr-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Improve vnode lock assertions. MFC after: 2 weeks Sponsored by: DARPA, AFRL
# 9bcd7482	09-Apr-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Factor out section loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19846
# 9274fb35	08-Apr-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Refactor ELF interpreter loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19741
# be7808dc	30-Mar-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix branding after r345661. In particular, elf32 FreeBSD binaries were not executed on LP64 hosts. The interp_name_len value should account for the nul terminator. This is needed for strncmp()s in brand checking code to work. Reported by: andreast Sponsored by: The FreeBSD Foundation MFC after: 12 days (together with r345661)
# 09c78d53	28-Mar-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Factor out retrieving the interpreter path from the main ELF loader routine. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19715
# 20e1174a	26-Mar-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Factor out resource limit enforcement code in the ELF loader. It makes the code slightly easier to follow, and might make it easier to fix the resource accounting to also account for the interpreter. The PROC_UNLOCK() is moved earlier - I don't see anything it should protect; the lim_max() is a wrapper around lim_rlimit(), and that, differently from lim_rlimit_proc(), doesn't require the proc lock to be held. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19689
# 545517f1	23-Mar-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove trunc_page_ps() and round_page_ps() macros. This completes the undoing of r100384. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19680
# 1699546d	01-Mar-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove sv_pagesize, originally introduced with r100384. In all of the architectures we have today, we always use PAGE_SIZE. While in theory one could define different things, none of the current architectures do, even the ones that have transitioned from 32-bit to 64-bit like i386 and arm. Some ancient mips binaries on other systems used 8k instead of 4k, but we don't support running those and likely never will due to their age and obscurity. Reviewed by: imp (who also contributed the commit message) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19280
# fa50a355	10-Feb-2019	Konstantin Belousov <kib@FreeBSD.org>	Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminishing over time as exploit authors work out simple ASLR bypass techniques, it eliminates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintenance burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitative change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386.
This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603
# eb785fab	06-Feb-2019	Konstantin Belousov <kib@FreeBSD.org>	Port sysctl kern.elf32.read_exec from amd64 to i386. Make it more comprehensive on i386, by not setting nx bit for any mapping, not just adding PF_X to all kernel-loaded ELF segments. This is needed for the compatibility with older i386 programs that assume that read access implies exec, e.g. old X servers with hand-rolled module loader. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
# b0b246b0	08-Dec-2018	Mateusz Guzik <mjg@FreeBSD.org>	Remove proctree acquire from note_procstat_proc It is not needed since r340482 ("proc: always store parent pid in p_oppid") Sponsored by: The FreeBSD Foundation
# cefb93f2	23-Nov-2018	Konstantin Belousov <kib@FreeBSD.org>	Parse FreeBSD Feature Control note on the ELF image activation. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 92328a32	23-Nov-2018	Konstantin Belousov <kib@FreeBSD.org>	Generalize ELF parse_notes(). Remove the knowledge of the ABI note type and brandnote from it, instead provide it with a callback to do note-specific matching and data fetching. Implement callback to match against ELF brand. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# eda8fe63	23-Nov-2018	Konstantin Belousov <kib@FreeBSD.org>	Trivial reduction of the code duplication, reuse the return FALSE code. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 4bf4b0f1	07-Nov-2018	John Baldwin <jhb@FreeBSD.org>	Enable non-executable stacks by default on RISC-V. Reviewed by: markj Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D17878
# c9e562b1	11-Sep-2018	Gordon Tetlow <gordon@FreeBSD.org>	Correct ELF header parsing code to prevent invalid ELF sections from disclosing memory. Submitted by: markj Reported by: Thomas Barabosch, Fraunhofer FKIE Approved by: re (implicit) Approved by: so Security: FreeBSD-SA-18:12.elf Security: CVE-2018-6924 Sponsored by: The FreeBSD Foundation
# 455d3589	30-Jul-2018	David E. O'Brien <obrien@FreeBSD.org>	Correct copyright dates.
# 53e20b27	19-Jul-2018	Konstantin Belousov <kib@FreeBSD.org>	When reporting an error, print the errno value. Sponsored by: The FreeBSD Foundation MFC after: 3 days
# d8b2f079	29-May-2018	Brooks Davis <brooks@FreeBSD.org>	Correct pointer subtraction in KASSERT(). The assertion would never fire without truly spectacular future programming errors. Reported by: Coverity CID: 1391367, 1391368 Sponsored by: DARPA, AFRL
# 5f77b8a8	24-May-2018	Brooks Davis <brooks@FreeBSD.org>	Avoid two suword() calls per auxarg entry. Instead, construct an auxargs array and copy it out all at once. Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent the array. This is the correct type where pairs of words just happened to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act on this array rather than introducing a new macro. Return errors on copyout() and suword() failures and handle them in the caller. Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64 which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32 (now removed due to the use of proper types). Reviewed by: kib Comments from: emaste, jhb Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15485
# 6469bdcd	06-Apr-2018	Brooks Davis <brooks@FreeBSD.org>	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
# a95659f7	13-Mar-2018	Ed Maste <emaste@FreeBSD.org>	Use C99 boolean type for translate_osrel Migrate to modern types before creating MD Linuxolator bits for new architectures. Reviewed by: cem Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14676
# b7feabf9	13-Mar-2018	Ed Maste <emaste@FreeBSD.org>	Use C99 designated initializers for struct execsw It makes use slightly more clear and facilitates grepping.
# 5cc6d253	12-Mar-2018	Ed Maste <emaste@FreeBSD.org>	ANSIfy sys/kern/imgact_*
# d722231b	05-Feb-2018	John Baldwin <jhb@FreeBSD.org>	Always give ELF brands a chance to veto a match. If a brand provides a header_supported hook, check it when trying to find a brand based on a matching interpreter as well as in the final loop for the fallback brand. Previously a brand might reject a binary via a header_supported hook in one of the earlier loops, but still be chosen by one of these later loops. Reviewed by: kib Obtained from: CheriBSD MFC after: 2 weeks Sponsored by: DARPA / AFRL Differential Revision: https://reviews.freebsd.org/D13945
# 78f57a9c	08-Jan-2018	Mark Johnston <markj@FreeBSD.org>	Generalize the gzio API. We currently use a set of subroutines in kern_gzio.c to perform compression of user and kernel core dumps. In the interest of adding support for other compression algorithms (zstd) in this role without complicating the API consumers, add a simple compressor API which can be used to select an algorithm. Also change the (non-default) GZIO kernel option to not enable compressed user cores by default. It's not clear that such a default would be desirable with support for multiple algorithms implemented, and it's inconsistent in that it isn't applied to kernel dumps. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D13632
# 8a36da99	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
# 904d8c49	20-Oct-2017	Michal Meloun <mmel@FreeBSD.org>	Add AT_HWCAP2 ELF auxiliary vector. - allocate value for new AT_HWCAP2 auxiliary vector on all platforms. - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly the same way as for AT_HWCAP. MFC after: 1 month Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D12699
# c2f37b92	14-Sep-2017	John Baldwin <jhb@FreeBSD.org>	Add AT_HWCAP and AT_EHDRFLAGS on all platforms. A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'. A process ABI can set this field to point to a value holding a mask of architecture-specific CPU feature flags. If an ABI does not wish to supply AT_HWCAP to processes the field can be left as NULL. The support code for AT_EHDRFLAGS was already present on all systems, just the #define was not present. This is a step towards unifying the AT_ constants across platforms. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12290
# 51645e83	29-Jun-2017	John Baldwin <jhb@FreeBSD.org>	Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps. Process core notes for a 32-bit process running on a 64-bit host need to use 32-bit structures so that the note layout matches the layout of notes of a core dump of a 32-bit process under a 32-bit kernel. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D11407
# 86be94fc	30-Mar-2017	Tycho Nightingale <tychon@FreeBSD.org>	Add support for capturing 'struct ptrace_lwpinfo' for signals resulting in a process dumping core in the corefile. Also extend procstat to view select members of 'struct ptrace_lwpinfo' from the contents of the note. Sponsored by: Dell EMC Isilon
# 3aeacc55	29-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	A followup to r315749, two more places where brand->interp_path was accessed unconditionally. Reported by: se Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 0fe98320	23-Mar-2017	Ed Schouten <ed@FreeBSD.org>	Don't require the presence of the compat_3_brand. The existing ELF image activator requires the brandinfo to provide such a string unconditionally, even if the executable format in question doesn't use this type of branding. Skip matching when it's a null pointer. Reviewed by: kib MFC after: 2 weeks
# 2274ab3d	22-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Update r315753 with the proper flag name. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 1438fe3c	22-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. Interpretation of the 'static' there is that the binary must not specify an interpreter. In particular, shared objects are matched by the brand if BI_CAN_EXEC_DYN is also set. This improves precision of the brand matching, which should eliminate surprises due to brand ordering. Revert r315701. Discussed with and tested by: ed (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 7aab7a80	22-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Adjust r314851 to not require every brand to specify interpreter path. Reported and tested by: ed Sponsored by: The FreeBSD Foundation MFC after: 1 week
# c547cbb4	18-Mar-2017	Alan Cox <alc@FreeBSD.org>	Avoid unnecessary calls to vm_map_protect() in elf_load_section(). Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to elf_map_insert(), it was needlessly enabling execute access on the mapping, and it would later have to call vm_map_protect() to correct the mapping's access rights. Now, instead, elf_load_section() always passes its parameter "prot" to elf_map_insert(). So, elf_load_section() must only call vm_map_protect() if it needs to remove the write access that was temporarily granted to perform a copyout(). Reviewed by: kib MFC after: 1 week
# 9bcf2f2d	12-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Accept linker's representation for ELF segments with zero on-disk length. For such segments, GNU bfd linker writes knowingly incorrect value into the file offset field of the program header entry, with the motivation that file should not be mapped for creation of this segment at all. Relax checks for the ELF structure validity when on-disk segment length is zero, and explicitly set mapping length to zero for such segments to avoid validating rounding arithmetic. PR: 217610 Reported by: Robert Clausecker <fuz@fuz.su> Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 973d67c4	12-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# e383e820	11-Mar-2017	Alan Cox <alc@FreeBSD.org>	Simplify the control flow and tidy up a comment in map_insert. In collaboration with: kib MFC after: 1 week
# 15a9aedf	07-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	When selecting brand based on old Elf branding, prefer the brand whose interpreter exactly matches the one requested by the activated image. This change applies r295277, which did the same for note branding, to the old brand selection, with the same reasoning of fixing compat32 interpreter substitution. PR: 211837 Reported by: kenji@kens.fm Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 3d560b4b	07-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Require whole brand string matching for old Elf branding. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 0bbee4cd	07-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Consistently use vm_ooffset_t type for the vm object offset in elf_load_section. The values passed currently as vm_offset_t are phdr.p_offset, which have the native Elf word size. Since elf_load_section interprets them as the file offset, use vm object offset type. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
# aaadc41f	06-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Instead of direct use of vm_map_insert(), call vm_map_fixed(MAP_CHECK_EXCL). This KPI explicitly indicates the intent of creating the mapping at the fixed address, and incorporates the map locking into the callee. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 28e8da65	05-Mar-2017	Alan Cox <alc@FreeBSD.org>	Style and punctuation fixes. Reviewed by: kib MFC after: 3 days
# fe0a8a39	02-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 55b985b4	01-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Use vm_map_insert() instead of vm_map_find() in elf_map_insert(). Elf_map_insert() needs to create mapping at the known fixed address. Usage of vm_map_find() assumes, on the other hand, that any suitable address space range above or equal the specified hint, is acceptable. Due to operating on the fresh or cleared address space, vm_map_find() usually creates mapping starting exactly at hint. Switch to vm_map_insert() use to clearly request fixed mapping from the VM. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# e3d8f8fe	01-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	When deallocating the vm object in elf_map_insert() due to vm_map_insert() failure, drop the vnode lock around the call to vm_object_deallocate(). Since the deallocated object is the vm object of the vnode, we might get the vnode lock recursion there. In fact, it is almost impossible to make vm_map_insert() failing there on stock kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 885f13dc	07-Feb-2017	John Baldwin <jhb@FreeBSD.org>	Copy the e_machine and e_flags fields from the binary into an ELF core dump. In the kernel, cache the machine and flags fields from ELF header to use in the ELF header of a core dump. For gcore, copy these fields over from the ELF header in the binary. This matters for platforms which encode ABI information in the flags field (such as o32 vs n32 on MIPS). Reviewed by: kib Sponsored by: DARPA / AFRL Differential Revision: https://reviews.freebsd.org/D9392
# 77ebe276	24-Jan-2017	Ed Maste <emaste@FreeBSD.org>	imgact_elf: refactor et_dyn_addr calculation This simplifies the logic somewhat. It is extracted from the change in review in D5603. Differential Revision: https://reviews.freebsd.org/D9321
# c468ff88	20-Jan-2017	Andriy Gapon <avg@FreeBSD.org>	don't abort writing of a core dump after EFAULT It's possible to get EFAULT when writing a segment backed by a file if the segment extends beyond the file. The core dump could still be useful if we skip the rest of the segment and proceed to other segments. The skipped segment (or a portion of it) will be zero-filled. While there, use 'const' to signify that core_write() only reads the buffer and use __DECONST before calling vn_rdwr_inchunks() because it can be used for both reading and writing. Before the change: kernel: Failed to write core file for process mmap_trunc_core (error 14) kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6 After the change: kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped) Reviewed by: julian, kib Obtained from: Panzura (older version of the change) MFC after: 5 days Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9233
# 5420f76b	04-Oct-2016	Konstantin Belousov <kib@FreeBSD.org>	Style. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 09c69701	30-Aug-2016	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Back out misfired extra file in r305108.
# c9a124dc	30-Aug-2016	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Refix operation on sparse CPU mappings as in r302372, temporarily broken by r304716. PR: kern/210106 MFC after: 2 days
# 1005d8af	20-Jul-2016	Conrad Meyer <cem@FreeBSD.org>	imgact_elf: Rename the segment iterator to match reality The each_writable_segment routine evaluates segments on a slightly more nuanced metric than simply "writable" or not. Rename the function to more closely match its behavior (each_dumpable_segment). Suggested by: jhb Sponsored by: EMC / Isilon Storage Division
# f3325003	20-Jul-2016	Conrad Meyer <cem@FreeBSD.org>	ANSI-fy imgact_elf.c Sponsored by: EMC / Isilon Storage Division
# 07f825e8	20-Jul-2016	Conrad Meyer <cem@FreeBSD.org>	Fix DEBUG build on 64-bit arch after r303099 Reported by: Larry Rosenman <ler at lerctr.org>
# c17b0bd2	20-Jul-2016	Conrad Meyer <cem@FreeBSD.org>	Extend ELF coredump to support more than 65535 segments The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments (program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7, depending on document version) prescribes a special first section header where sh_info represents the real number of program headers. Test code to follow, when it is ready. Reference: http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf Reviewed by: emaste, markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7255
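The extended segment count described in c17b0bd2 can be illustrated from the reader's side. This is a minimal sketch, not the kernel code: the trimmed structs and the names `sketch_ehdr`, `sketch_shdr`, `SKETCH_PN_XNUM`, and `real_phnum` are hypothetical, and only the fields needed for the lookup are modeled. It assumes the escape value 0xffff in e_phnum signals that the real count lives in sh_info of the first section header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* e_phnum escape value meaning "look in section header 0" (assumed 0xffff). */
#define SKETCH_PN_XNUM 0xffffu

/* Hypothetical, trimmed-down views of the ELF header and section header 0. */
struct sketch_ehdr {
	uint16_t e_phnum;	/* 16-bit program header count */
};
struct sketch_shdr {
	uint32_t sh_info;	/* real count when e_phnum is the escape value */
};

/* Return the real number of program headers, or 0 if it is unavailable. */
static uint32_t
real_phnum(const struct sketch_ehdr *eh, const struct sketch_shdr *sh0)
{
	assert(eh != NULL);
	if (eh->e_phnum != SKETCH_PN_XNUM)
		return (eh->e_phnum);	/* ordinary case: count fits in 16 bits */
	if (sh0 == NULL)
		return (0);		/* escape value but no section header 0 */
	return (sh0->sh_info);
}
```

A consumer that ignores the escape value would read such a core file as having exactly 65535 segments, which is why parsers need the extra check.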
# ccb83afd	18-Jul-2016	John Baldwin <jhb@FreeBSD.org>	Include process IDs in core dumps. When threads were added to the kernel, the pr_pid member of the NT_PRSTATUS note was repurposed to store LWP IDs instead of process IDs. However, the process ID was no longer recorded in core dumps. This change adds a pr_pid field to prpsinfo (NT_PRSINFO). Rather than bumping the prpsinfo version number, note parsers can use the note's payload size to determine if pr_pid is present. Reviewed by: kib, emaste (older version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7117
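The size-based detection mentioned in ccb83afd can be sketched as follows. The struct layout here is hypothetical and simplified (the real prpsinfo has more fields and different sizes); the point is only that a parser compares the note's payload size against the offset of the appended field before reading it. `sketch_prpsinfo` and `sketch_get_pid` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout; not the real FreeBSD prpsinfo. */
struct sketch_prpsinfo {
	int32_t	pr_version;
	char	pr_fname[16];
	char	pr_psargs[80];
	int32_t	pr_pid;		/* appended later; absent in older core files */
};

/*
 * Copy pr_pid out of a note payload if the payload is long enough to
 * contain it. Returns false for older, shorter payloads.
 */
static bool
sketch_get_pid(const void *desc, size_t descsz, int32_t *pid)
{
	const size_t off = offsetof(struct sketch_prpsinfo, pr_pid);

	assert(desc != NULL && pid != NULL);
	if (descsz < off + sizeof(*pid))
		return (false);
	memcpy(pid, (const char *)desc + off, sizeof(*pid));
	return (true);
}
```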
# c77547d2	14-Jul-2016	John Baldwin <jhb@FreeBSD.org>	Include command line arguments in core dump process info. Fill in pr_psargs in the NT_PRSINFO ELF core dump note with command line arguments. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7116
# 1cbb879d	05-Jul-2016	Ed Maste <emaste@FreeBSD.org>	add description for debug.elf{32,64}_legacy_coredump sysctl Approved by: re (kib) MFC after: 1 week Sponsored by: The FreeBSD Foundation
# a66dc0c5	25-May-2016	Ian Lepore <ian@FreeBSD.org>	Include machine/acle-compat.h in cdefs.h on arm if the compiler doesn't have ACLE support built in. The ACLE (ARM C Language Extensions) defines a set of standardized symbols which indicate the architecture version and features available. ACLE support is built in to modern compilers (both clang and gcc), but absent from gcc prior to 4.4. ARM (the company) provides the acle-compat.h header file to define the right symbols for older versions of gcc. Basically, acle-compat.h does for arm about the same thing cdefs.h does for freebsd: defines standardized macros that work no matter which compiler you use. If ARM hadn't provided this file we would have ended up with a big #ifdef __arm__ section in cdefs.h with our own compatibility shims. Remove #include <machine/acle-compat.h> from the zillion other places (an ever-growing list) that it appears. Since style(9) requires sys/types.h or sys/param.h early in the include list, and both of those lead to including cdefs.h, only a couple special cases still need to include acle-compat.h directly. Loves it: imp
# d9c9c81c	21-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.
# 35030a5d	29-Mar-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation
# af582aae	04-Feb-2016	Konstantin Belousov <kib@FreeBSD.org>	When matching brand to the ELF binary by notes, try to find a brand with interpreter name exactly matching one wanted by the binary. If no such brand exists, return first brand which accepted the binary by note. The change fixes a regression after r292749, where e.g. our two ia32 compat brands, ia32_brand_info and ia32_brand_oinfo, only differ by the interpreter path and binary matches to a brand by linkage order. Then old binaries which require /usr/libexec/ld-elf.so.1 but matched against ia32_brand_info with interp_path /libexec/ld-elf.so.1, were considered requiring non-standard interpreter name, and magic to force ld-elf32.so.1 did not happen. Note that it might make sense to apply the same selection of brands for other matching criteria, SCO EI_OSABI and 3.x string. Reported and tested by: dwmalone Sponsored by: The FreeBSD Foundation MFC after: 3 days
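The selection rule in af582aae (prefer an exact interpreter match, otherwise fall back to the first brand that accepted the binary) can be sketched like this. It is an illustration under stated assumptions, not the kernel's brand lookup: `sketch_brand`, its `accepts` flag, and `pick_brand` are hypothetical stand-ins for the real brand table and note check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a brand entry. */
struct sketch_brand {
	const char *name;
	const char *interp_path;
	int accepts;		/* nonzero if the brand's note check passed */
};

/*
 * Prefer an accepting brand whose interpreter path exactly matches the one
 * the binary requests; otherwise return the first accepting brand.
 */
static const struct sketch_brand *
pick_brand(const struct sketch_brand *brands, size_t n, const char *wanted)
{
	const struct sketch_brand *fallback = NULL;

	assert(brands != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!brands[i].accepts)
			continue;
		if (wanted != NULL &&
		    strcmp(brands[i].interp_path, wanted) == 0)
			return (&brands[i]);	/* exact interpreter match */
		if (fallback == NULL)
			fallback = &brands[i];	/* remember first acceptor */
	}
	return (fallback);
}
```

With the two ia32 brands named in the message, a binary requesting /usr/libexec/ld-elf.so.1 would select the second entry rather than whichever happens to be linked first.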
# 18995077	26-Dec-2015	Konstantin Belousov <kib@FreeBSD.org>	Do not substitute interpreter if the brand interpreter path is different from the interpreter path requested by the binary. Before this change, it is impossible to activate non-default interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file exists. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# d3ee0a15	23-Dec-2015	Jonathan T. Looney <jtl@FreeBSD.org>	Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf. Differential Revision: https://reviews.freebsd.org/D4692 Reviewed by: kib MFC after: 2 weeks Sponsored by: Juniper Networks
# d943fa35	22-Dec-2015	Konstantin Belousov <kib@FreeBSD.org>	If we annoy user with the terminal output due to failed load of interpreter, also show the actual error code instead of some interpretation. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 4c22b468	07-Dec-2015	Ed Maste <emaste@FreeBSD.org>	Replace magic value ELF note type with NT_FREEBSD_ABI_TAG As of r291909 elf_common.h provides a definition. Suggested by: kib Sponsored by: The FreeBSD Foundation
# 4d22d07a	06-Dec-2015	Konstantin Belousov <kib@FreeBSD.org>	Add support for usermode (vdso-like) gettimeofday(2) and clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural generic timer hardware. It is similar how the RDTSC timer is used in userspace on x86. Fix a permission problem where generic timer access from EL0 (or userspace on v7) was not properly initialized on APs. For ARMv7, mark the stack non-executable. The shared page is added for all arms (including ARMv8 64bit), and the signal trampoline code is moved to the page. Reviewed by: andrew Discussed with: emaste, mmel Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4209
# f19d421a	01-Dec-2015	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Missed header_supported call from r291020: make really, really sure the brand likes the executable.
# 686d2f31	18-Nov-2015	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Extend r270123 to run the brand info's header_supported() routine for branded as well as unbranded binaries. This will be required to add support for the new ELFv2 ABI on powerpc64, which is distinguished from ELFv1 by the contents of the ELF header's flags field. Reviewed by: imp MFC after: 2 weeks
# 9a12e282	01-Nov-2015	Enji Cooper <ngie@FreeBSD.org>	Define `compress` in `__elfN(coredump)` when #ifdef GZIO is true to mute an -Wunused-but-set-variable warning Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job Sponsored by: EMC / Isilon Storage Division
# 6c775eb6	14-Oct-2015	Konstantin Belousov <kib@FreeBSD.org>	Allow PT_INTERP and PT_NOTES segments to be located anywhere in the executable image. Keep one page (arbitrary) limit on the max allowed size of the PT_NOTES. The ELF image activators still require that program headers of the executable are fully contained in the first page of the image file. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D3871
# e6b95927	06-Oct-2015	Conrad Meyer <cem@FreeBSD.org>	Fix core corruption caused by race in note_procstat_vmmap This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision. NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy. We do not remove the race, only prevent it from causing coredump corruption. - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5. - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result. - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding. Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3824
# bcb60d52	07-Sep-2015	Conrad Meyer <cem@FreeBSD.org>	Follow-up to r287442: Move sysctl to compiled-once file Avoid duplicate sysctl nodes. Found by: tijl Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3586
# 14bdbaf2	03-Sep-2015	Conrad Meyer <cem@FreeBSD.org>	Detect badly behaved coredump note helpers Coredump notes depend on being able to invoke dump routines twice; once in a dry-run mode to get the size of the note, and another to actually emit the note to the corefile. When a note helper emits a different length section the second time around than the length it requested the first time, the kernel produces a corrupt coredump. NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' fd table via vn_fullpath. As vnodes may move around during dump, this is racy. So: - Detect badly behaved notes in putnote() and pad underfilled notes. - Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to exercise the NT_PROCSTAT_FILES corruption. It simply picks random lengths to expand or truncate paths to in fo_fill_kinfo_vnode(). - Add a sysctl, kern.coredump_pack_fileinfo, to allow users to disable kinfo packing for PROCSTAT_FILES notes. This should avoid both FILES note corruption and truncation, even if filenames change, at the cost of about 1 kiB in padding bloat per open fd. Document the new sysctl in core.5. - Fix note_procstat_files to self-limit in the 2nd pass. Since sometimes this will result in a short write, pad up to our advertised size. This addresses note corruption, at the risk of sometimes truncating the last several fd info entries. - Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the zero padding. With suggestions from: bjk, jhb, kib, wblock Approved by: markj (mentor) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3548
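The two-pass contract from 14bdbaf2 (size the note in a dry run, then emit it, padding if the second pass comes up short) can be sketched as below. This is a simplified user-space illustration, not putnote() itself: `note_fill_fn`, `fill_from_string`, and `emit_note_padded` are hypothetical names, and the helper writes into a plain buffer rather than an sbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper type: with buf == NULL it reports how many bytes it
 * wants; otherwise it writes at most cap bytes and returns the count.
 */
typedef size_t (*note_fill_fn)(void *arg, char *buf, size_t cap);

/* Example helper: the payload is a C string whose length may change. */
static size_t
fill_from_string(void *arg, char *buf, size_t cap)
{
	const char *s = arg;
	size_t len = strlen(s);

	if (buf == NULL)
		return (len);		/* dry run: size only */
	if (len > cap)
		len = cap;		/* self-limit in the second pass */
	memcpy(buf, s, len);
	return (len);
}

/*
 * Emit exactly `advertised` bytes into out: whatever the helper writes,
 * followed by zero padding. Returns the number of padding bytes added.
 */
static size_t
emit_note_padded(note_fill_fn fill, void *arg, char *out, size_t advertised)
{
	size_t written = fill(arg, out, advertised);

	assert(written <= advertised);	/* detect a badly behaved helper */
	memset(out + written, 0, advertised - written);
	return (advertised - written);
}
```

Keeping the emitted length equal to the advertised length is what keeps the rest of the core file's offsets valid, at the cost of truncated or zero-padded note contents.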
# 02d131ad	14-Jul-2015	Mark Johnston <markj@FreeBSD.org>	Fix some error-handling bugs when core dump compression is enabled: - Ensure that core dump parameters are initialized in the error path. - Don't call gzio_fini() on a NULL stream. Reported by: rpaulo
# f6f6d240	10-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
# 6b16d664	08-Jun-2015	Ed Maste <emaste@FreeBSD.org>	Add user facing errors for exceeding process memory limits Previously the process terminating with SIGABRT at startup was the only notification. PR: 200617 Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2731
# ee960398	22-May-2015	Warner Losh <imp@FreeBSD.org>	Fix typo in symbol name. It helps to hit save in all your buffers before committing.
# d36eec69	22-May-2015	Warner Losh <imp@FreeBSD.org>	Export the eflags field from the elf header. This allows better discrimination between different subarch binaries, at least for mips and arm. Arm is implemented, mips is still tbd, so not currently exported. aarch64 does not export this because aarch64 binaries use different tags and flags than arm. Differential Revision: https://reviews.freebsd.org/D2611
# 4b5c9cf6	29-Apr-2015	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation
# 316b3843	15-Apr-2015	Konstantin Belousov <kib@FreeBSD.org>	Implement support for a binary requesting a specific stack size for the initial thread. It is read by the ELF image activator as the virtual size of the PT_GNU_STACK program header entry, and can be specified by the linker option -z stack-size in newer binutils. The soft RLIMIT_STACK is auto-increased if possible, to satisfy the binary's request. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# aa14e9b7	08-Mar-2015	Mark Johnston <markj@FreeBSD.org>	Reimplement support for userland core dump compression using a new interface in kern_gzio.c. The old gzio interface was somewhat inflexible and has not worked properly since r272535: currently, the gzio functions are called with a range lock held on the output vnode, but kern_gzio.c does not pass the IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr() attempts to reacquire the range lock. Moreover, the new gzio interface can be used to implement kernel core compression. This change also modifies the kernel configuration options needed to enable userland core dump compression support: gzio is now an option rather than a device, and the COMPRESS_USER_CORES option is removed. Core dump compression is enabled using the kern.compress_user_cores sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D1832 Reviewed by: rpaulo Discussed with: kib
# b96bd95b	27-Feb-2015	Ian Lepore <ian@FreeBSD.org>	Allow the kern.osrelease and kern.osreldate sysctl values to be set in a jail's creation parameters. This allows the kernel version to be reliably spoofed within the jail whether examined directly with sysctl or indirectly with the uname -r and -K options. The values can only be set at jail creation time, to eliminate the need for any locking when accessing the values via sysctl. The overridden values are inherited by nested jails (unless the config for the nested jails also overrides the values). There is no sanity or range checking, other than disallowing an empty release string or a zero release date, by design. The system administrator is trusted to set sane values. Setting values that are newer than the actual running kernel will likely cause compatibility problems. Differential Revision: https://reviews.freebsd.org/D1948 Relnotes: yes
# bc411bc2	14-Feb-2015	John Baldwin <jhb@FreeBSD.org>	Include OBJT_PHYS VM objects in ELF core dumps. In particular this includes the shared page allowing debuggers to use the signal trampoline code to identify signal frames in core dumps. Differential Revision: https://reviews.freebsd.org/D1828 Reviewed by: alc, kib MFC after: 1 week
# 64779280	22-Nov-2014	Konstantin Belousov <kib@FreeBSD.org>	The size value should be asserted when it is known. Reported and tested by: pho Sponsored by: The FreeBSD Foundation
# 180e57e5	21-Nov-2014	John Baldwin <jhb@FreeBSD.org>	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks
# 539c9eef	04-Oct-2014	Konstantin Belousov <kib@FreeBSD.org>	Fixes for i/o during coredumping: - Do not dump into system files. - Do not acquire write reference to the mount point where img.core is written, in the coredump(). The vn_rdwr() calls from ELF imgact request the write ref from vn_rdwr(). Recursive acquisition of the write ref deadlocks with the unmount. - Instead, take the range lock for the whole core file. This prevents parallel dumping from two processes executing the same image, converting the useless interleaved dump into sequential dumping, with second core overwriting the first. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 6662ce5a	29-Aug-2014	Mateusz Guzik <mjg@FreeBSD.org>	Add missing proctree locking to fill_kinfo_proc consumers. This fixes r270444. Pointy hat: mjg Reported by: many MFC after: 1 week
# 817dc004	17-Aug-2014	Warner Losh <imp@FreeBSD.org>	Expand the elf brandelf infrastructure to give access to the whole ELF header (Elf_Ehdr) to determine if a particular interpreter wants to accept it or not. Use this mechanism to filter EABI arm on OABI arm kernels, and vice versa. This method could also be used to implement OABI on EABI arm kernels, if desired, or to allow a single mips kernel to run o32, n32 and n64 binaries. Differential Revision: https://reviews.freebsd.org/D609
# e7d939bd	06-Jul-2014	Marcel Moolenaar <marcel@FreeBSD.org>	Remove ia64. This includes: o All directories named ia64 o All files named ia64 o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 2dedc128	16-Jun-2014	Dmitry Chagin <dchagin@FreeBSD.org>	Revert r266925 as it can lead to instant panic at fexecve(): To allow to run the interpreter itself add a new ELF branding type. Pointed out by: kib, mjg
# 5f56da18	31-May-2014	Dmitry Chagin <dchagin@FreeBSD.org>	To allow to run the interpreter itself add a new ELF branding type. Allow Linux ABI to run ELF interpreter. MFC after: 3 days
# 83a396ce	14-Apr-2014	Christian Brueffer <brueffer@FreeBSD.org>	Refine r264422: set buf to NULL only when we don't allocate memory, and free buf unconditionally. Requested by: kib MFC after: 1 week
# a1761d73	13-Apr-2014	Christian Brueffer <brueffer@FreeBSD.org>	Free buf after usage. CID: 1199377 Found with: Coverity Prevent(tm) MFC after: 1 week
# 4a144410	16-Mar-2014	Robert Watson <rwatson@FreeBSD.org>	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
# edb572a3	09-Sep-2013	John Baldwin <jhb@FreeBSD.org>	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
# be996836	05-Aug-2013	Attilio Rao <attilio@FreeBSD.org>	Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib
# 3b6714ca	04-Aug-2013	Attilio Rao <attilio@FreeBSD.org>	The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho
# 1b8388cd	01-May-2013	Mikolaj Golub <trociny@FreeBSD.org>	Introduce a constant, ELF_NOTE_ROUNDSIZE, which evidently declares our intention to use 4-byte padding for elf notes. MFC after: 3 weeks
# f1fca82e	16-Apr-2013	Mikolaj Golub <trociny@FreeBSD.org>	Add a new set of notes to a process core dump to store procstat data. The notes format is a header of sizeof(int), which stores the size of the corresponding data structure to provide some versioning, and data in the format as it is returned by a related sysctl call. The userland tools (procstat(1)) will be taught to extract this data, providing additional info for postmortem analysis. PR: kern/173723 Suggested by: jhb Discussed with: jhb, kib Reviewed by: jhb (initial version), kib MFC after: 1 month
# bd390213	14-Apr-2013	Mikolaj Golub <trociny@FreeBSD.org>	Re-factor coredump routines. For each type of notes an output function is provided, which is used either to calculate the note size or output it to sbuf. On the first pass the notes are registered in a list and the resulting size is found, on the second pass the list is traversed outputting notes to sbuf. For the sbuf a drain routine is provided that writes data to a core file. The main goal of the change is to make coredump to write notes directly to the core file, without preliminary preparing them all in a memory buffer. Storing notes in memory is not a problem for the current, rather small, set of notes we write to the core, but it may become an issue when we start to store procstat notes. Reviewed by: jhb (initial version), kib Discussed with: jhb, kib MFC after: 3 weeks
# bc403f03	08-Apr-2013	Attilio Rao <attilio@FreeBSD.org>	Switch some "low-hanging fruit" to acquire read lock on vmobjects rather than write locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# fb5ea9d1	07-Apr-2013	Mikolaj Golub <trociny@FreeBSD.org>	Fill p_flags and p_align fields of the core dump note segment. Reviewed by: kib MFC after: 2 weeks
# 27b05648	07-Apr-2013	Mikolaj Golub <trociny@FreeBSD.org>	Use 4-byte padding for core dump notes on both 32 and 64bit archs. Although native word padding (i.e. 8-byte on 64bit arch) looks to be in agreement with standards, other parts of our code and other OSes use 4-byte alignment. This is not expected to change alignment for currently generated core dump notes, as the notes look to consist of structures with sizes multiple of 8 on 64-bit archs. But there are plans to add additional notes, where 4-byte vs 8-byte alignment makes difference. Discussed with: kib Reviewed by: kib MFC after: 2 weeks
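The effect of the 4-byte padding chosen in 27b05648 can be shown with a small size calculation. This is a sketch only: `SKETCH_NOTE_ROUNDSIZE`, `SKETCH_ROUNDUP2`, and `note_total_size` are hypothetical names, and the 12-byte header assumes the usual three 32-bit note header words (name size, descriptor size, type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte note padding, as described in the message above. */
#define SKETCH_NOTE_ROUNDSIZE	4u

/* Round x up to a multiple of y, where y must be a power of two. */
#define SKETCH_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((size_t)(y) - 1))

/* Total bytes one note occupies: header, padded name, padded descriptor. */
static size_t
note_total_size(size_t namesz, size_t descsz)
{
	const size_t hdr = 3 * sizeof(uint32_t);

	assert(namesz <= SIZE_MAX / 4 && descsz <= SIZE_MAX / 4);
	return (hdr + SKETCH_ROUNDUP2(namesz, SKETCH_NOTE_ROUNDSIZE) +
	    SKETCH_ROUNDUP2(descsz, SKETCH_NOTE_ROUNDSIZE));
}
```

A descriptor whose size is already a multiple of 8 lays out identically under 4-byte and 8-byte padding, which matches the message's expectation that existing notes are unaffected; a 4-byte descriptor is where the two schemes would diverge.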
# d19d5bf4	13-Mar-2013	Tijl Coosemans <tijl@FreeBSD.org>	- Fix two possible overflows when testing if ELF program headers are on the first page: 1. Cast uint16_t operands in a multiplication to unsigned int because otherwise the implicit promotion to int results in a signed multiplication that can overflow and the behaviour on integer overflow is undefined. 2. Replace (offset + size > PAGE_SIZE) with (size > PAGE_SIZE - offset) because the sum may overflow. - Use the same tests to see if the path to the interpreter is on the first page. There's no overflow here because size is already limited by MAXPATHLEN, but the compiler optimises the new tests better. Also fix an off-by-one error. - Simplify tests to see if an ELF note program header is on the first page. This also fixes an off-by-one error. Reviewed by: kib MFC after: 1 week
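The two overflow fixes listed in d19d5bf4 can be sketched in a standalone function. This is an illustration, not the kernel's check: `SKETCH_PAGE_SIZE` and `phdrs_in_first_page` are hypothetical names and the page size is an assumed 4096. It casts the 16-bit operands to unsigned int before multiplying, and compares the size against the remaining space instead of summing offset and size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for illustration */

/*
 * Return true when e_phnum headers of e_phentsize bytes each, starting at
 * file offset e_phoff, lie entirely within the first page.
 */
static bool
phdrs_in_first_page(uint64_t e_phoff, uint16_t e_phentsize, uint16_t e_phnum)
{
	/* Cast first so the product is unsigned rather than a signed int. */
	unsigned int size = (unsigned int)e_phentsize * (unsigned int)e_phnum;

	if (e_phoff > SKETCH_PAGE_SIZE)
		return (false);
	/* "size > PAGE_SIZE - offset" cannot wrap, unlike "offset + size". */
	return (size <= SKETCH_PAGE_SIZE - e_phoff);
}
```

With a naive `e_phoff + size > PAGE_SIZE` test, an offset near the top of the 64-bit range would wrap around and pass the check; the subtraction form rejects it.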
# 89f6b863	08-Mar-2013	Attilio Rao <attilio@FreeBSD.org>	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 2871baa4	10-Feb-2013	Konstantin Belousov <kib@FreeBSD.org>	Remove the ia64-specific code fragment, which effect is more cleanly done by the call to trans_prot() function a line before. Discussed with: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 1 week
# 5050aa86	22-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 877d24ac	28-Sep-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
# d1ae5c83	19-Jul-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix several reads beyond the mapped first page of the binary in the ELF parser. Specifically, do not allow note reader and interpreter path comparison in the brandelf code to read past end of the page. This may happen if a specially crafted ELF image is activated. Submitted by: Lukasz Wojcik <lukasz.wojcik zoho com> MFC after: 3 days
# aea81038	22-Jun-2012	Konstantin Belousov <kib@FreeBSD.org>	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs necessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
# 1a9c7dec	11-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	ELF image can have several PT_NOTE program headers. Look for the ELF brand note in each header, instead of using only first one. Reviewed by: kan Tested by: andrew (arm), flo (sparc64) MFC after: 3 weeks
# 62c625fd	30-Jan-2012	Konstantin Belousov <kib@FreeBSD.org>	Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported, and some variants of powerpc32. MFC after: 2 months (if ever)
# 1dfab802	17-Jan-2012	Alan Cox <alc@FreeBSD.org>	Explain why it is safe to unlock the vnode. Requested by: kib
# 292177e6	16-Jan-2012	Alan Cox <alc@FreeBSD.org>	Improve abstraction. Eliminate direct access by elf_load_section() to an OBJT_VNODE-specific field of the vm object. The same information can be just as easily obtained from the struct vattr that is in struct image_params if the latter is passed to elf_load_section(). Moreover, by replacing the vmspace and vm object parameters to elf*_load_section() with a struct image_params parameter, we actually reduce the size of the object code. In collaboration with: kib
# 9a14aa01	15-Jan-2012	Ulrich Spörlein <uqs@FreeBSD.org>	Convert files to UTF-8
# 126b36a2	14-Oct-2011	Konstantin Belousov <kib@FreeBSD.org>	Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel
# 676eda08	13-Oct-2011	Marcel Moolenaar <marcel@FreeBSD.org>	In elf32_trans_prot() and when compiling for amd64 or ia64, add PROT_EXECUTE when PROT_READ is needed. By default i386 allows execution when reading is allowed and JDK 1.4.x depends on that.
# afcc55f3	06-Jul-2011	Edward Tomasz Napierala <trasz@FreeBSD.org>	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
# 12bc222e	30-Jun-2011	Jonathan Anderson <jonathan@FreeBSD.org>	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)
# 1ba5ad42	05-Apr-2011	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 08b163fa	02-Feb-2011	Matthew D Fleming <mdf@FreeBSD.org>	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
# 26d8f3e1	08-Jan-2011	Konstantin Belousov <kib@FreeBSD.org>	Use the same expression to report stack protection mode for AT_STACKEXEC as the expression used by exec_new_vmspace().
# 291c06a1	08-Jan-2011	Konstantin Belousov <kib@FreeBSD.org>	In elf image activator, read and apply the stack protection mode from PT_GNU_STACK program header, if present and enabled. Two new sysctls are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow to enable PT_GNU_STACK for ABIs of specified bitsize, if ABI decided to support shared page. Inform rtld about access mode of the stack initial mapping by AT_STACKPROT aux vector. At the moment, the default is disabled, waiting for the usermode support bits.
# ed167eaa	08-Jan-2011	Konstantin Belousov <kib@FreeBSD.org>	Collect code to translate between vm_prot_t and p_flags into helper functions. MFC after: 1 week
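As an illustration of what such helper functions do, the sketch below translates between ELF segment permission bits and a protection bitmask in both directions. The `SK_PF_*` values are the standard ELF p_flags bits; the `SK_PROT_*` names and both function names are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* ELF segment permission bits (p_flags), as defined by the ELF specification. */
#define SK_PF_X		0x1u
#define SK_PF_W		0x2u
#define SK_PF_R		0x4u

/* Stand-ins for a vm_prot_t style bitmask; names are hypothetical. */
#define SK_PROT_READ	0x1u
#define SK_PROT_WRITE	0x2u
#define SK_PROT_EXECUTE	0x4u

/* Translate p_flags into a protection mask. */
static uint32_t
sk_trans_prot(uint32_t flags)
{
	uint32_t prot = 0;

	if (flags & SK_PF_X)
		prot |= SK_PROT_EXECUTE;
	if (flags & SK_PF_W)
		prot |= SK_PROT_WRITE;
	if (flags & SK_PF_R)
		prot |= SK_PROT_READ;
	return (prot);
}

/* Translate a protection mask back into p_flags. */
static uint32_t
sk_untrans_prot(uint32_t prot)
{
	uint32_t flags = 0;

	if (prot & SK_PROT_EXECUTE)
		flags |= SK_PF_X;
	if (prot & SK_PROT_READ)
		flags |= SK_PF_R;
	if (prot & SK_PROT_WRITE)
		flags |= SK_PF_W;
	return (flags);
}
```

Keeping both directions in one place means the loader and the core dump writer cannot drift apart in how they interpret the bits, which is presumably the motivation for collecting the code.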
# 7f08176e	22-Nov-2010	Attilio Rao <attilio@FreeBSD.org>	Add the ability for GDB to print out the thread name along with other thread-specific information. In order to do that, and in order to avoid KBI breakage with existing infrastructure the following semantic is implemented: - For live programs, a new member to the PT_LWPINFO is added (pl_tdname) - For cores, a new ELF note is added (NT_THRMISC) that can be used for storing thread-specific, miscellaneous information. Right now it is just populated with a thread name. GDB, then, retrieves the correct information from the corefile via the BFD interface, as it groks the ELF notes and creates appropriate pseudo-sections. Sponsored by: Sandvine Incorporated Tested by: gianni Discussed with: dim, kan, kib MFC after: 2 weeks
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# ee235bef	17-Aug-2010	Konstantin Belousov <kib@FreeBSD.org>	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month
# fba6b1af	29-Apr-2010	Alfred Perlstein <alfred@FreeBSD.org>	Don't leak core_buf or gzfile if doing a compressed core file and we hit an error condition. Obtained from: Juniper Networks
# 4ccf64eb	06-Apr-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	MFC r205014,205015: Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. This MFC is required for MFCs of later changes to the freebsd32 compatibility from HEAD. Requested by: kib
# a0ea661f	25-Mar-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Add the ELF relocation base to struct image_params. This will be required to correctly relocate the executable entry point's function descriptor on powerpc64.
# 920acedb	25-Mar-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Change the way text_addr and data_addr are computed to use the executable status of segments instead of detecting the main text segment by which segment contains the program entry point. This affects obreak() and is required for correct operation of that function on 64-bit PowerPC systems. The previous behavior was apparently required only for the Alpha, which is no longer supported. Reviewed by: jhb Tested on: amd64, sparc64, powerpc
# 841c0c7e	11-Mar-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
# 8b325009	04-Mar-2010	Alfred Perlstein <alfred@FreeBSD.org>	put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent undefined symbols on kernels without this option. Reported by: Alexander Best
# e7228204	01-Mar-2010	Alfred Perlstein <alfred@FreeBSD.org>	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan
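The corefile name formatting described in e7228204 can be sketched in user space as follows. This is a minimal sketch, not the kernel's implementation: the function name `sk_corefile_name` and its parameters are hypothetical, only the %N, %I and %H specifiers mentioned in the log (plus a literal %%) are handled, and rejecting unknown specifiers is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand `fmt` into `out`. Supported: %N (process name), %H (hostname),
 * %I (core index), %% (literal percent). Returns `out` on success, or
 * NULL if the result does not fit or the format is not understood.
 */
static char *
sk_corefile_name(char *out, size_t outlen, const char *fmt,
    const char *comm, const char *host, unsigned int index)
{
	size_t pos = 0;
	int n;

	if (outlen == 0)
		return (NULL);
	out[0] = '\0';
	for (; *fmt != '\0'; fmt++) {
		size_t left = outlen - pos;	/* always at least 1 */

		if (*fmt != '%') {
			n = snprintf(out + pos, left, "%c", *fmt);
		} else {
			fmt++;
			switch (*fmt) {
			case 'N':
				n = snprintf(out + pos, left, "%s", comm);
				break;
			case 'H':
				n = snprintf(out + pos, left, "%s", host);
				break;
			case 'I':
				n = snprintf(out + pos, left, "%u", index);
				break;
			case '%':
				n = snprintf(out + pos, left, "%%");
				break;
			default:
				/* Unknown specifier or a trailing '%'. */
				return (NULL);
			}
		}
		if (n < 0 || (size_t)n >= left)
			return (NULL);		/* output would be truncated */
		pos += (size_t)n;
	}
	return (out);
}
```

With the format `%N.%I.core` and process name `rpd`, index 0 expands to `rpd.0.core` and index 1 to `rpd.1.core`, matching the example in the commit message.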
# 8cb7f89d	05-Dec-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	MFC r197726: Print a warning in case we cannot add more brandinfo because we would overflow the MAX_BRANDS sized array. Reviewed by: kib
# dc68cec6	20-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r197934: Map PIE binaries at non-zero base address. MFC r198202: Honour non-zero mapbase for PIE binaries. Inform interpreter-less PIE binary about its relocbase. Approved by: re (kensmith)
# 5b15472f	20-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r197932: Do not map elf segments of zero length. Approved by: re (kensmith)
# 7564c4ad	17-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	If ET_DYN binary has non-zero base address for some reason, honour it and do not relocate the binary to ET_DYN_LOAD_ADDR. This allows for the binary author to influence address map of the process. In particular, when the binary is actually an interpreter, this allows to have almost usual process address map. Communicate the relocation bias of the mapping for interpreter-less ET_DYN binary, that is interpreter itself, in AT_BASE aux entry. This way, rtld is able to find its dynamic structure and relocate itself. Note that mapbase in the rtld is still wrong and requires further fixing. Reported and tested by: rwatson Discussed with: kan MFC after: 3 days
# ab02d85f	10-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Map PIE binaries at non-zero base address. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time
# 5b33842a	10-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Do not map segments of zero length. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time
# 925c8b5b	03-Oct-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Print a warning in case we cannot add more brandinfo because we would overflow the MAX_BRANDS sized array. Reviewed by: kib MFC After: 1 month
# 914e5afe	02-Sep-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	MFC r196653: Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days Approved by: re (rwatson)
# ecc2fda8	30-Aug-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days
# ac63e409	27-Aug-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	MFC r196512: Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib Approved by: re (kensmith)
# 89ffc202	24-Aug-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days
# cd899aad	05-Apr-2009	Dmitry Chagin <dchagin@FreeBSD.org>	Fix KBI breakage by r190520 which affects older linux.ko binaries: 1) Move the new field (brand_note) to the end of the Brandinfo structure. 2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer is valid. 3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old modules won't have the flag set, so the new field brand_note would be ignored. Suggested by: jhb Reviewed by: jhb Approved by: kib (mentor) MFC after: 6 days
# 267c52fc	22-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Fix several issues with parsing the notes for ELF objects. A badly formed ELF note may cause the calculated pointer to the next note to point both after the note region, that was checked in the code, but also to point before the region, that was not checked [1]. Remember the first note location in note0 and leap out if the note is not between note0 and note_end. In the similar way, a badly formed note may cause an infinite loop by pointing next note into the same or previous note. Guard against this by limiting amount of loop iterations by arbitrary chosen big number. For clarity, check the calculated note alignment in each iteration. Reported by: Chris Palmer <chris noncombatant org> [1] PR: kern/132886 Reviewed and tested by: dchagin MFC after: 3 days
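The hazards named in 267c52fc (a next-note pointer that lands outside the region, and a walk that never terminates) can be illustrated with a user-space sketch. It is not the kernel code: it uses a byte offset instead of note pointers and compares each size against the bytes remaining, so the cursor cannot wrap or move backwards. All names (`sk_note`, `sk_count_notes`, `SK_MAX_NOTES`) are hypothetical, and the cap value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MAX_NOTES 4096	/* iteration cap; the value is arbitrary here */

/* Same three-word layout as an ELF note header. */
struct sk_note {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Round a size up to the 4-byte note alignment. */
static size_t
sk_roundup4(size_t x)
{
	return ((x + 3) & ~(size_t)3);
}

/*
 * Count the notes in buf[0..len). Returns -1 if a header is truncated,
 * a note claims more bytes than remain, or the cap is exceeded.
 */
static int
sk_count_notes(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off != len) {
		struct sk_note n;
		size_t remaining, namesz, descsz;

		if (count == SK_MAX_NOTES)
			return (-1);		/* too many notes */
		if (len - off < sizeof(n))
			return (-1);		/* truncated header */
		memcpy(&n, buf + off, sizeof(n));
		remaining = len - off - sizeof(n);

		/* Check each size against what is left rather than adding to a cursor. */
		if (n.n_namesz > remaining)
			return (-1);
		namesz = sk_roundup4(n.n_namesz);
		if (namesz > remaining)
			return (-1);
		remaining -= namesz;
		if (n.n_descsz > remaining)
			return (-1);
		descsz = sk_roundup4(n.n_descsz);
		if (descsz > remaining)
			return (-1);
		remaining -= descsz;

		off = len - remaining;	/* advances by at least the header size */
		count++;
	}
	return (count);
}
```

In this formulation the offset always stays a multiple of four and always moves forward, so the cap is purely defensive; a pointer-based walk like the one the commit describes needs the explicit alignment, range and iteration checks.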
# 3ff06357	16-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and compat32 binaries. Tested by: pho Reviewed by: kan
# 429f5a58	17-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Use the properly sized types for ELF object header and program headers. This fixes osrel fetching from the FreeBSD branding note for the 64bit platforms. Reported by: swell.k gmail com Reviewed by: dchagin Tested by: dchagin, swell.k gmail com
# 32c01de2	13-Mar-2009	Dmitry Chagin <dchagin@FreeBSD.org>	Implement new way of branding ELF binaries by looking to a ".note.ABI-tag" section. The search order of a brand is changed, now first of all the ".note.ABI-tag" is looked through. Move code which fetch osreldate for ELF binary to check_note() handler. PR: 118473 Approved by: kib (mentor)
# 95c807cf	24-Jan-2009	Robert Watson <rwatson@FreeBSD.org>	When a statically linked binary is executed (or at least, one without an interpreter definition in its program header), set the auxiliary ELF argument AT_BASE to 0 rather than to the address that we would have mapped the interpreter at if there had been one. The ELF ABI specifications appear to be ambiguous as to the desired behavior in this situation, as they define AT_BASE as the base address of the interpreter, but do not mention what to do if there is none. On Solaris, AT_BASE will be set to the base address of the static binary if there is no interpreter, and on Linux, AT_BASE is set to 0. We go with the Linux semantics as they are of more immediate utility and allow the early runtime environment to know that the kernel has not mapped an interpreter, but because AT_PHDR points at the ELF header for the running binary, it is still possible to retrieve all required mapping information when the process starts should it be required. Either approach would be preferable to our current behavior of passing a pointer to an unmapped region of user memory as AT_BASE. MFC after: 3 weeks
# a3ac8c94	17-Dec-2008	Peter Wemm <peter@FreeBSD.org>	Remove sysctl debug.elf_trace and the trace field in auxargs. They go nowhere. It used to be the equivalent of $LD_DEBUG in rtld-elf. Elf_Auxargs is an internal structure.
# 35c2a5a8	17-Dec-2008	Warner Losh <imp@FreeBSD.org>	Minor style(9) nit.
# 6f347545	17-Dec-2008	Konstantin Belousov <kib@FreeBSD.org>	Remove two remnant uses of AT_DEBUG.
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 387ad998	08-Oct-2008	Konstantin Belousov <kib@FreeBSD.org>	If the ABI-overridden interpreter was not loaded, do not set have_interp to TRUE. This allows the code in image activator to try /libexec/ld-elf.so.1 as interpreter when newinterp is not found to execute. Reviewed by: peter MFC after: 2 weeks (together with r175105)
# ccd3953e	14-May-2008	John Baldwin <jhb@FreeBSD.org>	Go back to using the process command name (p_comm) for the file name and command line arguments stored in the note at the beginning of a core dump instead of the current thread name. Reviewed by: julian
# 6617724c	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# cb05b60a	09-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 4113f8d7	05-Jan-2008	Peter Wemm <peter@FreeBSD.org>	Fall back to the binary-specified interpreter (ld-elf.so.1) if the ABI override binary isn't found. This could probably be smoother, but it is what I did in p4 change #126891 on 2007/09/27. It should solve the "ld-elf32.so.1"-in-chroot problem.
# f231de47	03-Dec-2007	Konstantin Belousov <kib@FreeBSD.org>	Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note. MFC after: 3 days
# 93d1c728	03-Dec-2007	Konstantin Belousov <kib@FreeBSD.org>	Check for the program headers alignment of the ELF images before dereferencing. Unaligned access could cause panic on strict alignment architectures. Reviewed by: marcel, marius (also tested on sparc64, thanks !) MFC after: 3 days
# e01eafef	13-Nov-2007	Julian Elischer <julian@FreeBSD.org>	A bunch more files that should probably print out a thread name instead of a process name.
# 89b57fcf	05-Nov-2007	Konstantin Belousov <kib@FreeBSD.org>	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicking or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 19059a13	14-May-2007	John Baldwin <jhb@FreeBSD.org>	Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't overwrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
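The difference between overwriting limits at exec() time and sanitizing a single limit on query can be sketched as follows. This is an assumption-laden illustration, not the kernel's sv_fixlimit(): the type `sk_rlimit`, the functions `sk_fixlimit32` and `sk_getlimit`, and the cap `SK_ABI32_MAX` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ABI32_MAX 0xffffffffull	/* hypothetical cap for a 32-bit ABI */

struct sk_rlimit {
	uint64_t cur;	/* soft limit */
	uint64_t max;	/* hard limit */
};

/* Clamp one limit to what the 32-bit ABI can represent. */
static void
sk_fixlimit32(struct sk_rlimit *rl)
{
	if (rl->cur > SK_ABI32_MAX)
		rl->cur = SK_ABI32_MAX;
	if (rl->max > SK_ABI32_MAX)
		rl->max = SK_ABI32_MAX;
}

/*
 * Return the limit as the calling process should see it. The stored
 * value is never modified, so a later 64-bit child that inherits it
 * still sees the full limit.
 */
static struct sk_rlimit
sk_getlimit(const struct sk_rlimit *stored, int is_32bit)
{
	struct sk_rlimit view = *stored;

	if (is_32bit)
		sk_fixlimit32(&view);
	return (view);
}
```

Because the clamp is applied to a copy at query time, the inherited value only changes when the 32-bit process explicitly sets a new limit, which is the behavior the commit message describes.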
# 4f506694	17-Jan-2007	Xin LI <delphij@FreeBSD.org>	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
# 976a87a2	19-Nov-2006	Alan Cox <alc@FreeBSD.org>	Add vm map and object locking to each_writable_segment(). Noticed by: jhb@ MFC after: 3 weeks
# e5e6093b	21-Jan-2006	Alan Cox <alc@FreeBSD.org>	Avoid a vm object reference leak in a rarely used code path. An executable contains at most one PT_INTERP program header. Therefore, the loop that searches for it can terminate after it is found rather than iterating over the entire set of program headers. Eliminate an unneeded initialization. Reviewed by: tegge
# d49b2109	26-Dec-2005	Maxim Sobolev <sobomax@FreeBSD.org>	Fix breakage introduced in the previous commit.
# 900b28f9	26-Dec-2005	Maxim Sobolev <sobomax@FreeBSD.org>	Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually allow executing elf dynamic binaries (aka shared libraries). When it is requested to execute ET_DYN elf image check if this flag is on after we know the elf brand allowing execution if so. PR: kern/87615 Submitted by: Marcin Koziej <creep@desk.pl>
# 60bb3943	23-Dec-2005	Alan Cox <alc@FreeBSD.org>	Maintain the lock on the vnode for most of exec_elfN_imgact(). Specifically, it is required for the I/O that may be performed by elfN_load_section(). Avoid an obscure deadlock in the a.out, elf, and gzip image activators. Add a comment describing why the deadlock does not occur in the common case and how it might occur in less usual circumstances. Eliminate an unused variable from exec_aout_imgact(). In collaboration with: tegge
# 373d1a3f	21-Dec-2005	Alan Cox <alc@FreeBSD.org>	Maintain the vnode lock throughout elfN_load_file() rather than releasing it and reacquiring it in vrele(). Consequently, there is no reason to increase the reference count on the vm object caching the file's pages. Reviewed by: tegge Eliminate unused parameters to elfN_load_file().
# ff6f03c7	20-Dec-2005	Alan Cox <alc@FreeBSD.org>	Eliminate an unneeded (vm_prot_t) parameter from two functions. Eliminate unnecessary uses of a local variable. Reviewed by: tegge
# 044bbbb5	17-Dec-2005	Alan Cox <alc@FreeBSD.org>	Correct a long-standing problem in elfN_map_insert(): In order to copy a page to user space, the user space mapping must allow write access. In collaboration with: tegge@ MFC after: 3 weeks
# 584716b0	16-Dec-2005	Alan Cox <alc@FreeBSD.org>	Style: The second argument to vm_map_find() should be NULL instead of 0.
# da61b9a6	16-Dec-2005	Alan Cox <alc@FreeBSD.org>	Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change: (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps. (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors. Problem uncovered by: kris@ Reviewed by: tegge@ MFC after: 3 weeks
# 481a1fe1	14-Nov-2005	Olivier Houchard <cognet@FreeBSD.org>	Add a new sysctl, kern.elf[32|64].can_exec_dyn. When set to 1, one can execute a ET_DYN binary (shared object). This does not make much sense, but some linux scripts expect to be able to execute /lib/ld-linux.so.2 (ldd comes to mind). The sysctl defaults to 0. MFC after: 3 days
# 5f419982	28-Sep-2005	Robert Watson <rwatson@FreeBSD.org>	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer needs to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde
# 84d2b7df	19-Sep-2005	Robert Watson <rwatson@FreeBSD.org>	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week
# 68ff2a43	15-Sep-2005	Christian S.J. Peron <csjp@FreeBSD.org>	Improve the MP safeness associated with the creation of symbolic links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasn't being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduces a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drop in an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, that we have picked up Giant. If not, panic (if WITNESS is compiled into the kernel). This should help us find conditions where vnode operations are insufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days
# 62919d78	30-Jun-2005	Peter Wemm <peter@FreeBSD.org>	Jumbo-commit to enhance 32 bit application support on 64 bit kernels. This is good enough to be able to run a RELENG_4 gdb binary against a RELENG_4 application, along with various other tools (eg: 4.x gcore). We use this at work. ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace, procfs and core dumps. procfs_regs.c: vary the format of proc/XXX/regs depending on the client and target application. procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their sscanf fails. They expect an unsigned long. imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps. sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note that 64 bit consumers can still debug 32 bit targets. IA64 has got stubs for ia32_reg.c. Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't implemented in the 32/64 wrapper yet. We also make a tiny patch to gdb to pacify it over conflicting formats of ld-elf.so.1. Approved by: re
# 77f30fff	24-May-2005	Olivier Houchard <cognet@FreeBSD.org>	Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as binutils now do the job for us
# e11a45c9	03-May-2005	Jeff Roberson <jeff@FreeBSD.org>	- Neither of our image formats require Giant now that the vm and vfs have been locked.
# 9f65fb13	03-Apr-2005	Alan Cox <alc@FreeBSD.org>	Remove GIANT_REQUIRED from elfN_load_section().
# 610ecfe0	29-Jan-2005	Maxim Sobolev <sobomax@FreeBSD.org>	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
# 8516dd18	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Don't use VOP_GETVOBJECT, use vp->v_object directly.
# e0370a18	23-Sep-2004	Olivier Houchard <cognet@FreeBSD.org>	On arm, set the default elf brand to FreeBSD, until the binutils do it for us.
# 4da47b2f	10-Aug-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
# cfaf7e60	08-Aug-2004	Doug Rabson <dfr@FreeBSD.org>	Make sure that AT_PHDR has a useful value even for static programs.
# 1f7a1baa	18-Jul-2004	Marcel Moolenaar <marcel@FreeBSD.org>	After maintaining previous behaviour in writing out the core notes, it's time now to break with the past: do not write the PID in the first note. Rationale: 1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate. 2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs. Update comments accordingly.
# 247aba24	26-Jun-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Allocate TIDs in thread_init() and deallocate them in thread_fini(). The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump its state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.
# f99619a0	04-Jun-2004	Tim J. Robbins <tjr@FreeBSD.org>	Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
# 2b471bc6	04-Jun-2004	Tim J. Robbins <tjr@FreeBSD.org>	Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitation after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
# 16e6d162	04-Jun-2004	Tim J. Robbins <tjr@FreeBSD.org>	Write segments to core dump files in maximally-sized chunks that neither exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block boundary. This fixes dumping segments larger than 2GB. PR: 67546
# 59c8bc40	22-Apr-2004	Alan Cox <alc@FreeBSD.org>	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
# ece267ba	08-Apr-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Do not assume that the initial thread (i.e. the thread with the ID equal to the process ID) is still present when we dump a core. It already may have been destroyed. In that case we would end up dereferencing a NULL pointer, so specifically test for that as well. Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
# 8c9b7b2c	03-Apr-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread in the process. This is required for proper debugging of corefiles created by 1:1 or M:N threaded processes. Add an XXX comment where we should actually call a function that dumps MD specific notes. An example of a MD specific note is the NT_PRXFPREG note for SSE registers. Since BFD creates non-annotated pseudo-sections for the first PRSTATUS and FPREGSET notes (non-annotated in the sense that the name of the section does not contain the pid/tid), make sure those sections describe the initial thread of the process (i.e. the thread which tid equals the pid). This is not strictly necessary, but makes sure that tools that use the non-annotated section names will not change behaviour due to this change. The practical upshot of this all is that one can see the threads in the debugger when looking at a corefile. For 1:1 threading this means that all threads are visible.
# 3dc19c46	18-Mar-2004	Jacques Vidrine <nectar@FreeBSD.org>	Verify more bits of the ELF header: the program header table entry size and the ELF version. Also, avoid a potential integer overflow when determining whether the ELF header fits entirely within the first page. Reviewed by: jdp A panic when attempting to execute an ELF binary with a bogus program header table entry size was Reported by: Christer Öberg <christer.oberg@texonet.com>
# 91d5354a	04-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
# 9b68618d	22-Dec-2003	Peter Wemm <peter@FreeBSD.org>	Add an additional field to the elf brandinfo structure to support quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.
# c460ac3a	24-Sep-2003	Peter Wemm <peter@FreeBSD.org>	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# a063facb	31-May-2003	Marcel Moolenaar <marcel@FreeBSD.org>	Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on the stack to be changed in a way incompatible with elf32_map_insert() where we used data_buf without initializing it for when the partial mapping resulting in a misaligned image (typical when the page size implied by the image is not the same as the page size in use by the kernel). Since data_buf is passed by reference to vm_map_find(), the compiler cannot warn about it. While here, move all local variables to the top of the function.
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# e548a1d4	04-Jan-2003	Jake Burkholder <jake@FreeBSD.org>	- Provide backwards compatibility for kern.fallback_elf_brand. - Use the generic elf type macros in imgact_elf.h instead of ifdefing the entire contents of the header.
# a360a43d	04-Jan-2003	Jake Burkholder <jake@FreeBSD.org>	Improve the way that an elf image activator for an alternate word size is included in the kernel. Include imgact_elf.c in conf/files, instead of both imgact_elf32.c and imgact_elf64.c, which will use the default word size for an architecture as defined in machine/elf.h. Architectures that wish to build an additional image activator for an alternate word size can include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which allows it to be dependent on MD options instead of solely on architecture. Glanced at by: peter
# 551d79e1	20-Dec-2002	Marcel Moolenaar <marcel@FreeBSD.org>	Fix multiple registration of the elf_legacy_coredump sysctl variable. The duplication is caused by the fact that imgact_elf.c is included by both imgact_elf32.c and imgact_elf64.c and both are compiled by default on ia64. Consequently, we have two separate copies of the elf_legacy_coredump variable due to them being declared static, and two entries for the same sysctl in the linker set, both referencing the unique copy of the elf_legacy_coredump variable. Since the second sysctl cannot be registered, one of the elf_legacy_coredump variables can not be tuned (if ordering still holds, it's the ELF64 related one). The only solution is to create two different sysctl variables, just like the elf<32|64>_trace sysctl variables. This unfortunately is an (user) interface change, but unavoidable. Thus, on ELF32 platforms the sysctl variable is called elf32_legacy_coredump and on ELF64 platforms it is called elf64_legacy_coredump. Platforms that have both ELF formats have both sysctl variables. These variables should probably be retired sooner rather than later.
# fa7dd9c5	16-Dec-2002	Matthew Dillon <dillon@FreeBSD.org>	Change the way ELF coredumps are handled. Instead of unconditionally skipping read-only pages, which can result in valuable non-text-related data not getting dumped, the ELF loader and the dynamic loader now mark read-only text pages NOCORE and the coredump code only checks (primarily) for complete inaccessibility of the page or NOCORE being set. Certain applications which map large amounts of read-only data will produce much larger cores. A new sysctl has been added, debug.elf_legacy_coredump, which will revert to the old behavior. This commit represents collaborative work by all parties involved. The PR contains a program demonstrating the problem. PR: kern/45994 Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org> Reviewed by: jdp, dillon MFC after: 7 days
# 6d7bdc8d	08-Nov-2002	Robert Watson <rwatson@FreeBSD.org>	Assign value of NULL to imgp->execlabel when imgp is initialized in the ELF code. Missed in earlier merge from the MAC tree. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 450ffb44	04-Nov-2002	Robert Watson <rwatson@FreeBSD.org>	Remove reference to struct execve_args from struct imgact, which describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes. There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 96725dd0	22-Oct-2002	Alexander Kabaev <kan@FreeBSD.org>	Handle binaries with an arbitrary number of PT_LOAD sections, not only ones with one text and one data section. The text and data rlimit checks still need to be fixed to properly account for additional sections. Reviewed by: peter (slightly different patch version)
# e80fb434	17-Oct-2002	Robert Drehmel <robert@FreeBSD.org>	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.
# 05ba50f5	21-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
# d0ca7c29	07-Sep-2002	Peter Wemm <peter@FreeBSD.org>	Do not blow up when we walk off the end of the brands list. Found by: kris, jake
# 21c2d047	03-Sep-2002	Matthew Dillon <dillon@FreeBSD.org>	Alright, fix the problems with the elf loader for the Alpha. It turns out that there is no easy way to discern the difference between a text segment and a data segment through the read-only OR execute attribute in the elf segment header, so revert the algorithm to what it was before. Neither can we account for multiple data load segments in the vmspace structure (at least not without more work), due to assumptions obreak() makes in regards to the data start and data size fields. Retain RLIMIT_VMEM checking by using a local variable to track the total bytes of data being loaded. Reviewed by: peter X-MFC after: ASAP
# 9782ecba	03-Sep-2002	Peter Wemm <peter@FreeBSD.org>	Make the text segment locating heuristics from rev 1.121 more reliable so that it works on the Alpha. This defines the segment that the entry point exists in as 'text' and any others (usually one) as data. Submitted by: tmm Tested on: i386, alpha
# 05ef8798	02-Sep-2002	Matthew Dillon <dillon@FreeBSD.org>	Grammar cleanup
# 5fe3ed62	01-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Moved elf brand identification into a function. Fully identify the brand early in the process of loading an elf file, so that we can identify the sysentvec, and so that we do not continue if we do not have a brand (and thus a sysentvec). Use the values in the sysentvec for the page size and vm ranges unconditionally, since they are all filled in now.
# 8cf03452	01-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Fixed more indentation bugs.
# cac45152	30-Aug-2002	Matthew Dillon <dillon@FreeBSD.org>	Implement data, text, and vmem limit checking in the elf loader and svr4 compat code. Clean up accounting for multiple segments. Part 1/2. Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications) MFC after: 3 days
# 81f223ca	25-Aug-2002	Jake Burkholder <jake@FreeBSD.org>	Fixed most indentation bugs.
# ca0387ef	25-Aug-2002	Jake Burkholder <jake@FreeBSD.org>	Fixed placement of operators. Wrapped long lines.
# fd559a8a	24-Aug-2002	Jake Burkholder <jake@FreeBSD.org>	Fixed white space around operators, casts and reserved words. Reviewed by: md5
# a7cddfed	24-Aug-2002	Jake Burkholder <jake@FreeBSD.org>	return x; -> return (x); return(x); -> return (x); Reviewed by: md5
# 9ca43589	15-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 619eb6e5	13-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Hold the vnode lock throughout execve. - Set VV_TEXT in the top level execve code. - Fixup the image activators to deal with the newly locked vnode.
# e6e370a7	04-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 3ebc1248	19-Jul-2002	Peter Wemm <peter@FreeBSD.org>	Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).
# 0b2ed1ae	06-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	Clean up execve locking: - Grab the vnode object early in exec when we still have the vnode lock. - Cache the object in the image_params. - Make use of the cached object in imgact_*.c
# 21dc7d4f	02-Jun-2002	Jens Schweikhardt <schweikh@FreeBSD.org>	Fix typo in the BSD copyright: s/withough/without/ Spotted and suggested by: des MFC after: 3 weeks
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# a854ed98	27-Feb-2002	John Baldwin <jhb@FreeBSD.org>	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
# bf43c504	16-Dec-2001	Mark Peek <mp@FreeBSD.org>	Remove whitespace at end of line.
# cbc89bfb	10-Oct-2001	Paul Saab <ps@FreeBSD.org>	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
# 3418ebeb	26-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	Make uio_yield() a global. Call uio_yield() between chunks in vn_rdwr_inchunks(), allowing other processes to gain an exclusive lock on the vnode. Specifically: directory scanning, to avoid a race to the root directory, and multiple child processes coring simultaneously so they can figure out that some other core'ing child has an exclusive adv lock and just exit instead. This completely fixes performance problems when large programs core. You can have hundreds of copies (forked children) of the same binary core all at once and not notice. MFC after: 3 days
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 06ae1e91	08-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time. This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache. Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
# ef4181d9	01-Sep-2001	Peter Wemm <peter@FreeBSD.org>	For ia64, set the default elf brand to be FreeBSD. This is temporarily necessary only for as long as we're using a linux toolchain.
# 546a92c4	28-Aug-2001	Brian Somers <brian@FreeBSD.org>	OR M_WAITOK with M_ZERO in malloc()s args for clarity.
# 29b7fbd1	17-Aug-2001	Mark Peek <mp@FreeBSD.org>	Unbreak linux compatibility by providing the correct length of the buffer. Reported by: "Pierre Y. Dampure" <pierre.dampure@westmarsh.com>, "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk> Pointy hat to: mp
# a75a0c55	16-Aug-2001	Peter Wemm <peter@FreeBSD.org>	Don't explicitly null-terminate. The buffer we are copying into is already zeroed, and we explicitly leave the last byte untouched. Submitted by: bde
# 911c2be0	16-Aug-2001	Mark Peek <mp@FreeBSD.org>	Reduce stack allocation (stack-fast?). elf_load_file() => 352 to 52 bytes; exec_elf_imgact() => 1072 to 48 bytes; elf_corehdr() => 396 to 8 bytes. Reviewed by: julian
# 6eef6816	16-Aug-2001	Peter Wemm <peter@FreeBSD.org>	Use explicit sizes for the prpsinfo command length string so that we don't have any more unexpected changes in core dumps. This gets us back to the original core dump layout from a few days ago.
# 0cddd8f0	04-Jul-2001	Matthew Dillon <dillon@FreeBSD.org>	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
# 613c83cb	23-May-2001	John Baldwin <jhb@FreeBSD.org>	Lock the VM while twiddling the vmspace.
# 23955314	18-May-2001	Alfred Perlstein <alfred@FreeBSD.org>	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 1005a129	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# f34fa851	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
# 828c9e13	04-Mar-2001	David E. O'Brien <obrien@FreeBSD.org>	Do not set a default ELF syscall ABI fallback. If one runs an un-branded Linux static binary that calls Linux's fcntl the machine will reboot when interpreted by the FreeBSD syscall ABI.
# 21a3ee0e	24-Feb-2001	David E. O'Brien <obrien@FreeBSD.org>	MFS: bring the consistent `compat_3_brand' support into -CURRENT (the work was first done in the RELENG_4 branch near a release during a MFC to make the code cleaner and more consistent)
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# ba88dfc7	26-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Back out proc locking to protect p_ucred for obtaining additional references along with the actual obtaining of additional references.
# 611d9407	23-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Proc locking.
# c0c25570	12-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags passed to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 553629eb	22-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 806d7daa	09-Nov-2000	Marcel Moolenaar <marcel@FreeBSD.org>	Make MINSIGSTKSZ machine dependent, and have the sigaltstack syscall compare against a variable sv_minsigstksz in struct sysentvec so as to properly take the size of the machine- and ABI-dependent struct sigframe into account. The SVR4 and iBCS2 modules continue to have a minsigstksz of 8192 to preserve behavior. The real values (if different) are not known at this time. Other ABI modules use the real values. The native MINSIGSTKSZ is now defined as follows (arch: MINSIGSTKSZ): alpha: 4096, i386: 2048, ia64: 12288. Reviewed by: mjacob Suggested by: bde
# 00910f28	05-Nov-2000	David E. O'Brien <obrien@FreeBSD.org>	ELF kernels should use an ELF sysvec. This allows us to move a.out specific files to those platforms that actually support a.out.
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# a18b1f1d	03-Oct-2000	Jason Evans <jasone@FreeBSD.org>	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 9ff5ce6b	12-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
# 36240ea5	10-Sep-2000	Doug Rabson <dfr@FreeBSD.org>	Move the include of <sys/systm.h> so that KTR gets a declaration for snprintf().
# 55af4c7d	23-Jul-2000	Brian Feldman <green@FreeBSD.org>	Using an atomic operation here won't help if nobody else uses them (for this). Use the simple_lock() on v_interlock like elsewhere.
# 25ead034	23-Jul-2000	Brian Feldman <green@FreeBSD.org>	Solve the problem where it is possible to get the kernel stuck in a loop down in pmap_init_pt(). A subtraction causes the number of pages to become negative, that was assigned to an unsigned variable, and there is a lot of iteration. The bug is due to the ELF image activator not properly checking for its files being the correct size as specified by the ELF header. The solution is to check that the header doesn't ask for part of a file when that part of the file doesn't exist. Make sure to set VEXEC at the proper times to make the executables immutable (remove race conditions). Also, the ELF format specifies header entries that allow embedding of other executables (hence how ld-elf.so.1 gets loaded, but not the same as loading shared libraries), so those executables need to be set VEXEC, too, so they're immutable. Reviewed by: peter
# 2c9b67a8	30-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
# c815a20c	17-Apr-2000	David E. O'Brien <obrien@FreeBSD.org>	Change our ELF binary branding to something more acceptable to the Binutils maintainers. After we established our branding method of writing up to 8 characters of the OS name into the ELF header in the padding; the Binutils maintainers and/or SCO (as USL) decided that instead the ELF header should grow two new fields -- EI_OSABI and EI_ABIVERSION. Each of these is an 8-bit unsigned integer. SCO has assigned official values for the EI_OSABI field. In addition to this, the Binutils maintainers and NetBSD decided that a better ELF branding method was to include ABI information in a ".note" ELF section. With this set of changes, we will now create ELF binaries branded using both "official" methods. Due to the complexity of adding a section to a binary, binaries branded with ``brandelf'' will only brand using the EI_OSABI method. Also due to the complexity of pulling a section out of an ELF file vs. poking around in the ELF header, our image activator only looks at the EI_OSABI header field. Note that a new kernel can still properly load old binaries except for Linux static binaries branded in our old method. For a short period of time, ``ld'' will also brand ELF binaries using our old method. This is so people can still use kernel.old with a new world. This support will be removed before 5.0-RELEASE, and may not last anywhere up to the actual release. My expiration time for this is about 6mo.
# 77ac690c	27-Feb-2000	Paul Saab <ps@FreeBSD.org>	Update a comment in elf_coredump to reflect that if you madvise with MADV_NOCORE, its address space is also excluded from a core file. Pointed out by: alc
# 9730a5da	27-Feb-2000	Paul Saab <ps@FreeBSD.org>	Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2). This feature allows you to specify if mmap'd data is included in an application's corefile. Change the type of eflags in struct vm_map_entry from u_char to vm_eflags_t (an unsigned int). Reviewed by: dillon,jdp,alfred Approved by: jkh
# 654f6be1	27-Dec-1999	Bruce Evans <bde@FreeBSD.org>	Changed the type used to represent the user stack pointer from `long ' to `register_t '. This fixes bugs like misplacement of argc and argv on the user stack on i386's with 64-bit longs. We still use longs to represent "words" like argc and argv, and assume that they are on the stack (and that there is stack). The suword() and fuword() families should also use register_t.
# 762e6b85	15-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Introduce NDFREE (and remove VOP_ABORTOP)
# da654d90	20-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	s/p_cred->pc_ucred/p_ucred/g
# a3021f91	19-Nov-1999	Boris Popov <bp@FreeBSD.org>	Vnode was left referenced in the case if ELF image is broken. Reviewed by: Peter Wemm <peter@netplex.com.au>
# 2e3c8fcb	16-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This is a partial commit of the patch from PR 14914: A lot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# 923502ff	29-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
# d1f088da	11-Oct-1999	Peter Wemm <peter@FreeBSD.org>	Trim unused options (or #ifdef for undoc options). Submitted by: phk
# fca666a1	31-Aug-1999	Julian Elischer <julian@FreeBSD.org>	General cleanup of core-dumping code. Submitted by: Sean Fagan,
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# d44e4156	26-Aug-1999	Dima Ruban <dima@FreeBSD.org>	Don't follow symlinks on coredumps. Reviewed by: dillon && security-officer
# bdbc8c26	09-Jul-1999	Peter Wemm <peter@FreeBSD.org>	Fix the previous warning a different way since the emul_path exposure was intentional. Avoid the warning by propagating the const filename through to elf_load_file() instead.
# c6bb4a64	09-Jul-1999	Peter Wemm <peter@FreeBSD.org>	Minor tweak - don't cause a warning. I don't know if it was intentional or not, but it would have printed out: /compat/linux/foo/bar.so: interpreter not found If it was, then I've broken it. De-constifying the 'interp' variable or carrying the constness through to elf_load_file() are alternatives.
# 7a583b02	05-Jul-1999	Marcel Moolenaar <marcel@FreeBSD.org>	Also try to load the interpreter without prepending "emul_path". This allows dynamically linked binaries to run in a chroot'd environment with "emul_path" as the new root. The new behavior of loading interpreters is identical to the principle of overlaying. PR: 10145
# e972780a	16-May-1999	Alan Cox <alc@FreeBSD.org>	Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert, eliminating the need for the pmap_object_init_pt calls in imgact_* and mmap. Reviewed by: David Greenman <dg@root.com>
# e5f13bdd	14-May-1999	Alan Cox <alc@FreeBSD.org>	Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option. It never makes sense to specify MAP_COPY_NEEDED without also specifying MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices. Reviewed by: David Greenman <dg@root.com>
# e37622b2	09-May-1999	Peter Wemm <peter@FreeBSD.org>	Fix a couple of warnings and some bitrot in comments.
# c33fe779	20-Feb-1999	John Polstra <jdp@FreeBSD.org>	If you merge this into -stable, please increment __FreeBSD_version in "src/sys/sys/param.h". Fix the ELF image activator so that it can handle dynamic linkers which are executables linked at a fixed address. This improves compliance with the ABI spec, and it opens the door to possibly better dynamic linker performance in the future. I've experimented a bit with a fixed-address dynamic linker, and it works fine. But I don't have any measurements yet to determine whether it's worthwhile. Also, remove a few calculations that were never used for anything. I will increment __FreeBSD_version, since this adds a new capability to the kernel that the dynamic linker might some day rely upon.
# b1028ad1	19-Feb-1999	Luoqi Chen <luoqi@FreeBSD.org>	Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillon <dillon@apollo.backplane.com>
# 47633640	07-Feb-1999	John Polstra <jdp@FreeBSD.org>	Change the load address of the ELF dynamic linker from "2L*MAXDSIZ" to an architecture-specific value defined in <machine/elf.h>. This solves problems on large-memory systems that have a high value for MAXDSIZ. The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace). On the i386 it is hard-wired to 0x08000000, which is the standard SVR4 location for the dynamic linker. On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond the start of the program's data segment. This is the same place a userland mmap(0, ...) call would put it, so it ends up just below all the shared libraries. The rationale behind the calculation is that it allows room for the data segment to grow to its maximum possible size. These changes have been tested on the i386 for several months without problems. They have been tested on the Alpha as well, though not for nearly as long. I would like to merge the changes into 3.1 within a week if no problems have surfaced as a result of them.
# 9fdfe602	07-Feb-1999	Matthew Dillon <dillon@FreeBSD.org>	Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
# 6f8126fa	05-Feb-1999	John Polstra <jdp@FreeBSD.org>	Correct an "&" operator which should have been "&&". Submitted by: mjacob
# 3b351cc1	04-Feb-1999	Mark Newton <newton@FreeBSD.org>	Additional note on last rev: The rationale for this is to allow you to run Solaris executables (or executables from any other ELF system) directly off the CD-ROM without having to waste megabytes of disk by copying them to another filesystem just to brand them.
# f8b3601e	04-Feb-1999	Mark Newton <newton@FreeBSD.org>	Created sysctl kern.fallback_elf_brand. Defaults to "none", which will give the same behaviour produced before today. If sysadmin sets it to a valid ELF brand, ELF image activator will attempt to run unbranded ELF executables as if they were branded with that value. Suggested by: Dima Ruban <dima@best.net>
# 096977fa	03-Feb-1999	Mark Newton <newton@FreeBSD.org>	Provide elf_brand_inuse() as a method an emulator can use to find out whether it is currently in use (which is kinda useful when it's about to unload itself: Lockups are never very much fun, are they?).
# 820ca326	29-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	*_execsw static structures cannot be const due to the way they interact with EXEC_SET, DECLARE_MODULE, and module_register. Specifically, module_register. We may eventually be able to make these const, but not now.
# d254af07	27-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 88c5ea45	25-Jan-1999	Julian Elischer <julian@FreeBSD.org>	Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
# 6626c604	18-Dec-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-) Code to allow Linux Threads to run under FreeBSD. By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.
# 2127f260	04-Dec-1998	Archie Cobbs <archie@FreeBSD.org>	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for other cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
# f5ef029e	25-Oct-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
# 52c24af7	18-Oct-1998	Peter Wemm <peter@FreeBSD.org>	Some cleanups and optimizations: - Use the system headers method for Elf32/Elf64 symbol compatibility - get rid of the UPRINTF debugging. - check the ELF header for compatibility much more completely - optimize the section mapper. Use the same direct VM interfaces that imgact_aout.c and kern_exec.c use. - Check the return codes from the vm_* functions better. Some return KERN_* results, not an errno. - prefault the page tables to reduce startup faults on page tables like a.out does. - reset the segment protection to zero for each loop, otherwise each segment could get progressively more privs. (eg: if the first was read/write/execute, and the second was meant to be read/execute, the bug would make the second r/w/x too. In practice this was not a problem because executables are normally laid out with text first.) - Don't impose arbitrary limits. Use the limits on headers imposed by the need to fit them into one page. - Remove unused switch() cases now that the verbose debugging is gone. I've been using an earlier version of this for a month or so. This sped up ELF exec speed a bit for me but I found it hard to get consistent benchmarks when I tested it last (a few weeks ago). I'm still bothered by the page read out of order caused by the transition from data to bss. This requires either part-filling the transition page or clearing the remainder.
# aa855a59	15-Oct-1998	Peter Wemm <peter@FreeBSD.org>	gulp. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a separate commit as samples. This all still works as a static a.out kernel using LKM's.
# 216a0f2d	15-Oct-1998	Doug Rabson <dfr@FreeBSD.org>	Don't frob the user stack directly, use suword instead. This fixes the elf_freebsd_fixup() panic which many people have noticed on the alpha.
# 6cde7a16	13-Oct-1998	David Greenman <dg@FreeBSD.org>	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
# d1dbc694	11-Oct-1998	John Polstra <jdp@FreeBSD.org>	If an ELF executable has a recognized brand, then believe it. Formerly, the heuristic involving the interpreter path took precedence. Also, print a better error message if the brand is missing or not recognized. If there is no brand at all, give the user a hint that "brandelf" needs to be run.
# 7b4c881c	02-Oct-1998	John Polstra <jdp@FreeBSD.org>	Fix a bug which caused the dynamic linker pathname in the PT_INTERP program header entry to be ignored if a recognized brand was found.
# 0ff27d31	15-Sep-1998	John Polstra <jdp@FreeBSD.org>	Restore the core-dumping of all writable segments for ELF executables, minus the NULL pointer dereference in rev. 1.33. Also simplify things somewhat by eliminating one traversal of the VM map entries. Finally, eliminate calls to vm_map_{un,}lock_read() which aren't needed here. I originally took them from procfs_map.c, but here we know we are dealing only with the map of the current process.
# dada0278	15-Sep-1998	John Polstra <jdp@FreeBSD.org>	Erk. Revert back to 1.31, dumping only data and stack to the core file, until I can solve a panic that has just cropped up.
# 6bb20c50	15-Sep-1998	John Polstra <jdp@FreeBSD.org>	When choosing segments to write to the core file, don't assume that writable implies readable.
# 8162da63	15-Sep-1998	John Polstra <jdp@FreeBSD.org>	Instead of just the data and stack segments, include all writable segments (except memory-mapped devices) in the ELF core file. This is really nice. You get access to the data areas of all shared libraries, and even to files that are mapped read-write. In the future, it might be good to add a new resource limit in the spirit of RLIMIT_CORE. It would specify the maximum sized writable segment to include in core dumps. Segments larger than that would be omitted. This would be useful for programs that map very large files read/write but that still would like to get usable core dumps.
# 8c64af4f	14-Sep-1998	John Polstra <jdp@FreeBSD.org>	Viola! The kernel now generates standard ELF core dumps for ELF executables. Currently only data and stack are included in the core dumps. I am looking into adding the other (mmapped) writable segments as well.
# 22d4b0fb	13-Sep-1998	John Polstra <jdp@FreeBSD.org>	Add provisions for variant core dump file formats, depending on the object format of the executable being dumped. This is the first step toward producing ELF core dumps in the proper format. I will commit the code to generate the ELF core dumps Real Soon Now. In the meantime, ELF executables won't dump core at all. That is probably no less useful than dumping a.out-style core dumps as they have done until now. Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)
# a9d81f7c	29-Jul-1998	Doug Rabson <dfr@FreeBSD.org>	Default to FreeBSD if no brand detected. This makes life easier when bootstrapping from NetBSD/alpha.
# 7cd99438	14-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Cast u_longs to uintptr_t before casting them to pointers. Don't attempt to even partially support systems with function pointers larger than object pointers.
# ed62fb52	11-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Fixed printf format errors.
# 2e91d07a	08-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	Fix a typo which prevented i386 elf from working at all (including Linux emulated elf binaries).
# ecbb00a2	07-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# 288078be	28-Apr-1998	Eivind Eklund <eivind@FreeBSD.org>	Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under Linux emulation. This makes Allegro Common Lisp 4.3 work under FreeBSD! Submitted by: Fred Gilham <gilham@csl.sri.com> Commented on by: bde, dg, msmith, tg Hoping he got everything right: eivind
# 3c1300a6	28-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# c8a79999	01-Mar-1998	Peter Wemm <peter@FreeBSD.org>	Update the ELF image activator to use some of the exec resources rather than rolling its own. This means that it now uses the "safe" exec_map_first_page() to get the ld.so headers rather than risking a panic on a page fault failure (eg: NFS server goes down). Since all the ELF tools go to a lot of trouble to make sure everything lives in the first page for executables, this is a win. I have not seen any ELF executable on any system where all the headers didn't fit in the first page with lots of room to spare. I have been running variations of this code for some time on my pure ELF systems.
# 303b270b	08-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Staticize.
# 1560a9d5	20-Sep-1997	Peter Wemm <peter@FreeBSD.org>	We were (I think) missing a vrele() on the vnode for the object loaded via PT_INTERP (usually /usr/libexec/ld-elf.so.1).
# 5856e12e	12-Apr-1997	John Dyson <dyson@FreeBSD.org>	Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.) Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group. Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes. Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group. Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads' address space by an exec, so we now automatically divorce the address space before modifying it.
# d8a4f230	01-Apr-1997	Bruce Evans <bde@FreeBSD.org>	Use OID_AUTO instead of magic number for old sysctl debug.elf_trace. The magic number conflicted with the one for the Lite2 sysctl debug.busyprt. Staticized some variables. Removed unused #includes.
# 3ac4d1ef	22-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# e9822d92	22-Dec-1996	Joerg Wunsch <joerg@FreeBSD.org>	Make DFLDSIZ and MAXDSIZ fully-supported options. "Don't forget to do a ``make depend''" :-)
# d672246b	24-Oct-1996	Søren Schmidt <sos@FreeBSD.org>	Added a missing break, so all static bins would be missed :(
# 717fb679	16-Oct-1996	Søren Schmidt <sos@FreeBSD.org>	Oops forgot to remove a debug printf.
# ea5a2b2e	16-Oct-1996	Søren Schmidt <sos@FreeBSD.org>	Prepare kernel to take advantage of "branded" ELF binaries.
# 1a7eb2dc	03-Oct-1996	Peter Wemm <peter@FreeBSD.org>	Drop an unused param to unmap_pages().
# e0c95ed9	31-Aug-1996	Bruce Evans <bde@FreeBSD.org>	Fixed the easy cases of const poisoning in the kernel. Cosmetic.
# 6ead3edd	17-Jun-1996	John Dyson <dyson@FreeBSD.org>	Clean-up the new VM map procfs code, and also add support for executable format file "etype". It contains a description of the binary type for a process.
# c23670e2	11-Jun-1996	Gary Palmer <gpalmer@FreeBSD.org>	Clean up -Wunused warnings. Reviewed by: bde
# a794e791	30-Apr-1996	Bruce Evans <bde@FreeBSD.org>	Removed unnecessary #includes from <sys/imgact.h> so that it is self-sufficient and added explicit #includes where required.
# 71d7d1b1	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Remove references to MAP_FILE.. That is now "default" and is only a "#define MAP_FILE 0" that is still there for net-2 source compatibility.
# 250c11f9	10-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Tweak the data/bss segment page count. The last version worked with all the test cases I tried, I'm sure this is more correct. Tweak some prototypes.
# 8191d577	10-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Fix some rounding problems.. In some (fairly rare) situations it mapped one page too many, which caused obreak() to fail in vm_map_find() with ENOMEM because of the conflicting page.
# e1743d02	10-Mar-1996	Søren Schmidt <sos@FreeBSD.org>	First attempt at FreeBSD & Linux ELF support. Compile and link a new kernel, that will give native ELF support, and provide the hooks for other ELF interpreters as well. To make native ELF binaries use John Polstra's elf-kit-1.0.1.. For the time being also use his ld-elf.so.1 and put it in /usr/libexec. The Linux emulator has been enhanced to also run ELF binaries, it is however in its very first incarnation. Just get some Linux ELF libs (Slackware-3.0) and put them in the proper place (/compat/linux/...). I've been able to run all the Slackware-3.0 binaries I've tried so far. (No it won't run quake yet :)