#
0cd9cde7 |
|
06-Apr-2024 |
Jake Freeland <jfree@FreeBSD.org> |
ktrace: Record namei violations with KTR_CAPFAIL Report namei path lookups while Capsicum violation tracing with CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic capability mode behavior. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40680
|
#
4a69fc16 |
|
07-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Add membarrier(2) This is an attempt at clean-room implementation of the Linux' membarrier(2) syscall. For documentation, you would need to read both membarrier(2) Linux man page, the comments in Linux kernel/sched/membarrier.c implementation and possibly look at actual uses. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32360
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
94426d21 |
|
30-May-2023 |
Jessica Clarke <jrtc27@FreeBSD.org> |
pmc: Rework PROCEXEC event to support PIEs Currently the PROCEXEC event only reports a single address, entryaddr, which is the entry point of the interpreter in the typical dynamic case, and used solely to calculate the base address of the interpreter. For PDEs this is fine, since the base address is known from the program headers, but for PIEs the base address varies at run time based on where the kernel chooses to load it, and so pmcstat has no way of knowing the real address ranges for the executable. This was less of an issue in the past since PIEs were rare, but now they're on by default on 64-bit architectures it's more of a problem. To solve this, pass through what was picked for et_dyn_addr by the kernel, and use that as the offset for the executable's start address just as is done for everything in the kernel. Since we're changing this interface, sanitise the way we determine the interpreter's base address by passing it through directly rather than indirectly via the entry point and having to subtract off whatever the ELF header's e_entry is (and anything that wants the entry point in future can still add that back on as needed; this merely changes the interface to directly provide the underlying variables involved). This will be followed up by a bump to the pmc major version. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D39595
|
#
d706d02e |
|
29-May-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
sysentvec: Retire sv_imgact_try as unneeded anymore The sysentvec sv_imgact_try was used by kern_exec() to allow non-native ABI to fixup shell path according to ABI root directory. Since the non-native ABI can now specify its root directory directly to namei() via pwd_altroot() call this facility is not needed anymore. Differential Revision: https://reviews.freebsd.org/D40092 MFC after: 2 month
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
5eeb4f73 |
|
17-Nov-2022 |
Doug Rabson <dfr@FreeBSD.org> |
imgact_binmisc: Optionally pre-open the interpreter vnode This allows the use of chroot and/or jail environments which depend on interpreters registed with imgact_binmisc to use emulator binaries from the host to emulate programs inside the chroot. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D37432
|
#
5b5b7e2c |
|
17-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: always retain path buffer after lookup This removes some of the complexity needed to maintain HASBUF and allows for removing injecting SAVENAME by filesystems. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D36542
|
#
5e5675cb |
|
12-Aug-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove struct proc p_singlethr member It does not serve any purpose after we stopped doing thread_single(SINGLE_ALLPROC) from stoppable user processes. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36207
|
#
939f0b63 |
|
10-May-2022 |
Kornel Dulęba <kd@FreeBSD.org> |
Implement shared page address randomization It used to be mapped at the top of the UVA. If the randomization is enabled any address above .data section will be randomly chosen and a guard page will be inserted in the shared page default location. The shared page is now mapped in exec_map_stack, instead of exec_new_vmspace. The latter function is called before image activator has a chance to parse ASLR related flags. The KERN_PROC_VM_LAYOUT sysctl was extended to provide shared page address. The feature is enabled by default for 64 bit applications on all architectures. It can be toggled kern.elf64.aslr.shared_page sysctl. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35349
|
#
361971fb |
|
02-Jun-2022 |
Kornel Dulęba <kd@FreeBSD.org> |
Rework how shared page related data is stored Store the shared page address in struct vmspace. Also instead of storing absolute addresses of various shared page segments save their offsets with respect to the shared page address. This will be more useful when the shared page address is randomized. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35393
|
#
4493a13e |
|
15-May-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not single-thread itself when the process single-threaded some another process Since both self single-threading and remote single-threading rely on suspending the thread doing thread_single(), it cannot be mixed: thread doing thread_suspend_switch() might be subject to thread_suspend_one() and vice versa. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310
|
#
bb92cd7b |
|
24-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
|
#
773fa8cd |
|
25-Jan-2022 |
Kyle Evans <kevans@FreeBSD.org> |
execve: disallow argc == 0 The manpage has contained the following verbiage on the matter for just under 31 years: "At least one argument must be present in the array" Previous to this version, it had been prefaced with the weakening phrase "By convention." Carry through and document it the rest of the way. Allowing argc == 0 has been a source of security issues in the past, and it's hard to imagine a valid use-case for allowing it. Toss back EINVAL if we ended up not copying in any args for *execve(). The manpage change can be considered "Obtained from: OpenBSD" Reviewed by: emaste, kib, markj (all previous version) Differential Revision: https://reviews.freebsd.org/D34045
|
#
1811c1e9 |
|
17-Jan-2022 |
Mark Johnston <markj@FreeBSD.org> |
exec: Reimplement stack address randomization The approach taken by the stack gap implementation was to insert a random gap between the top of the fixed stack mapping and the true top of the main process stack. This approach was chosen so as to avoid randomizing the previously fixed address of certain process metadata stored at the top of the stack, but had some shortcomings. In particular, mlockall(2) calls would wire the gap, bloating the process' memory usage, and RLIMIT_STACK included the size of the gap so small (< several MB) limits could not be used. There is little value in storing each process' ps_strings at a fixed location, as only very old programs hard-code this address; consumers were converted decades ago to use a sysctl-based interface for this purpose. Thus, this change re-implements stack address randomization by simply breaking the convention of storing ps_strings at a fixed location, and randomizing the location of the entire stack mapping. This implementation is simpler and avoids the problems mentioned above, while being unlikely to break compatibility anywhere the default ASLR settings are used. The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack, and is re-enabled by default. PR: 260303 Reviewed by: kib Discussed with: emaste, mw MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
|
#
758d98de |
|
17-Jan-2022 |
Mark Johnston <markj@FreeBSD.org> |
exec: Remove the stack gap implementation ASLR stack randomization will reappear in a forthcoming commit. Rather than inserting a random gap into the stack mapping, the entire stack mapping itself will be randomized in the same way that other mappings are when ASLR is enabled. No functional change intended, as the stack gap implementation is currently disabled by default. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
|
#
706f4a81 |
|
17-Jan-2022 |
Mark Johnston <markj@FreeBSD.org> |
exec: Introduce the PROC_PS_STRINGS() macro Rather than fetching the ps_strings address directly from a process' sysentvec, use this macro. With stack address randomization the ps_strings address is no longer fixed. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
|
#
1544f5ad |
|
17-Jan-2022 |
Mark Johnston <markj@FreeBSD.org> |
Revert "kern_exec: Add kern.stacktop sysctl." The current ASLR stack gap feature will be removed, and with that the need for the kern.stacktop sysctl is gone. All consumers have been removed. This reverts commit a97d697122da2bfb0baae5f0939d118d119dae33. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
|
#
f04a0960 |
|
30-Dec-2021 |
Mark Johnston <markj@FreeBSD.org> |
exec: Simplify sv_copyout_strings implementations a bit Simplify control flow around handling of the execpath length and signal trampoline. Cache the sysentvec pointer in a local variable. No functional change intended. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33703
|
#
7e1d3eef |
|
25-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
|
#
be10c0a9 |
|
03-Nov-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
fexecve(2): allow O_PATH file descriptors opened without O_EXEC This improves compatibility with Linux. Noted by: Drew DeVault <sir@cmpwn.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32821
|
#
e4ce23b2 |
|
03-Nov-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
fexecve(2): restore the attempts to calculate the executable path vn_fullpath() call was not converted to pass newtextvp, instead it used imgp->vp which is still NULL there. As result vn_fullpath() always returned EINVAL and execpath was recorded from the value of arg0. Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
1c696903 |
|
20-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Unmap shared page manually before doing vm_map_remove() on exit or exec This allows the pmap_remove(min, max) call to see empty pmap and exploit empty pmap optimization. Reviewed by: markj Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32569
|
#
351d5f7f |
|
23-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
exec: store parent directory and hardlink name of the binary in struct proc While doing it, also move all the code to resolve pathnames and obtain text vp and dvp, into single place. Besides simplifying the code, it avoids spurious vnode relocks and validates the explanation why a transient text reference on the script vnode is not harmful. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
0c10648f |
|
22-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
exec: provide right hardlink name in AT_EXECPATH For this, use vn_fullpath_hardlink() to resolve executable name for execve(2). This should provide the right hardlink name, used for execution, instead of random hardlink pointing to this binary. Also this should make the AT_EXECNAME reliable for execve(2), since kernel only needs to resolve parent directory path, which should always succeed (except pathological cases like unlinking a directory). PR: 248184 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
15bf81f3 |
|
23-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
struct image_params: use bool type for boolean members Also re-align comments, and group booleans and char members together. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
9d58243f |
|
23-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
do_execve(): switch boolean locals to use bool type Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
143dba3a |
|
22-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_exec.c: style Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
46dd801a |
|
16-Oct-2021 |
Colin Percival <cperciva@FreeBSD.org> |
Add userland boot profiling to TSLOG On kernels compiled with 'options TSLOG', record for each process ID: * The timestamp of the fork() which creates it and the parent process ID, * The first path passed to execve(), if any, * The first path resolved by namei, if any, and * The timestamp of the exit() which terminates the process. Expose this information via a new sysctl, debug.tslog_user. On kernels lacking 'options TSLOG' (the default), no information is recorded and the sysctl does not exist. Note that recording namei is needed in order to obtain the names of rc.d scripts being launched, as the rc system sources them in a subshell rather than execing the scripts. With this commit it is now possible to generate flamecharts of the entire boot process from the start of the loader to the end of /etc/rc. The code needed to perform this processing is currently found in github: https://github.com/cperciva/freebsd-boot-profiling Reviewed by: mhorne Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D32493
|
#
a97d6971 |
|
13-Oct-2021 |
Dawid Gorecki <dgr@semihalf.com> |
kern_exec: Add kern.stacktop sysctl. With stack gap enabled top of the stack is moved down by a random amount of bytes. Because of that some multithreaded applications which use kern.usrstack sysctl to calculate address of stacks for their threads can fail. Add kern.stacktop sysctl, which can be used to retrieve address of the stack after stack gap is applied to it. Returns value identical to kern.usrstack for processes which have no stack gap. Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31897
|
#
889b56c8 |
|
13-Oct-2021 |
Dawid Gorecki <dgr@semihalf.com> |
setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516
|
#
a0558fe9 |
|
28-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
Retire code added to support CloudABI CloudABI was removed in cf0ee8738e31aa9e6fbf4dca4dac56d89226a71a
|
#
b5cadc64 |
|
04-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Make core dump writes interruptible with SIGKILL This can be disabled by sysctl kern.core_dump_can_intr Reported and tested by: pho Reviewed by: imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313
|
#
63cb9308 |
|
01-Oct-2021 |
Justin Hibbits <jhibbits@FreeBSD.org> |
Fix segment size in compressing core dumps A core segment is bounded in size only by memory size. On 64-bit architectures this means a segment can be much larger than 4GB. However, compress_chunk() takes only a u_int, clamping segment size to 4GB-1, resulting in a truncated core. Everything else, including the compressor internally, uses size_t, so use size_t at the boundary here. This dates back to the original refactor back in 2015 (r279801 / aa14e9b7). MFC after: 1 week Sponsored by: Juniper Networks, Inc.
|
#
397f1889 |
|
12-Sep-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove SV_CAPSICUM It was only needed for cloudabi Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31923
|
#
b7924341 |
|
27-Aug-2021 |
Andrew Turner <andrew@FreeBSD.org> |
Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830
|
#
af29f399 |
|
28-Jul-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
umtx: Split umtx.h on two counterparts. To prevent umtx.h polluting by future changes split it on two headers: umtx.h - ABI header for userspace; umtxvar.h - the kernel staff. While here fix umtx_key_match style. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31248 MFC after: 2 weeks
|
#
5fd9cd53 |
|
20-Jul-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
linux(4): Modify sv_onexec hook to return an error. Temporary add stubs to the Linux emulation layer which calls the existing hook. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30911 MFC after: 2 weeks
|
#
62ba4cd3 |
|
20-Jul-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Call sv_onexec hook after the process VA is created. For future use in the Linux emulation layer call sv_onexec hook right after the new process address space is created. It's safe, as sv_onexec used only by Linux abi and linux_on_exec() does not depend on a state of process VA. Reviewed by: kib Differential revision: https://reviews.freebsd.org/D30899 MFC after: 2 weeks
|
#
28a66fc3 |
|
01-Jul-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not call FreeBSD-ABI specific code for all ABIs Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle robust mutexes, for native FreeBSD ABI. Similarly, there is no sense in calling sigfastblock_clear() for non-native ABIs. Requested by: dchagin Reviewed by: dchagin, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30987
|
#
71ab3445 |
|
01-Jul-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Add sv_onexec_old() sysent hook for exec event Unlike sv_onexec(), it is called from the old (pre-exec) sysentvec structure. The old vmspace for the process is still intact during the call. Reviewed by: dchagin, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D30987
|
#
db8d680e |
|
01-Jul-2021 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS This introduces a new, per-process flag, "NO_NEW_PRIVS", which is inherited, preserved on exec, and cannot be cleared. The flag, when set, makes subsequent execs ignore any SUID and SGID bits, instead executing those binaries as if they not set. The main purpose of the flag is implementation of Linux PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged chroot. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30939
|
#
615f22b2 |
|
29-Jun-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add a link to the Elf_Brandinfo into the struc proc. To allow the ABI to make a dicision based on the Brandinfo add a link to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits of Elf_Brandinfo flags is private to the ABI. Note to MFC: it breaks KBI. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30918 MFC after: 2 weeks
|
#
2d423f76 |
|
14-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
sysent: allow ABI to disable setid on exec. Reviewed by: dchagin Tested by: trasz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28154
|
#
19e6043a |
|
14-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_exec.c: Add execve_nosetid() helper Reviewed by: dchagin Tested by: trasz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28154
|
#
3b9971c8 |
|
25-May-2021 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Clean up some of the core dumping code. No functional changes. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30397
|
#
154f0ecc |
|
22-May-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
Fix tinderbox build after 1762f674ccb571e6 ktrace commit.
|
#
1762f674 |
|
14-May-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
ktrace: pack all ktrace parameters into allocated structure ktr_io_params Ref-count the ktr_io_params structure instead of vnode/cred. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257
|
#
33621dfc |
|
22-May-2021 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Refactor core dumping code a bit This makes it possible to use core_write(), core_output(), and sbuf_drain_core_output(), in Linux coredump code. Moving them out of imgact_elf.c is necessary because of the weird way it's being built. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D30369
|
#
f1c3adef |
|
13-Apr-2021 |
Mark Johnston <markj@FreeBSD.org> |
execve: Mark exec argument buffers We cache mapped execve argument buffers to avoid the overhead of TLB shootdowns. Mark them invalid when they are freed to the cache. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29460
|
#
4ea65707 |
|
11-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
exec_new_vmspace: print useful error message on ctty if stack cannot be mapped. After old vmspace is destroyed during execve(2), but before the new space is fully constructed, an error during image activation cannot be returned because there is no executing program to receive it. In the relatively common case of failure to map stack, print some hints on the control terminal. Note that user has enough knobs to cause stack mapping error, and this is the most common reason for execve(2) aborting the process. Requested by: jhb Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050
|
#
2e1c94aa |
|
08-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE | PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/ disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050
|
#
673e2dd6 |
|
18-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Add ELF flag to disable ASLR stack gap. Also centralize and unify checks to enable ASLR stack gap in a new helper exec_stackgap(). PR: 239873 Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
87a9b18d |
|
23-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide ABI modules hooks for process exec/exit and thread exit. Exec and exit are same as corresponding eventhandler hooks. Thread exit hook is called somewhat earlier, while thread is still owned by the process and enough context is available. Note that the process lock is owned when the hook is called. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27309
|
#
e68c6191 |
|
21-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Stop using eventhandlers for itimers subsystem exec and exit hooks. While there, do some minor cleanup for kclocks. They are only registered from kern_time.c, make registration function static. Remove event hooks, they are not used by both registered kclocks. Add some consts. Perhaps we can stop registering kclocks at all and statically initialize them. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27305
|
#
74a093eb |
|
21-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Stop using eventhandler to invoke umtx_exec hook. There is no point in dynamic registration, umtx hook is there always. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27303
|
#
85078b85 |
|
17-Nov-2020 |
Conrad Meyer <cem@FreeBSD.org> |
Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037
|
#
f7db0c95 |
|
04-Nov-2020 |
Mark Johnston <markj@FreeBSD.org> |
vmspace: Convert to refcount(9) This is mostly mechanical except for vmspace_exit(). There, use the new refcount_release_if_last() to avoid switching to vmspace0 unless other processes are sharing the vmspace. In that case, upon switching to vmspace0 we can unconditionally release the reference. Remove the volatile qualifier from vm_refcnt now that accesses are protected using refcount(9) KPIs. Reviewed by: alc, kib, mmel MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27057
|
#
275c821d |
|
24-Oct-2020 |
Kyle Evans <kevans@FreeBSD.org> |
audit: correct reporting of *execve(2) success r326145 corrected do_execve() to return EJUSTRETURN upon success so that important registers are not clobbered. This had the side effect of tapping out 'failures' for all *execve(2) audit records, which is less than useful for auditing purposes. Audit exec returns earlier, where we can know for sure that EJUSTRETURN translates to success. Note that this unsets TDP_AUDITREC as we commit the audit record, so the usual audit in the syscall return path will do nothing. PR: 249179 Reported by: Eirik Oeverby <ltning-freebsd anduin net> Reviewed by: csjp, kib MFC after: 1 week Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26922
|
#
5dca94ee |
|
23-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove pointless local variable. Reported by: alc Sponsored by: The FreeBSD Foundation MFC after: 6 days
|
#
aaf78c16 |
|
23-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not leak oldvmspace if image activation failed and current address space is already destroyed, so kern_execve() terminates the process. While there, clean up some internals of post_execve() inlined in init_main. Reported by: Peter <pmc@citylink.dinoex.sub.org> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26525
|
#
feabaaf9 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho
|
#
d8bc2a17 |
|
15-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: remove fd_lastfile It keeps recalculated way more often than it is needed. Provide a routine (fdlastfile) to get it if necessary. Consumers may be better off with a bitmap iterator instead.
|
#
698dda67 |
|
02-May-2020 |
Conrad Meyer <cem@FreeBSD.org> |
kern_exec.c: Produce valid code ifndef SYS_PROTO_H Reported by: Coccinelle
|
#
b24e6ac8 |
|
16-Apr-2020 |
Brooks Davis <brooks@FreeBSD.org> |
Convert canary, execpathp, and pagesizes to pointers. Use AUXARGS_ENTRY_PTR to export these pointers. This is a followup to r359987 and r359988. Reviewed by: jhb Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24446
|
#
9df1c38b |
|
15-Apr-2020 |
Brooks Davis <brooks@FreeBSD.org> |
Export argc, argv, envc, envv, and ps_strings in auxargs. This simplifies discovery of these values, potentially with reducing the number of syscalls we need to make at runtime. Longer term, we wish to convert the startup process to pass an auxargs pointer to _start() and use that rather than walking off the end of envv. This is cleaner, more C-friendly, and for systems with strong bounds (e.g. CHERI) necessary. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24407
|
#
397df744 |
|
15-Apr-2020 |
Brooks Davis <brooks@FreeBSD.org> |
Make ps_strings in struct image_params into a pointer. This is a prepratory commit for D24407. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA
|
#
59838c1a |
|
01-Apr-2020 |
John Baldwin <jhb@FreeBSD.org> |
Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837
|
#
8d4d271e |
|
04-Mar-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
execve: use LOCKSHARED when looking up the interpreter Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23956
|
#
b05ca429 |
|
02-Mar-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
sys/: Document few more sysctls. Submitted by: Antranig Vartanian <antranigv@freebsd.am> Reviewed by: kaktus Commented by: jhb Approved by: kib (mentor) Sponsored by: illuria security Differential Revision: https://reviews.freebsd.org/D23759
|
#
7aaf252c |
|
28-Feb-2020 |
Jeff Roberson <jeff@FreeBSD.org> |
Convert a few triviail consumers to the new unlocked grab API. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23847
|
#
a113b17f |
|
20-Feb-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not read sigfastblock word on syscall entry. On machines with SMAP, fueword executes two serializing instructions which can be seen in microbenchmarks. As a measure to restore microbenchmark numbers, only read the word on the attempt to deliver signal in ast(). If the word is set, signal is not delivered and word is kept, preventing interruption of interruptible sleeps by signals until userspace calls sigfastblock(UNBLOCK) which clears the word. This way, the spurious EINTR that userspace can see while in critical section is on first interruptible sleep, if a signal is pending, and on signal posting. It is believed that it is not important for rtld and lbithr critical sections. It might be visible for the application code e.g. for the callback of dl_iterate_phdr(3), but again the belief is that the non-compliance is acceptable. Most important is that the retry of the sleeping syscall does not interrupt unless additional signal is posted. For now I added the knob kern.sigfastblock_fetch_always to enable the word read on syscall entry to be able to diagnose possible issues due to spurious EINTR. While there, do some code restructuting to have all sigfastblock() handling located in kern_sig.c. Reviewed by: jeff Discussed with: mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23622
|
#
146fc63f |
|
09-Feb-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Disscussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773
|
#
643656cf |
|
31-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422
|
#
b249ce48 |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
|
#
af009714 |
|
14-Dec-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
Handle pagein clustering in vm_page_grab_valid() so that it can be used by exec_map_first_page(). This will also enable pagein clustering for other interested consumers (tmpfs, md, etc). Discussed with: alc Approved by: kib Differential Revision: https://reviews.freebsd.org/D22731
|
#
d8010b11 |
|
09-Dec-2019 |
John Baldwin <jhb@FreeBSD.org> |
Copy out aux args after the argument and environment vectors. Partially revert r354741 and r354754 and go back to allocating a fixed-size chunk of stack space for the auxiliary vector. Keep sv_copyout_auxargs but change it to accept the address at the end of the environment vector as an input stack address and no longer allocate room on the stack. It is now called at the end of copyout_strings after the argv and environment vectors have been copied out. This should fix a regression in r354754 that broke the stack alignment for newer Linux amd64 binaries (and probably broke Linux arm64 as well). Reviewed by: kib Tested on: amd64 (native, linux64 (only linux-base-c7), and i386) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22695
|
#
31174518 |
|
03-Dec-2019 |
John Baldwin <jhb@FreeBSD.org> |
Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: md64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501
|
#
03b0d68c |
|
18-Nov-2019 |
John Baldwin <jhb@FreeBSD.org> |
Check for errors from copyout() and suword*() in sv_copyout_args/strings. Reviewed by: brooks, kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22401
|
#
01a2b567 |
|
17-Nov-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_exec: p_osrel and p_fctl0 were obliterated by failed execve(2) attempt. Zeroing of them is needed so that an image activator can update the values as appropriate (or not set at all). Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22379
|
#
e3532331 |
|
15-Nov-2019 |
John Baldwin <jhb@FreeBSD.org> |
Add a sv_copyout_auxargs() hook in sysentvec. Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv instead of doing it in the sv_fixup hook. In particular, this new hook allows the stack space to be allocated at the same time the auxv values are copied out to userland. This allows us to avoid wasting space for unused auxv entries as well as not having to recalculate where the auxv vector is by walking back up over the argv and environment vectors. Reviewed by: brooks, emaste Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22355
|
#
0012f373 |
|
14-Oct-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594
|
#
63e97555 |
|
14-Oct-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
(1/6) Replace busy checks with acquires where it is trival to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548
|
#
55248d32 |
|
26-Sep-2019 |
Mark Johnston <markj@FreeBSD.org> |
Fix handling of invalid pages in exec_map_first_page(). exec_map_first_page() would unconditionally free an unbacked, invalid page from the executable image. However, it is possible that the page is wired, in which case it is incorrect to free the page, so check for additional wirings first. Reported by: syzkaller Tested by: pho Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21767
|
#
fee2a2fa |
|
09-Sep-2019 |
Mark Johnston <markj@FreeBSD.org> |
Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486
|
#
1040254b |
|
07-Sep-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
In do_execve(), use shared text vnode lock consistently. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560
|
#
1c36b728 |
|
07-Sep-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
In do_execve(), clear imgp->textset when restarting for interpreter. Otherwise, we might left the boolean set, which would affect cleanup after an error on interpreter activation. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560
|
#
fe69291f |
|
03-Sep-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add procctl(PROC_STACKGAP_CTL) It allows a process to request that stack gap was not applied to its stacks, retroactively. Also it is possible to control the gaps in the process after exec. PR: 239894 Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21352
|
#
b5d239cb |
|
28-Aug-2019 |
Mark Johnston <markj@FreeBSD.org> |
Wire pages in vm_page_grab() when appropriate. uiomove_object_page() and exec_map_first_page() would previously wire a page after having grabbed it. Ask vm_page_grab() to perform the wiring instead: this removes some redundant code, and is cheaper in the case where the requested page is not resident since the page allocator can be asked to initialize the page as wired, whereas a separate vm_page_wire() call requires the page lock. In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued before returning, but this is unnecessary since vm_page_free() will trigger a batched dequeue of the page. Reviewed by: alc, kib Tested by: pho (part of a larger patch) MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21440
|
#
01c3ba97 |
|
05-Aug-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix mis-merge Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
f422bc30 |
|
02-Aug-2019 |
John Baldwin <jhb@FreeBSD.org> |
Set ISOPEN in namei flags when opening executable interpreters. These vnodes are explicitly opened via VOP_OPEN via exec_check_permissions identical to the main exectuable image. Setting ISOPEN allows filesystems to perform suitable checks in VOP_LOOKUP (e.g. close-to-open consistency in the NFS client). Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21129
|
#
fc83c5a7 |
|
31-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Make randomized stack gap between strings and pointers to argv/envs. This effectively makes the stack base on the csu _start entry randomized. The gap is enabled if ASLR is for the ABI is enabled, and then kern.elf{64,32}.aslr.stack_gap specify the max percentage of the initial stack size that can be wasted for gap. Setting it to zero disables the gap, and max is capped at 50%. Only amd64 for now. Reviewed by: cem, markj Discussed with: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21081
|
#
eeacb3b0 |
|
08-Jul-2019 |
Mark Johnston <markj@FreeBSD.org> |
Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247
|
#
e2e050c8 |
|
19-May-2019 |
Conrad Meyer <cem@FreeBSD.org> |
Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
|
#
78022527 |
|
05-May-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
|
#
91ff2d48 |
|
12-Apr-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unneeded conditionals for sv_ functions - all the ABIs (apart from null_sysvec) define them, so the 'else' branch is never taken. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19889
|
#
6f1fe330 |
|
16-Mar-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: Add md process flags and first P_MD_PTI flag. PTI mode for the process pmap on exec is activated iff P_MD_PTI is set. On exec, the existing vmspace can be reused only if pti mode of the pmap matches the P_MD_PTI flag of the process. Add MD cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can vetoed reuse of the existing vmspace. MFC note: md_flags change struct proc KBI. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19514
|
#
fa50a355 |
|
10-Feb-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminshing over time as exploit authors work out simple ASLR bypass techniques, it elimintates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintaince burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitive change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386. This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603
|
#
6f26dd50 |
|
07-Feb-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
do_execve(): lock vnode when needed. Code after exec_fail_dealloc label expects that the image vnode is locked if present. When copyout() of the strings or auxv vectors fails, goto to the error handling did not relocked the vnode as required. The copyout() can be made failing e.g. by creating an ELF image with PT_GNU_STACK segment disabling the write. Reported by: Jonathan Stuart <n0t.jcs@gmail.com> (found by fuzzing) Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
cc426dd3 |
|
11-Dec-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
|
#
f373437a |
|
29-Nov-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Add helper functions to copy strings into struct image_args. Given a zeroed struct image_args with an allocated buf member, exec_args_add_fname() must be called to install a file name (or NULL). Then zero or more calls to exec_args_add_env() followed by zero or more calls to exec_args_add_env(). exec_args_adjust_args() may be called after args and/or env to allow an interpreter to be prepended to the argument list. To allow code reuse when adding arg and env variables, begin_envv should be accessed with the accessor exec_args_get_begin_envv() which handles the case when no environment entries have been added. Use these functions to simplify exec_copyin_args() and freebsd32_exec_copyin_args(). Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15468
|
#
f5cf7589 |
|
23-Nov-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide storage for the process feature control flags in struct proc. The flags are cleared on exec, it is up to the image activator to set them. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
12e69f96 |
|
02-Nov-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Add const to input-only char * arguments. These arguments are mostly paths handled by NAMEI*() macros which already take const char * arguments. This change improves the match between syscalls.master and the public declerations of system calls. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17812
|
#
2fe6aeff |
|
17-Aug-2018 |
Mariusz Zaborski <oshogbo@FreeBSD.org> |
capsicum: allow the setproctitle(3) function in capability mode Capsicum in past allowed to change the process title. This was broken with r335939. PR: 230584 Submitted by: Yuichiro NAITO <naito.yuichiro@gmail.com> Reported by: ian@niw.com.au MFC after: 1 week
|
#
d92da759 |
|
13-Jul-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Round down the location of execpathp to slightly improve copyout speed. In practice, this moves the padding from below the canary to above execpathp has no impact on stack consumption. Submitted by: Wuyang-Chung (via github pull request #159) MFC after: 1 week
|
#
2bf95012 |
|
05-Jul-2018 |
Andrew Turner <andrew@FreeBSD.org> |
Create a new macro for static DPCPU data. On arm64 (and possible other architectures) we are unable to use static DPCPU data in kernel modules. This is because the compiler will generate PC-relative accesses, however the runtime-linker expects to be able to relocate these. In preparation to fix this create two macros depending on if the data is global or static. Reviewed by: bz, emaste, markj Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D16140
|
#
3e7cb27c |
|
04-Jun-2018 |
Alan Cox <alc@FreeBSD.org> |
Use a single, consistent approach to returning success versus failure in vm_map_madvise(). Previously, vm_map_madvise() used a traditional Unix- style "return (0);" to indicate success in the common case, but Mach- style return values in the edge cases. Since KERN_SUCCESS equals zero, the only problem with this inconsistency was stylistic. vm_map_madvise() has exactly two callers in the entire source tree, and only one of them cares about the return value. That caller, kern_madvise(), can be simplified if vm_map_madvise() consistently uses Unix-style return values. Since vm_map_madvise() uses the variable modify_map as a Boolean, make it one. Eliminate a redundant error check from kern_madvise(). Add a comment explaining where the check is performed. Explicitly note that exec_release_args_kva() doesn't care about vm_map_madvise()'s return value. Since MADV_FREE is passed as the behavior, the return value will always be zero. Reviewed by: kib, markj MFC after: 7 days
|
#
b8d908b7 |
|
01-Jun-2018 |
Ed Maste <emaste@FreeBSD.org> |
ANSIfy sys/kern
|
#
5f77b8a8 |
|
24-May-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Avoid two suword() calls per auxarg entry. Instead, construct an auxargs array and copy it out all at once. Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent the array. This is the correct type where pairs of words just happend to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act on this array rather than introducing a new macro. Return errors on copyout() and suword() failures and handle them in the caller. Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64 which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32 (now removed due to the use of proper types). Reviewed by: kib Comments from: emaste, jhb Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15485
|
#
cbd92ce6 |
|
09-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month
|
#
1302eea7 |
|
20-Apr-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Rename PROC_PDEATHSIG_SET -> PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_GET -> PROC_PDEATHSIG_STATUS for consistency with other procctl(2) operations names. Requested by: emaste Sponsored by: The FreeBSD Foundation MFC after: 13 days
|
#
73c8686e |
|
19-Apr-2018 |
John Baldwin <jhb@FreeBSD.org> |
Simplify the code to allocate stack for auxv, argv[], and environment vectors. Remove auxarg_size as it was only used once right after a confusing assignment in each of the variants of exec_copyout_strings(). Reviewed by: emaste MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D15123
|
#
b9408863 |
|
18-Apr-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Add PROC_PDEATHSIG_SET to procctl interface. Allow processes to request the delivery of a signal upon death of their parent process. Supposed consumer of the feature is PostgreSQL. Submitted by: Thomas Munro Reviewed by: jilles, mjg MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D15106
|
#
6469bdcd |
|
06-Apr-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
|
#
b2387aa6 |
|
07-Feb-2018 |
Andriy Gapon <avg@FreeBSD.org> |
exec_map_first_page: fix an inverse condition introduced in r254138 While the bug itself was serious, as we could either pass a non-busied page to vm_pager_get_pages() or leak a busy page, it could only be triggered under a very rare condition where the page is already inserted into the object, but it is not valid yet. Reviewed by: kib MFC after: 2 weeks
|
#
d821d364 |
|
21-Jan-2018 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks
|
#
8a36da99 |
|
27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
|
#
814629dd |
|
24-Nov-2017 |
Ed Schouten <ed@FreeBSD.org> |
Don't let cpu_set_syscall_retval() clobber exec_setregs(). Upon successful completion, the execve() system call invokes exec_setregs() to initialize the registers of the initial thread of the newly executed process. What is weird is that when execve() returns, it still goes through the normal system call return path, clobbering the registers with the system call's return value (td->td_retval). Though this doesn't seem to be problematic for x86 most of the times (as the value of eax/rax doesn't matter upon startup), this can be pretty frustrating for architectures where function argument and return registers overlap (e.g., ARM). On these systems, exec_setregs() also needs to initialize td_retval. Even worse are architectures where cpu_set_syscall_retval() sets registers to values not derived from td_retval. On these architectures, there is no way cpu_set_syscall_retval() can set registers to the way it wants them to be upon the start of execution. To get rid of this madness, let sys_execve() return EJUSTRETURN. This will cause cpu_set_syscall_retval() to leave registers intact. This makes process execution easier to understand. It also eliminates the difference between execution of the initial process and successive ones. The initial call to sys_execve() is not performed through a system call context. Reviewed by: kib, jhibbits Differential Revision: https://reviews.freebsd.org/D13180
|
#
2ca45184 |
|
09-Nov-2017 |
Matt Joras <mjoras@FreeBSD.org> |
Introduce EVENTHANDLER_LIST and some users. This introduces a facility to EVENTHANDLER(9) for explicitly defining a reference to an event handler list. This is useful since previously all invokers of events had to do a locked traversal of the global list of event handler lists in order to find the appropriate event handler list. By keeping a pointer to the appropriate list an invoker can avoid this traversal completely. The pointer is initialized with SYSINIT(9) during the eventhandler stage. Users registering interest in events do not need to know if the event is backed by such a list, since the list is added to the global list of lists. As with lists that are not pre-defined it is safe to register for the events before the list has been created. This converts the process_* and thread_* events to using the new facility, as these are events whose locked traversals end up showing up significantly in ports build workflows (and presumably other workflows with many short lived threads/procs). It may be advantageous to convert other events to using the new facility. The el_flags field is now unused, but leave it be so that this revision can be MFC'd. Reviewed by: bdrewery, markj, mjg Approved by: rstone (mentor) In collaboration with: ian MFC after: 4 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12814
|
#
e6b645ae |
|
18-Oct-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
execve: avoid one proc lock/unlock trip unless PTRACE_EXEC is set MFC after: 1 week
|
#
80a2397a |
|
18-Oct-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
Tidy up pmc support at execve. The proc-specific check is inherently racy, so the code can just unlock beforehand. MFC after: 1 week
|
#
467b3975 |
|
03-Jul-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Resolve confusion between different error code spaces. The vm_map_fixed() and vm_map_stack() VM functions return Mach error codes. Convert them into errno values before returning result from exec_new_vmspace(). While there, modernize the comment and do minor style adjustments. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
8056df6e |
|
30-Jun-2017 |
Alan Cox <alc@FreeBSD.org> |
Clear the MAP_WIREFUTURE flag on the vm map in exec_new_vmspace() when it recycles the current vm space. Otherwise, an mlockall(MCL_FUTURE) could still be in effect on the process after an execve(2), which violates the specification for mlockall(2). It's pointless for vm_map_stack() to check the MEMLOCK limit. It will never be asked to wire the stack. Moreover, it doesn't even implement wiring of the stack. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11421
|
#
3e85b721 |
|
16-May-2017 |
Ed Maste <emaste@FreeBSD.org> |
Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193
|
#
c6a4ba5a |
|
14-Feb-2017 |
Mark Johnston <markj@FreeBSD.org> |
Apply MADV_FREE to exec_map entries only after a lowmem event. This effectively provides the same benefit as applying MADV_FREE inline upon every execve, since the page daemon invokes lowmem handlers prior to scanning the inactive queue. It also has less overhead; the cost of applying MADV_FREE is very noticeable on many-CPU systems since it includes that of a TLB shootdown of global PTEs. For instance, this change nearly halves the system CPU usage during a buildkernel on a 128-vCPU EC2 instance (with some other patches applied). Benchmarked by: cperciva (earlier version) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9586
|
#
6e89d383 |
|
06-Jan-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Explicitely add "opt_compat.h" to kern_exec.c: fix powerpc LINT builds. sys/ptrace.h includes sys/signal.h, which includes sys/_sigset.h. Note that sys/_sigset.h only defines osigset_t if COMPAT_43 was defined. Two lines later, sys/ptrace.h includes machine/reg.h, which in case of powerpc, includes opt_compat.h. After the include headers reordering in r311345, we have sys/ptrace.h included before sys/sysproto.h. If COMPAT_43 was requested in the kernel config, the result is that sys/_sigset.h does not define osigset_t, but sys/sysproto.h sees COMPAT_43 and uses osigset_t. Fix this by explicitely including opt_compat.h to cover the whole kern/kern_exec.c scope. Sponsored by: The FreeBSD Foundation
|
#
ec492b13 |
|
04-Jan-2017 |
Mark Johnston <markj@FreeBSD.org> |
Add a small allocator for exec_map entries. Upon each execve, we allocate a KVA range for use in copying data to the new image. Pages must be faulted into the range, and when the range is freed, the backing pages are freed and their mappings are destroyed. This is a lot of needless overhead, and the exec_map management becomes a bottleneck when many CPUs are executing execve concurrently. Moreover, the number of available ranges is fixed at 16, which is insufficient on large systems and potentially excessive on 32-bit systems. The new allocator reduces overhead by making exec_map allocations persistent. When a range is freed, pages backing the range are marked clean and made easy to reclaim. With this change, the exec_map is sized based on the number of CPUs. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8921
|
#
eeeaa7ba |
|
04-Jan-2017 |
Mark Johnston <markj@FreeBSD.org> |
Sort includes in kern_exec.c. MFC after: 1 week
|
#
7667839a |
|
15-Nov-2016 |
Alan Cox <alc@FreeBSD.org> |
Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497
|
#
53dc58f2 |
|
19-Oct-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
Mark a bunch of mpsafe sysctls as such. This gives me a sysctl Giant-free buildworld.
|
#
ce3ee09b |
|
14-Aug-2016 |
Alan Cox <alc@FreeBSD.org> |
Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when neither vm_pager_has_page() nor vm_pager_get_pages() is called. Reviewed by: kib, markj MFC after: 3 weeks
|
#
fc9bbf27 |
|
13-Aug-2016 |
Alan Cox <alc@FreeBSD.org> |
Eliminate two calls to vm_page_xunbusy() that are both unnecessary and incorrect from the error cases in exec_map_first_page(). They are unnecessary because we automatically unbusy the page in vm_page_free() when we remove it from the object. The calls are incorrect because they happen after the page is freed, so we might actually unbusy the page after it has been reallocated to a different object. (This error was introduced in r292373.) Reviewed by: kib MFC after: 1 week
|
#
77d68094 |
|
18-Jul-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
The assertion re-added in r302614 was triggered when stopping signal is delivered to vforked child. Issue is that we avoid stopping such children in issignal() to not block parents. But executed AST, which ignored stops, leaves the child with the signal pending but no AST pending. On first exec after vfork(), call signotify() to handle pending reenabled signals. Adjust the assert to not check vfork children until exec. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
8d570f64 |
|
15-Jul-2016 |
John Baldwin <jhb@FreeBSD.org> |
Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events include: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace syscam call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044
|
#
9e590ff0 |
|
27-Jun-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859
|
#
3fc292d5 |
|
07-Jun-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Old process credentials for setuid execve must not be dereferenced when the process credentials were not changed. This can happen if an error occured trying to activate the setuid binary. And on error, if new credentials were not yet assigned, they must be freed to not create the leak. Use oldcred == NULL as the predicate to detect credential reassignment. Reported and tested by: pho Sponsored by: The FreeBSD Foundation
|
#
cda68844 |
|
27-May-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
exec: get rid of one vnode lock/unlock pair in do_execve The lock was temporarily dropped for vrele calls, but they can be postponed to a point where the lock is not held in the first place. While here shuffle other code not needing the lock.
|
#
1afd78b3 |
|
26-May-2016 |
Bryan Drewery <bdrewery@FreeBSD.org> |
exec: Provide execpath in imgp for the process_exec hook. This was previously set after the hook and only if auxargs were present. Now always provide it if possible. MFC after: 2 weeks Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6546
|
#
881010f0 |
|
26-May-2016 |
Bryan Drewery <bdrewery@FreeBSD.org> |
exec: Add credential change information into imgp for process_exec hook. This allows an EVENTHANDLER(process_exec) hook to see if the new image will cause credentials to change whether due to setgid/setuid or because of POSIX saved-id semantics. This adds 3 new fields into image_params: struct ucred *newcred Non-null if the credentials will change. bool credential_setid True if the new image is setuid or setgid. This will pre-determine the new credentials before invoking the image activators, where the process_exec hook is called. The new credentials will be installed into the process in the same place as before, after image activators are done handling the image. MFC after: 2 weeks Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6544
|
#
e3043798 |
|
29-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/kern: spelling fixes in comments. No functional change.
|
#
35030a5d |
|
29-Mar-2016 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
3b77ac5f |
|
01-Mar-2016 |
Bryan Drewery <bdrewery@FreeBSD.org> |
Correct a comment.
|
#
36160958 |
|
16-Dec-2015 |
Mark Johnston <markj@FreeBSD.org> |
Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week
|
#
b0cd2017 |
|
16-Dec-2015 |
Gleb Smirnoff <glebius@FreeBSD.org> |
A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
#
e6b95927 |
|
06-Oct-2015 |
Conrad Meyer <cem@FreeBSD.org> |
Fix core corruption caused by race in note_procstat_vmmap This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision. NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy. We do not remove the race, only prevent it from causing coredump corruption. - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5. - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result. - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding. Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3824
|
#
2f2f522b |
|
27-Sep-2015 |
Andriy Gapon <avg@FreeBSD.org> |
save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days
|
#
bcb60d52 |
|
07-Sep-2015 |
Conrad Meyer <cem@FreeBSD.org> |
Follow-up to r287442: Move sysctl to compiled-once file Avoid duplicate sysctl nodes. Found by: tijl Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3586
|
#
39f5ebb7 |
|
03-Aug-2015 |
Ed Schouten <ed@FreeBSD.org> |
Add sysent flag to switch to capabilities mode on startup. CloudABI processes should run in capabilities mode automatically. There is no need to switch manually (e.g., by calling cap_enter()). Add a flag, SV_CAPSICUM, that can be used to call into cap_enter() during execve(). Reviewed by: kib
|
#
b4490c6e |
|
18-Jul-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
457f7e23 |
|
16-Jul-2015 |
Ed Schouten <ed@FreeBSD.org> |
Implement CloudABI's exec() call. Summary: In a runtime that is purely based on capability-based security, there is a strong emphasis on how programs start their execution. We need to make sure that we execute an new program with an exact set of file descriptors, ensuring that credentials are not leaked into the process accidentally. Providing the right file descriptors is just half the problem. There also needs to be a framework in place that gives meaning to these file descriptors. How does a CloudABI mail server know which of the file descriptors corresponds to the socket that receives incoming emails? Furthermore, how will this mail server acquire its configuration parameters, as it cannot open a configuration file from a global path on disk? CloudABI solves this problem by replacing traditional string command line arguments by tree-like data structure consisting of scalars, sequences and mappings (similar to YAML/JSON). In this structure, file descriptors are treated as a first-class citizen. When calling exec(), file descriptors are passed on to the new executable if and only if they are referenced from this tree structure. See the cloudabi-run(1) man page for more details and examples (sysutils/cloudabi-utils). Fortunately, the kernel does not need to care about this tree structure at all. The C library is responsible for serializing and deserializing, but also for extracting the list of referenced file descriptors. The system call only receives a copy of the serialized data and a layout of what the new file descriptor table should look like: int proc_exec(int execfd, const void *data, size_t datalen, const int *fds, size_t fdslen); This change introduces a set of fd*_remapped() functions: - fdcopy_remapped() pulls a copy of a file descriptor table, remapping all of the file descriptors according to the provided mapping table. - fdinstall_remapped() replaces the file descriptor table of the process by the copy created by fdcopy_remapped(). - fdescfree_remapped() frees the table in case we aborted before fdinstall_remapped(). We then add a function exec_copyin_data_fds() that builds on top these functions. It copies in the data and constructs a new remapped file descriptor. This is used by cloudabi_sys_proc_exec(). Test Plan: cloudabi-run(1) is capable of spawning processes successfully, providing it data and file descriptors. procstat -f seems to confirm all is good. Regular FreeBSD processes also work properly. Reviewers: kib, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3079
|
#
61617058 |
|
13-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
exec: textvp -> oldtextvp; binvp -> newtextvp This makes it consistent with the rest of the naming in do_execve. No functional changes.
|
#
853be5ff |
|
13-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
exec plug a redundant vref + vrele of the image vnode
|
#
6ef12002 |
|
30-Jun-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not calculate the stack's bottom address twice. Submitted by: Olivц╘r Pintц╘r Review: https://reviews.freebsd.org/D2953 MFC after: 1 week
|
#
093c7f39 |
|
12-Jun-2015 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make KPI of vm_pager_get_pages() more strict: if a pager changes a page in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
f6f6d240 |
|
10-Jun-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
|
#
7b445033 |
|
10-May-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
fe8a824c |
|
23-Apr-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Handle incorrect ELF images specifying size for PT_GNU_STACK not being multiple of page size. Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
316b3843 |
|
15-Apr-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement support for binary to requesting specific stack size for the initial thread. It is read by the ELF image activator as the virtual size of the PT_GNU_STACK program header entry, and can be specified by the linker option -z stack-size in newer binutils. The soft RLIMIT_STACK is auto-increased if possible, to satisfy the binary' request. Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
3d653db0 |
|
21-Mar-2015 |
Alan Cox <alc@FreeBSD.org> |
Introduce vm_object_color() and use it in mmap(2) to set the color of named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.) Differential Revision: https://reviews.freebsd.org/D2013 Reviewed by: jhb, kib Tested by: Svatopluk Kraus (armv6) MFC after: 6 weeks
|
#
daf63fd2 |
|
15-Mar-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
cred: add proc_set_cred helper The goal here is to provide one place altering process credentials. This eases debugging and opens up posibilities to do additional work when such an action is performed.
|
#
677258f7 |
|
18-Jan-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
6ddcc233 |
|
13-Dec-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Add facility to stop all userspace processes. The supposed use of the feature is to quisce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
8a5177cc |
|
31-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
filedesc: fix missed comments about fdsetugidsafety While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.
|
#
0a2c94b8 |
|
28-Oct-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks
|
#
11888da8 |
|
21-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
filedesc: cleanup setugidsafety a little Rename it to fdsetugidsafety for consistency with other functions. There is no need to take filedesc lock if not closing any files. The loop has to verify each file and we are guaranteed fdtable has space for at least 20 fds. As such there is no need to check fd_lastfile. While here tidy up is_unsafe.
|
#
5c37b305 |
|
20-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Plug unnecessary binvp NULL initialization and test. Reported by: Coverity CID: 1018889
|
#
8e572983 |
|
29-Sep-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Use bzero instead of explicitly zeroing stuff in do_execve. While strictly speaking this is not correct since some fields are pointers, it makes no difference on all supported archs and we already rely on it doing the right thing in other places. No functional changes.
|
#
70978c93 |
|
12-Aug-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
If vm_page_grab() allocates a new page, the page is not inserted into page queue even when the allocation is not wired. It is responsibility of the vm_page_grab() caller to ensure that the page does not end on the vm_object queue but not on the pagedaemon queue, which would effectively create unpageable unwired page. In exec_map_first_page() and vm_imgact_hold_page(), activate the page immediately after unbusying it, to avoid leak. In the uiomove_object_page(), deactivate page before the object is unlocked. There is no leak, since the page is deactivated after uiomove_fromphys() finished. But allowing non-queued non-wired page in the unlocked object queue makes it impossible to assert that leak does not happen in other places. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
965d0860 |
|
14-Jul-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Plug p_pptr null test in do_execve. It is always true.
|
#
5e2554b7 |
|
07-Jul-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Don't call crdup nor uifind under vnode lock. A locked vnode can get into the way of satisyfing malloc with M_WATOK. This is a fixup to r268087. Suggested by: kib MFC after: 1 week
|
#
e7d939bd |
|
06-Jul-2014 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Remove ia64. This includes: o All directories named *ia64* o All files named *ia64* o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan
|
#
a6bad85e |
|
01-Jul-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Plug gcc warning after r268074 about unitialized newsigacts Reported by: Gary Jennejohn <gljennjohn gmail.com>
|
#
350d5181 |
|
01-Jul-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Don't call crcopysafe or uifind unnecessarily in execve. MFC after: 1 week
|
#
d00c8ea4 |
|
01-Jul-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Perform a lockless check in sigacts_shared. It is used only during execve (i.e. singlethreaded), so there is no fear of returning 'not shared' which soon becomes 'shared'. While here reorganize the code a little to avoid proc lock/unlock in shared case. MFC after: 1 week
|
#
b0bc0cad |
|
27-Jun-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Call fdcloseexec right after fdunshare. No functional changes. MFC after: 1 week
|
#
b9d32c36 |
|
27-Jun-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Make fdunshare accept only td parameter. Proc had to match the thread anyway and 2 parameters were inconsistent with the rest. MFC after: 1 week
|
#
af3b2549 |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Pull in r267961 and r267973 again. Fix for issues reported will follow.
|
#
37a107a4 |
|
27-Jun-2014 |
Glen Barber <gjb@FreeBSD.org> |
Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
|
#
3da1cf1e |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
|
#
e16406c7 |
|
26-Jun-2014 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Remove duplicated includes. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org>
|
#
78960940 |
|
08-Jun-2014 |
Alan Cox <alc@FreeBSD.org> |
Refresh a comment. The VM_STACK option was eliminated in r43209. Sponsored by: EMC / Isilon Storage Division
|
#
7032434e |
|
20-May-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
When exec_new_vmspace() decides that current vmspace cannot be reused on execve(2), it calls vmspace_exec(), which frees the current vmspace. The thread executing an exec syscall gets new vmspace assigned, and old vmspace is freed if only referenced by the current process. The free operation includes pmap_release(), which de-constructs the paging structures used by hardware. If the calling process is multithreaded, other threads are suspended in the thread_suspend_check(), and need to be unsuspended and run to be able to exit on successfull exec. Now, since the old vmspace is destroyed, paging structures are invalid, threads are resumed on the non-existent pmaps (page tables), which leads to triple fault on x86. To fix, postpone the free of old vmspace until the threads are resumed and exited. To avoid modifications to all image activators all of which use exec_new_vmspace(), memoize the current (old) vmspace in kern_execve(), and notify it about the need to call vmspace_free() with a thread-private flag TDP_EXECVMSPC. http://bugs.debian.org/743141 Reported by: Ivo De Decker <ivo.dedecker@ugent.be> through secteam Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
88b124ce |
|
18-Mar-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and grown over the time to be non-comprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
4a144410 |
|
16-Mar-2014 |
Robert Watson <rwatson@FreeBSD.org> |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
|
#
d9fae5ab |
|
26-Nov-2013 |
Andriy Gapon <avg@FreeBSD.org> |
dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks
|
#
54366c0b |
|
25-Nov-2013 |
Attilio Rao <attilio@FreeBSD.org> |
- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
|
#
eda6009c |
|
15-Oct-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a sysctl kern.disallow_high_osrel which disables executing the images compiled on the world with higher major version number than the high version number of the booted kernel. Default to disable. Sponsored by: The FreeBSD Foundation Discussed with: bapt MFC after: 1 week
|
#
7008be5b |
|
04-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
|
#
5944de8e |
|
22-Aug-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
|
#
7b77e1fe |
|
14-Aug-2013 |
Mark Johnston <markj@FreeBSD.org> |
Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks
|
#
c7aebda8 |
|
09-Aug-2013 |
Attilio Rao <attilio@FreeBSD.org> |
The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
|
#
5df87b21 |
|
07-Aug-2013 |
Jeff Roberson <jeff@FreeBSD.org> |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
#
be996836 |
|
05-Aug-2013 |
Attilio Rao <attilio@FreeBSD.org> |
Revert r253939: We cannot busy a page before doing pagefaults. Infact, it can deadlock against vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib
|
#
3b6714ca |
|
04-Aug-2013 |
Attilio Rao <attilio@FreeBSD.org> |
The page hold mechanism is fast but it has couple of fallouts: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of few pages Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there as the quick path cannot immediately access the page object to busy the page and the slow path cannot however busy more than one page a time (to avoid deadlocks). Fixing such primitive can bring to complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho
|
#
27a18d6a |
|
06-Jun-2013 |
Alan Cox <alc@FreeBSD.org> |
Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division
|
#
1e65d73c |
|
02-Jun-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not map the shared page COW. If the process wired its address space, fork(2) would cause shadowing of the physical object and copying of the shared page into private copy, effectively preventing updates for the exported timehands structure and stopping the clock. Specify the maximum allowed permissions for the page to be read and execute, preventing write from the user mode. Reported and tested by: <huanghwh@yahoo.com> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
89f6b863 |
|
08-Mar-2013 |
Attilio Rao <attilio@FreeBSD.org> |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
2609222a |
|
01-Mar-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
#
888d4d4f |
|
07-Feb-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks
|
#
140dedb8 |
|
02-Nov-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks
|
#
5050aa86 |
|
22-Oct-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
|
#
c331c970 |
|
06-Oct-2012 |
Andriy Gapon <avg@FreeBSD.org> |
ktrace/kern_exec: check p_tracecred instead of p_cred .. when deciding whether to continue tracing across suid/sgid exec. Otherwise if root ktrace-d an unprivileged process and the processed exec-ed a suid program, then tracing didn't continue across exec. Reviewed by: bde, kib MFC after: 22 days
|
#
877d24ac |
|
28-Sep-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
|
#
c8e781f6 |
|
27-Sep-2012 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Revert r240931, as the previous comment was actually in sync with POSIX. I have to note that POSIX is simply stupid in how it describes O_EXEC/fexecve and friends. Yes, not only inconsistent, but stupid. In the open(2) description, O_RDONLY flag is described as: O_RDONLY Open for reading only. Taken from: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html Note "for reading only". Not "for reading or executing"! In the fexecve(2) description you can find: The fexecve() function shall fail if: [EBADF] The fd argument is not a valid file descriptor open for executing. Taken from: http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html As you can see the function shall fail if the file was not open with O_EXEC! And yet, if you look closer you can find this mess in the exec.html: Since execute permission is checked by fexecve(), the file description fd need not have been opened with the O_EXEC flag. Yes, O_EXEC flag doesn't have to be specified after all. You can open a file with O_RDONLY and you still be able to fexecve(2) it.
|
#
3a038c4d |
|
25-Sep-2012 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
We cannot open file for reading and executing (O_RDONLY | O_EXEC). Well, in theory we can pass those two flags, because O_RDONLY is 0, but we won't be able to read from a descriptor opened with O_EXEC. Update the comment. Sponsored by: FreeBSD Foundation MFC after: 2 weeks
|
#
28a7f607 |
|
07-Jul-2012 |
Mateusz Guzik <mjg@FreeBSD.org> |
Unbreak handling of descriptors opened with O_EXEC by fexecve(2). While here return EBADF for descriptors opened for writing (previously it was ETXTBSY). Add fgetvp_exec function which performs appropriate checks. PR: kern/169651 In collaboration with: kib Approved by: trasz (mentor) MFC after: 1 week
|
#
a665ed98 |
|
23-Jun-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Move the code dealing with shared page into a dedicated kern_sharedpage.c source file from kern_exec.c. MFC after: 29 days
|
#
21c295ef |
|
23-Jun-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Stop updating the struct vdso_timehands from even handler executed in the scheduled task from tc_windup(). Do it directly from tc_windup in interrupt context [1]. Establish the permanent mapping of the shared page into the kernel address space, avoiding the potential need to sleep waiting for allocation of sf buffer during vdso_timehands update. As a consequence, shared_page_write_start() and shared_page_write_end() functions are not needed anymore. Guess and memorize the pointers to native host and compat32 sysentvec during initialization, to avoid the need to get shared_page_alloc_sx lock during the update. In tc_fill_vdso_timehands(), do not loop waiting for timehands generation to stabilize, since vdso_timehands is written in the same interrupt context which wrote timehands. Requested by: mav [1] MFC after: 29 days
|
#
aea81038 |
|
22-Jun-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
|
#
a9d8437c |
|
22-Jun-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Enchance the shared page chunk allocator. Do not rely on the busy state of the page from which we allocate the chunk, to protect allocator state. Use statically allocated sx lock instead. Provide more flexible KPI. In particular, allow to allocate chunk without providing initial data, and allow writes into existing allocation. Allow to get an sf buf which temporary maps the chunk, to allow sequential updates to shared page content without unmapping in between. Reviewed by: jhb Tested by: flo MFC after: 1 month
|
#
44ad5475 |
|
08-Mar-2012 |
John Baldwin <jhb@FreeBSD.org> |
Add a new sched_clear_name() method to the scheduler interface to clear the cached name used for KTR_SCHED traces when a thread's name changes. This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's name changes, most commonly via execve(). MFC after: 2 weeks
|
#
2974cc36 |
|
19-Jan-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Use shared lock for the executable vnode in the exec path after the VV_TEXT changes are handled. Assert that vnode is exclusively locked at the places that modify VV_TEXT. Discussed with: alc MFC after: 3 weeks
|
#
ce8bd78b |
|
27-Sep-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not deliver SIGTRAP on exec as the normal signal, use ptracestop() on syscall exit path. Otherwise, if SIGTRAP is ignored, that tdsendsignal() do not want to deliver the signal, and debugger never get a notification of exec. Found and tested by: Anton Yuzhaninov <citrin citrin ru> Discussed with: jhb MFC after: 2 weeks
|
#
8451d0dd |
|
16-Sep-2011 |
Kip Macy <kmacy@FreeBSD.org> |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
|
#
a9d2f8d8 |
|
10-Aug-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
ff66f6a4 |
|
17-Jul-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these flags are available in Capsicum's capability mode; other sysctl nodes are not. Flag several useful sysctls as available in capability mode, such as memory layout sysctls required by the run-time linker and malloc(3). Also expose access to randomness and available kernel features. A few sysctls are enabled to support name->MIB conversion; these may leak information to capability mode by virtue of providing resolution on names not flagged for access in capability mode. This is, generally, not a huge problem, but might be something to resolve in the future. Flag these cases with XXX comments. Submitted by: jonathan Sponsored by: Google, Inc.
|
#
12bc222e |
|
30-Jun-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)
|
#
7705d4b2 |
|
25-Feb-2011 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Introduce preliminary support of the show description of the ABI of traced process by adding two new events which records value of process sv_flags to the trace file at process creation/execing/exiting time. MFC after: 1 Month.
|
#
6297a3d8 |
|
08-Jan-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Create shared (readonly) page. Each ABI may specify the use of page by setting SV_SHP flag and providing pointer to the vm object and mapping address. Provide simple allocator to carve space in the page, tailored to put the code with alignment restrictions. Enable shared page use for amd64, both native and 32bit FreeBSD binaries. Page is private mapped at the top of the user address space, moving a start of the stack one page down. Move signal trampoline code from the top of the stack to the shared page. Reviewed by: alc
|
#
d680caab |
|
21-Oct-2010 |
John Baldwin <jhb@FreeBSD.org> |
- When disabling ktracing on a process, free any pending requests that may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec|exit|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result. MFC after: 1 month
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
de478dd4 |
|
30-Aug-2010 |
Jaakko Heinonen <jh@FreeBSD.org> |
execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)
|
#
79856499 |
|
22-Aug-2010 |
Rui Paulo <rpaulo@FreeBSD.org> |
Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]
|
#
ee235bef |
|
17-Aug-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month
|
#
a14a9498 |
|
27-Jul-2010 |
Alan Cox <alc@FreeBSD.org> |
The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.) Submitted by: kib
|
#
2af6e14d |
|
27-Jul-2010 |
Alan Cox <alc@FreeBSD.org> |
Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib
|
#
9e4e5114 |
|
25-Jul-2010 |
Alan Cox <alc@FreeBSD.org> |
Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib
|
#
69a8f9e3 |
|
23-Jul-2010 |
Alan Cox <alc@FreeBSD.org> |
Eliminate a little bit of duplicated code.
|
#
e113db82 |
|
09-Jul-2010 |
John Baldwin <jhb@FreeBSD.org> |
Accidentally committed an older version of this comment rather than the final one.
|
#
07b18338 |
|
09-Jul-2010 |
John Baldwin <jhb@FreeBSD.org> |
Refine a comment. Reviewed by: bde
|
#
41890423 |
|
02-Jul-2010 |
Alan Cox <alc@FreeBSD.org> |
Use vm_page_next() instead of vm_page_lookup() in exec_map_first_page() because vm_page_next() is faster.
|
#
ad6eec7b |
|
29-Jun-2010 |
John Baldwin <jhb@FreeBSD.org> |
Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib
|
#
afe1a688 |
|
23-May-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
|
#
eb00b276 |
|
06-May-2010 |
Alan Cox <alc@FreeBSD.org> |
Eliminate page queues locking around most calls to vm_page_free().
|
#
5ac59343 |
|
05-May-2010 |
Alan Cox <alc@FreeBSD.org> |
Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib
|
#
2965a453 |
|
29-Apr-2010 |
Kip Macy <kmacy@FreeBSD.org> |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
|
#
a0ea661f |
|
25-Mar-2010 |
Nathan Whitehorn <nwhitehorn@FreeBSD.org> |
Add the ELF relocation base to struct image_params. This will be required to correctly relocate the executable entry point's function descriptor on powerpc64.
|
#
a107d8aa |
|
25-Mar-2010 |
Nathan Whitehorn <nwhitehorn@FreeBSD.org> |
Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb
|
#
f4e26ade |
|
23-Mar-2010 |
Nathan Whitehorn <nwhitehorn@FreeBSD.org> |
The nargvstr and nenvstr properties of arginfo are ints, not longs, so should be copied to userspace with suword32() instead of suword(). This alleviates problems on 64-bit big-endian architectures, and is a no-op on all 32-bit architectures. Tested on: amd64, sparc64, powerpc64
|
#
49cc1344 |
|
21-Jan-2010 |
John Baldwin <jhb@FreeBSD.org> |
MFC 198411: - Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[].
|
#
5ca4819d |
|
23-Oct-2009 |
John Baldwin <jhb@FreeBSD.org> |
- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[]. Reviewed by: bde
|
#
bc7f0010 |
|
02-Oct-2009 |
Simon L. B. Nielsen <simon@FreeBSD.org> |
MFC r197711: Add no zero mapping feature. NOTE: Unlike in the other branches where this change will be "merged" to, the 'no zero mapping' is enabled by default in stable/8. Errata: FreeBSD-EN-09:05.null Approved by: re (kib)
|
#
878adb85 |
|
02-Oct-2009 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Add a mitigation feature that will prevent user mappings at virtual address 0, limiting the ability to convert a kernel NULL pointer dereference into a privilege escalation attack. If the sysctl is set to 0 a newly started process will not be able to map anything in the address range of the first page (0 to PAGE_SIZE). This is the default. Already running processes are not affected by this. You can either change the sysctl or the tunable from loader in case you need to map at a virtual address of 0, for example when running any of the extinct species of a set of a.out binaries, vm86 emulation, .. In that case set security.bsd.map_at_zero="1". Superseeds: r197537 In collaboration with: jhb, kib, alc
|
#
d4c8e5ac |
|
12-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r197031: Unlock the image vnode around the call of pmc PMC_FN_PROCESS_EXEC hook. The hook calls vn_fullpath(9), that should not be executed with a vnode lock held. Approved by: re (kensmith)
|
#
1ef6ea9b |
|
09-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Unlock the image vnode around the call of pmc PMC_FN_PROCESS_EXEC hook. The hook calls vn_fullpath(9), that should not be executed with a vnode lock held. Reported by: Bruce Cran <bruce cran org uk> Tested by: pho MFC after: 3 days
|
#
87eca70e |
|
31-Jul-2009 |
John Baldwin <jhb@FreeBSD.org> |
Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
|
#
b146fc1b |
|
28-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Rework vnode argument auditing to follow the same structure, in order to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month
|
#
14961ba7 |
|
27-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
|
#
838d9858 |
|
19-Jun-2009 |
Brooks Davis <brooks@FreeBSD.org> |
Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
|
#
0a2e596a |
|
07-Jun-2009 |
Alan Cox <alc@FreeBSD.org> |
Eliminate unnecessary obfuscation when testing a page's valid bits.
|
#
d1a6e42d |
|
06-Jun-2009 |
Alan Cox <alc@FreeBSD.org> |
If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)
|
#
bcf11e8d |
|
05-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
|
#
3ff06357 |
|
16-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and compat32 binaries. Tested by: pho Reviewed by: kan
|
#
69c9eff8 |
|
26-Feb-2009 |
Ed Schouten <ed@FreeBSD.org> |
Remove unneeded pointer `ndp'. Inside do_execve(), we have a pointer `ndp', which always points to `&nd'. I can imagine a primitive (non-optimizing) compiler to really reserve space for such a pointer, so just remove the variable and use `&nd' directly.
|
#
c90c9021 |
|
26-Feb-2009 |
Ed Schouten <ed@FreeBSD.org> |
Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
|
#
aeb32571 |
|
05-Dec-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent' struct proc until corresponding child releases the vmspace. Each sleep is interlocked with proc mutex of the child, that triggers assertion in the sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock. Silent the assertion by using conditional variable allocated in the child. Broadcast the variable event on exec() and exit(). Since struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec(). Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 week
|
#
e506f34b |
|
05-Nov-2008 |
Craig Rodrigues <rodrigc@FreeBSD.org> |
Merge latest DTrace changes from Perforce. Approved by: jb
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
dfa7fd1d |
|
10-Sep-2008 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID. Approved by: rwatson (mentor)
|
#
0359a12e |
|
28-Aug-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
3f397884 |
|
25-Aug-2008 |
Robert Watson <rwatson@FreeBSD.org> |
More fully audit fexecve(2) and its arguments. Obtained from: TrustedBSD Project Sponsored by: Google, Inc.
|
#
6356dba0 |
|
23-Aug-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Introduce two related changes to the TrustedBSD MAC Framework: (1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2) so that the general exec code isn't aware of the details of allocating, copying, and freeing labels, rather, simply passes in a void pointer to start and stop functions that will be used by the framework. This change will be MFC'd. (2) Introduce a new flags field to the MAC_POLICY_SET(9) interface allowing policies to declare which types of objects require label allocation, initialization, and destruction, and define a set of flags covering various supported object types (MPC_OBJECT_PROC, MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...). This change reduces the overhead of compiling the MAC Framework into the kernel if policies aren't loaded, or if policies require labels on only a small number or even no object types. Each time a policy is loaded or unloaded, we recalculate a mask of labeled object types across all policies present in the system. Eliminate MAC_ALWAYS_LABEL_MBUF option as it is no longer required. MFC after: 1 week ((1) only) Reviewed by: csjp Obtained from: TrustedBSD Project Sponsored by: Apple, Inc.
|
#
ded7d39c |
|
12-Aug-2008 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Reduce the scope of the vnode lock such that it does not cover the various copyouts associated with initializing the process's argv/env data in userspace. It is possible that these copyout operations can fault under memory pressure, possibly resulting in dead locks. This is believed to be safe since none of the copyout_strings() operations need to interact with the vnode here. Submitted by: Zhouyi Zhou PR: kern/111260 Discussed with: kib MFC after: 3 weeks
|
#
58e8af1b |
|
25-Jul-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Call pargs_drop() unconditionally in do_execve(), the function correctly handles the NULL argument. Make pargs_free() static. MFC after: 1 week
|
#
9a75ea23 |
|
17-Jul-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Pair the VOP_OPEN call from do_execve() with the reciprocal VOP_CLOSE. This was unnoticed because local filesystems usually do nothing non-trivial in the close vop. Reported and tested by: Rick Macklem MFC after: 2 weeks
|
#
5d217f17 |
|
24-May-2008 |
John Birrell <jb@FreeBSD.org> |
Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
|
#
632dbc19 |
|
30-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement the fexecve(2) syscall. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
6617724c |
|
12-Mar-2008 |
Jeff Roberson <jeff@FreeBSD.org> |
Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
22db15c0 |
|
13-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
cb05b60a |
|
09-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
f8a47341 |
|
29-Dec-2007 |
Alan Cox <alc@FreeBSD.org> |
Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)
|
#
f231de47 |
|
03-Dec-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note. MFC after: 3 days
|
#
ca081fdb |
|
13-Nov-2007 |
Julian Elischer <julian@FreeBSD.org> |
Make sure there is a good default thread name for all threads.
|
#
89b57fcf |
|
05-Nov-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
|
#
30d239bc |
|
24-Oct-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
7bfda801 |
|
25-Sep-2007 |
Alan Cox <alc@FreeBSD.org> |
Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)
|
#
59d8f3ff |
|
12-Jul-2007 |
John Baldwin <jhb@FreeBSD.org> |
Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)
|
#
ce0be646 |
|
13-Jun-2007 |
John Baldwin <jhb@FreeBSD.org> |
Conditionally acquire Giant when dropping a reference on the ktrace vnode during execve() when turning off tracing due to executing a setuid binary as non-root. Previously this could fail to acquire Giant and fail an assertion if the ktrace file was on a non-MPSAFE filesystem and the executable was on an MPSAFE filesystem. MFC after: 3 days Reported by: kris
|
#
32f9753c |
|
11-Jun-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
|
#
9e223287 |
|
31-May-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
#
19059a13 |
|
14-May-2007 |
John Baldwin <jhb@FreeBSD.org> |
Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
|
#
bd37fd72 |
|
25-Mar-2007 |
Kris Kennaway <kris@FreeBSD.org> |
Update a comment: we usually call exec_vmspace_new with Giant not held, but sometimes it is.
|
#
873fbcd7 |
|
05-Mar-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
|
#
0c14ff0e |
|
04-Mar-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
#
acd3428b |
|
06-Nov-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
2a53696f |
|
22-Oct-2006 |
Alan Cox <alc@FreeBSD.org> |
The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.
|
#
aed55708 |
|
22-Oct-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
9af80719 |
|
21-Oct-2006 |
Alan Cox <alc@FreeBSD.org> |
Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.
|
#
ae1078d6 |
|
01-Sep-2006 |
Wayne Salamon <wsalamon@FreeBSD.org> |
Audit the argv and env vectors passed in on exec: Add the argument auditing functions for argv and env. Add kernel-specific versions of the tokenizer functions for the arg and env represented as a char array. Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to enable/disable argv/env auditing. Call the argument auditing from the exec system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
|
#
993182e5 |
|
14-Aug-2006 |
Alexander Leidinger <netchild@FreeBSD.org> |
- Change process_exec function handlers prototype to include struct image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret. Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)
|
#
4bb260ad |
|
28-May-2006 |
Robert Watson <rwatson@FreeBSD.org> |
In execve(), audit the path name being executed. In the future, it would also be good to audit the interpreter pathname, if any. Obtained from: TrustedBSD Project
|
#
d302786c |
|
05-May-2006 |
Tor Egge <tegge@FreeBSD.org> |
Temporarily unlock vnode for new image being executed to avoid lock order reversals that can lead to deadlocks. Normally vn_close(), namei() or vrele() should not be called while holding vnode locks.
|
#
b9eee07e |
|
03-Apr-2006 |
Peter Wemm <peter@FreeBSD.org> |
Remove the unused sva and eva arguments from pmap_remove_pages().
|
#
68ff3c24 |
|
08-Mar-2006 |
Stephan Uphoff <ups@FreeBSD.org> |
Fix exec_map resource leaks. Tested by: kris@
|
#
8917b8d2 |
|
06-Feb-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Always call exec_free_args() in kern_execve() instead of doing it in all the callers if the exec either succeeds or fails early. - Move the code to call exit1() if the exec fails after the vmspace is gone to the bottom of kern_execve() to cut down on some code duplication.
|
#
68ce4375 |
|
02-Feb-2006 |
Jeff Roberson <jeff@FreeBSD.org> |
- textvp may have been from a different mountpoint than ndp->ni_vp and we may need to acquire giant to vrele it. Found by: mjacob MFC After: 3 days
|
#
05406e6f |
|
11-Dec-2005 |
Alan Cox <alc@FreeBSD.org> |
Remove unneeded calls to pmap_remove_all(). The given page is not mapped. Reviewed by: tegge
|
#
d26b1a1f |
|
08-Dec-2005 |
David Xu <davidxu@FreeBSD.org> |
Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().
|
#
8ad398d0 |
|
06-Dec-2005 |
Alan Cox <alc@FreeBSD.org> |
Reduce the scope of the page queues lock in exec_map_first_page(). The vm object lock is sufficient for reading a page's PG_BUSY and busy flags. MFC after: 1 week
|
#
6d7b314b |
|
02-Nov-2005 |
David Xu <davidxu@FreeBSD.org> |
Cleanup some signal interfaces. Now the tdsignal function accepts both proc pointer and thread pointer, if thread pointer is NULL, tdsignal automatically finds a thread, otherwise it sends signal to given thread. Add utility function psignal_event to send a realtime sigevent to a process according to the delivery requirement specified in struct sigevent.
|
#
1471f287 |
|
02-Nov-2005 |
Paul Saab <ps@FreeBSD.org> |
Calling setrlimit from 32bit apps could potentially increase certain limits beyond what should be capiable in a 32bit process, so we must fixup the limits. Reviewed by: jhb
|
#
60354683 |
|
22-Oct-2005 |
David Xu <davidxu@FreeBSD.org> |
Make p_itimers as a pointer, so file sys/proc.h does not need to include sys/timers.h.
|
#
86857b36 |
|
22-Oct-2005 |
David Xu <davidxu@FreeBSD.org> |
Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.
|
#
9104847f |
|
13-Oct-2005 |
David Xu <davidxu@FreeBSD.org> |
1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
|
#
9f5c1d19 |
|
12-Oct-2005 |
Diomidis Spinellis <dds@FreeBSD.org> |
Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks
|
#
34ea500b |
|
03-Oct-2005 |
Don Lewis <truckman@FreeBSD.org> |
Add missing word to comment.
|
#
33812c06 |
|
02-Oct-2005 |
Colin Percival <cperciva@FreeBSD.org> |
If sufficiently bad things happen during a call to kern_execve(), it is possible for do_execve() to call exit1() rather than returning. As a result, the sequence "allocate memory; call kern_execve; free memory" can end up leaking memory. This commit documents this astonishing behaviour and adds a call to exec_free_args() before the exit1() call in do_execve(). Since all the users of kern_execve() in the tree use exec_free_args() to free the command-line arguments after kern_execve() returns, this should be safe, and it fixes the memory leak which can otherwise occur. Submitted by: Peter Holm MFC after: 3 days Security: Local denial of service
|
#
5997cae9 |
|
01-Oct-2005 |
Don Lewis <truckman@FreeBSD.org> |
Copy new process argument list in do_execve() before grabbing PROC_LOCK to avoid touching pageable memory while holding a mutex. Simplify argument list replacement logic. PR: kern/84935 Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form) MFC after: 3 days
|
#
15139246 |
|
30-Jun-2005 |
Joseph Koshy <jkoshy@FreeBSD.org> |
MFP4: - pmcstat(8) gprof output mode fixes: lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h: + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event, so that pmcstat(8) can determine where the runtime loader /libexec/ld-elf.so.1 is getting loaded. sys/kern/kern_exec.c: + Use a local struct to group the entry address of the image being exec()'ed and the process credential changed flag to the exec handling hook inside hwpmc(4). usr.sbin/pmcstat/*: + Support "-k kernelpath", "-D sampledir". + Implement the ELF bits of 'gmon.out' profile generation in a new file "pmcstat_log.c". Move all log related functions to this file. + Move local definitions and prototypes to "pmcstat.h" - Other bug fixes: + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read(). + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all attached PMCs when a process exits. + sys/sys/pmc.h: correct a function prototype. + Improve usage checks in pmcstat(8). Approved by: re (blanket hwpmc)
|
#
4da0d332 |
|
23-Jun-2005 |
Peter Wemm <peter@FreeBSD.org> |
Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re
|
#
f263522a |
|
09-Jun-2005 |
Joseph Koshy <jkoshy@FreeBSD.org> |
MFP4: - Implement sampling modes and logging support in hwpmc(4). - Separate MI and MD parts of hwpmc(4) and allow sharing of PMC implementations across different architectures. Add support for P4 (EMT64) style PMCs to the amd64 code. - New pmcstat(8) options: -E (exit time counts) -W (counts every context switch), -R (print log file). - pmc(3) API changes, improve our ability to keep ABI compatibility in the future. Add more 'alias' names for commonly used events. - bug fixes & documentation.
|
#
6341095e |
|
31-May-2005 |
Ken Smith <kensmith@FreeBSD.org> |
This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@
|
#
532c491b |
|
03-May-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Initialize vfslocked correctly early enough for MAC to compile. - Fix one place where we explicitly drop Giant! Pointy hat to: me Submitted by: Max Laier Warned by: Tinderbox
|
#
269576c8 |
|
03-May-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Use namei to acquire Giant for VFS if it is necessary. Drop the explicit Giant acquisition. - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have both been locked.
|
#
4d7d2993 |
|
30-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Return EACCES if we're trying to exec on a vp with no object. Errno supplied by: cperciva
|
#
7625cbf3 |
|
27-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.
|
#
ebccf1e3 |
|
18-Apr-2005 |
Joseph Koshy <jkoshy@FreeBSD.org> |
Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)
|
#
90dc539b |
|
25-Feb-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to PAGE_SIZE. Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition (he has been proposing to replace it with PAGE_SIZE everywhere), not only this reduced the diff significantly, but prevents code obfuscation and also allows to increase/decrease this parameter easily if needed. PR: kern/64196 Submitted by: Magnus Bäckström <b@etek.chalmers.se>
|
#
56c2262c |
|
29-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Grrr, this committer needs to have a sleep. Remove lines from the previous delta not intended for public consumption. MFC after: 2 weeks
|
#
c30af532 |
|
29-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Fix small non-conformance introduced in the previous commit: execve() is expected to return ENAMETOOLONG, not E2BIG if first argument doesn't fit into {PATH_MAX} bytes. MFC after: 2 weeks
|
#
610ecfe0 |
|
29-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
|
#
8516dd18 |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Don't use VOP_GETVOBJECT, use vp->v_object directly.
|
#
9454b2d8 |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
c113083c |
|
14-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add new function fdunshare() which encapsulates the necessary light magic for ensuring that a process' filedesc is not shared with anybody. Use it in the two places which previously had private implmentations. This collects all fd_refcnt handling in kern_descrip.c
|
#
6004362e |
|
26-Nov-2004 |
David Schultz <das@FreeBSD.org> |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
124e4c3b |
|
13-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
|
#
598b7ec8 |
|
07-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Use more intuitive pointer for fdinit() and fdcopy(). Change fdcopy() to take unlocked filedesc.
|
#
a7bc3102 |
|
11-Oct-2004 |
Peter Wemm <peter@FreeBSD.org> |
Put on my peril sensitive sunglasses and add a flags field to the internal sysctl routines and state. Add some code to use it for signalling the need to downconvert a data structure to 32 bits on a 64 bit OS when requested by a 32 bit app. I tried to do this in a generic abi wrapper that intercepted the sysctl oid's, or looked up the format string etc, but it was a real can of worms that turned into a fragile mess before I even got it partially working. With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have it not abort. Things like netstat, ps, etc have a long way to go. This also fixes a bug in the kern.ps_strings and kern.usrstack hacks. These do matter very much because they are used by libc_r and other things.
|
#
84e0b075 |
|
07-Oct-2004 |
David Xu <davidxu@FreeBSD.org> |
Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen
|
#
906ac69d |
|
05-Oct-2004 |
David Xu <davidxu@FreeBSD.org> |
In original kern_execve() code, at the start of the function, it forces all other threads to suicide, problem is execve() could be failed, and a failed execve() would change threaded process to unthreaded, this side effect is unexpected. The new code introduces a new single threading mode SINGLE_BOUNDARY, in the mode, all threads should suspend themself at user boundary except the singler. we can not use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful, suspending other threads at unknown point and later resuming them from there and forcing them to exit at user boundary may cause the process to start from a dirty state. If execve() is successful, current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at user boundary, otherwise, other threads will be resumed and their interrupted syscall will be restarted. Reviewed by: julian
|
#
63993cf0 |
|
23-Sep-2004 |
John Baldwin <jhb@FreeBSD.org> |
- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.
|
#
2e2e32b2 |
|
21-Sep-2004 |
Julian Elischer <julian@FreeBSD.org> |
Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later. Note that this only affects the case where the execve fails.
|
#
29780059 |
|
21-Sep-2004 |
Julian Elischer <julian@FreeBSD.org> |
In a threaded process, don't kill off all the other threads until we have a reasonable chance that the eceve() is going to succeeed. I.e. wait until we've done the permission checks etc. MFC after: 1 week
|
#
ed062c8d |
|
04-Sep-2004 |
Julian Elischer <julian@FreeBSD.org> |
Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
|
#
ad3b9257 |
|
15-Aug-2004 |
John-Mark Gurney <jmg@FreeBSD.org> |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
|
#
56f21b9d |
|
26-Jul-2004 |
Colin Percival <cperciva@FreeBSD.org> |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
|
#
aa3c8c02 |
|
23-Jul-2004 |
Julian Elischer <julian@FreeBSD.org> |
White space fix.. diff reduction for upcoming commit.
|
#
ce8da309 |
|
12-Jul-2004 |
Alan Cox <alc@FreeBSD.org> |
Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
|
#
aa0aa7a1 |
|
02-Jun-2004 |
Tim J. Robbins <tjr@FreeBSD.org> |
Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu
|
#
702ac0f1 |
|
21-May-2004 |
David Xu <davidxu@FreeBSD.org> |
Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it. Submitted by: tjr Modified by: davidxu
|
#
59c8bc40 |
|
22-Apr-2004 |
Alan Cox <alc@FreeBSD.org> |
Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
|
#
148b3f62 |
|
11-Apr-2004 |
Alan Cox <alc@FreeBSD.org> |
Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.
|
#
2fc0588d |
|
31-Mar-2004 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Remove sysctl kern.ps_argsopen, it is not very useful, one should use security.bsd.see_other_uids instead. Discussed with: phk, rwatson
|
#
a5bdcb2a |
|
13-Mar-2004 |
Peter Wemm <peter@FreeBSD.org> |
Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.
|
#
37814395 |
|
13-Mar-2004 |
Peter Wemm <peter@FreeBSD.org> |
Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
|
#
7700eb86 |
|
12-Mar-2004 |
Ruslan Ermilov <ru@FreeBSD.org> |
Do what the execve(2) manpage says and enforce what a Strictly Conforming POSIX application should do by disallowing the argv argument to be NULL. PR: kern/33738 Submitted by: Marc Olzheim, Serge van den Boom OK'ed by: nectar
|
#
8144e3b8 |
|
05-Mar-2004 |
John Baldwin <jhb@FreeBSD.org> |
Lock Giant around the single threading code in exec() to satisfy an assertion in the single threading code.
|
#
df7c361e |
|
17-Feb-2004 |
Peter Wemm <peter@FreeBSD.org> |
Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit kernel. I'm not happy with it yet - refinements are to come. This hack allows the kern.ps_strings and kern.usrstack sysctls to respond to a 32 bit request, such as those coming from emulated i386 binaries.
|
#
d6c847f3 |
|
27-Dec-2003 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).
|
#
ca46e90e |
|
27-Dec-2003 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall function back to near the beginning of the file. Rev.1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev1.1.196 and perfectly misplaced after execve().
|
#
34d26757 |
|
27-Dec-2003 |
Alan Cox <alc@FreeBSD.org> |
Remove GIANT_REQUIRED from exec_unmap_first_page().
|
#
eca8a663 |
|
11-Nov-2003 |
Robert Watson <rwatson@FreeBSD.org> |
Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
9ee99eb4 |
|
20-Oct-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.
|
#
bab1f052 |
|
19-Oct-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (depended of x11/gnome2)
|
#
6ec2fca5 |
|
04-Oct-2003 |
Alan Cox <alc@FreeBSD.org> |
Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.
|
#
c31f2280 |
|
27-Sep-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Remove the regstkpages sysctl variable. We have a growable register stack now.
|
#
fd75d710 |
|
27-Sep-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64
|
#
c460ac3a |
|
24-Sep-2003 |
Peter Wemm <peter@FreeBSD.org> |
Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
|
#
a8d43c90 |
|
26-Jul-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/* For now pass -1 all over the place.
|
#
0e2a4d3a |
|
14-Jun-2003 |
David Xu <davidxu@FreeBSD.org> |
Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
|
#
8630c117 |
|
12-Jun-2003 |
Alan Cox <alc@FreeBSD.org> |
Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.
|
#
677b542e |
|
10-Jun-2003 |
David E. O'Brien <obrien@FreeBSD.org> |
Use __FBSDID().
|
#
06fa71cd |
|
09-Jun-2003 |
Alan Cox <alc@FreeBSD.org> |
Update the vm object and page locking in exec_map_first_page(). Mark the one still anticipated change with XXX. Otherwise, this function is done.
|
#
fd0cc9a8 |
|
08-Jun-2003 |
Alan Cox <alc@FreeBSD.org> |
Lock the vm object when performing vm_page_grab().
|
#
90af4afa |
|
13-May-2003 |
John Baldwin <jhb@FreeBSD.org> |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
|
#
2c10d16a |
|
31-Mar-2003 |
Jeff Roberson <jeff@FreeBSD.org> |
- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.
|
#
75b8b3b2 |
|
24-Mar-2003 |
John Baldwin <jhb@FreeBSD.org> |
Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit. Reviewed by: phk, jake, almost complete silence on arch@
|
#
a5881ea5 |
|
13-Mar-2003 |
John Baldwin <jhb@FreeBSD.org> |
- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)
|
#
ac2e4153 |
|
26-Feb-2003 |
Julian Elischer <julian@FreeBSD.org> |
Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
|
#
a163d034 |
|
18-Feb-2003 |
Warner Losh <imp@FreeBSD.org> |
Back out M_* changes, per decision of the TRB. Approved by: trb
|
#
5215b187 |
|
16-Feb-2003 |
Jeff Roberson <jeff@FreeBSD.org> |
- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu
|
#
6f8132a8 |
|
31-Jan-2003 |
Julian Elischer <julian@FreeBSD.org> |
Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
|
#
0dbb100b |
|
26-Jan-2003 |
David Xu <davidxu@FreeBSD.org> |
Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
|
#
44956c98 |
|
21-Jan-2003 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
ec35c2af |
|
20-Jan-2003 |
Robert Watson <rwatson@FreeBSD.org> |
Perform VOP_GETATTR() before mac_check_vnode_exec() so that the cached attributes are available to MAC modules. Submitted by: mike halderman <mrh@nosc.mil> Obtained from: TrustedBSD Project
|
#
3db161e0 |
|
13-Jan-2003 |
Matthew Dillon <dillon@FreeBSD.org> |
It is possible for an active aio to prevent shared memory from being dereferenced when a process exits due to the vmspace ref-count being bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead of a process and call it in vmspace_dofree(). This way if it is missed in exit1()'s early-resource-free it will still be caught when the zombie is reaped. Also fix a potential race in shmexit_myhook() by NULLing out vmspace->vm_shm prior to calling shm_delete_mapping() and free(). MFC after: 7 days
|
#
45f603e2 |
|
06-Jan-2003 |
David Xu <davidxu@FreeBSD.org> |
Clear some KSE fields after kse mode was turned off.
|
#
5dadd17b |
|
04-Jan-2003 |
Jake Burkholder <jake@FreeBSD.org> |
Add a sysctl to get the vm protections for the stack of the current process. On architectures with a non-executable stack, eg sparc64, this is used by libgcc to determine at runtime if its necessary to enable execute permissions on a region of the stack which will be used to execute code, allowing the call to mprotect to be avoided if the kernel is configured to map the stack executable.
|
#
c522c1bf |
|
31-Dec-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
fdcopy() only needs a filedesc pointer.
|
#
ee113343 |
|
18-Dec-2002 |
Alan Cox <alc@FreeBSD.org> |
Hold the page queues lock when performing vm_page_busy().
|
#
b80521fe |
|
13-Dec-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
remove syscallarg(). Suggested by: peter
|
#
d1989db5 |
|
26-Nov-2002 |
Robert Drehmel <robert@FreeBSD.org> |
To avoid sleeping with all sorts of resources acquired (the reported problem was a locked directory vnode), do not give the process a chance to sleep in state "stopevent" (depends on the S_EXEC bit being set in p_stops) until most resources have been released again. Approved by: re
|
#
2d21129d |
|
24-Nov-2002 |
Alan Cox <alc@FreeBSD.org> |
Acquire and release the page queues lock around pmap_remove_pages() because it updates several of vm_page's fields.
|
#
a9a08882 |
|
17-Nov-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Release the imgp vnode prior to freeing exec_map resources to avoid deadlock.
|
#
4fec79be |
|
16-Nov-2002 |
Alan Cox <alc@FreeBSD.org> |
Now that pmap_remove_all() is exported by our pmap implementations use it directly.
|
#
d154fb4f |
|
10-Nov-2002 |
Alan Cox <alc@FreeBSD.org> |
When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations. Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().
|
#
0c93266b |
|
05-Nov-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Correct merge-o: disable the right execve() variation if !MAC
|
#
670cb89b |
|
05-Nov-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Bring in two sets of changes: (1) Permit userland applications to request a change of label atomic with an execve() via mac_execve(). This is required for the SEBSD port of SELinux/FLASK. Attempts to invoke this without MAC compiled in result in ENOSYS, as with all other MAC system calls. Complexity, if desired, is present in policy modules, rather than the framework. (2) Permit policies to have access to both the label of the vnode being executed as well as the interpreter if it's a shell script or related UNIX nonsense. Because we can't hold both vnode locks at the same time, cache the interpreter label. SEBSD relies on this because it supports secure transitioning via shell script executables. Other policies might want to take both labels into account during an integrity or confidentiality decision at execve()-time. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
ccafe7eb |
|
05-Nov-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Hook up the mac_will_execve_transition() and mac_execve_transition() entrypoints, #ifdef MAC. The supporting logic already existed in kern_mac.c, so no change there. This permits MAC policies to cause a process label change as the result of executing a binary -- typically, as a result of executing a specially labeled binary. For example, the SEBSD port of SELinux/FLASK uses this functionality to implement TE type transitions on processes using transitioning binaries, in a manner similar to setuid. Policies not implementing a notion of transition (all the ones in the tree right now) require no changes, since the old label data is copied to the new label via mac_create_cred() even if a transition does occur. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
450ffb44 |
|
04-Nov-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Remove reference to struct execve_args from struct imgact, which describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes. There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
e1b1aa3b |
|
11-Oct-2002 |
John Baldwin <jhb@FreeBSD.org> |
- Move the 'done1' label down below the unlock of the proc lock and move the locking of the proc lock after the goto to done1 to avoid locking the lock in an error case just so we can turn around and unlock it. - Move the exec_setregs() stuff out from under the proc lock and after the p_args stuff. This allows exec_setregs() to be able to sleep or write things out to userland, etc. which ia64 does. Tested by: peter
|
#
05ba50f5 |
|
21-Sep-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
#
c1e2d386 |
|
14-Sep-2002 |
Nate Lawson <njl@FreeBSD.org> |
Move setugidsafety() call outside of process lock. This prevents a lock recursion when closef() calls pfind() which also wants the proc lock. This case only occurred when setugidsafety() needed to close unsafe files. Reviewed by: truckman
|
#
28b325aa |
|
13-Sep-2002 |
Don Lewis <truckman@FreeBSD.org> |
Drop the proc lock while calling fdcheckstd() which may block to allocate memory. Reviewed by: jhb
|
#
1279572a |
|
05-Sep-2002 |
David Xu <davidxu@FreeBSD.org> |
s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/ Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them. Approved by: julian (mentor)
|
#
f36ba452 |
|
01-Sep-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to sysentvec. Initialized all fields of all sysentvecs, which will allow them to be used instead of constants in more places. Provided stack fixup routines for emulations that previously used the default.
|
#
bafbd492 |
|
29-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Renamed poorly named setregs to exec_setregs. Moved its prototype to imgact.h with the other exec support functions.
|
#
f3bec5d7 |
|
28-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Don't require that sysentvec.sv_szsigcode be non-NULL.
|
#
81f223ca |
|
25-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Fixed most indentation bugs.
|
#
ca0387ef |
|
25-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Fixed placement of operators. Wrapped long lines.
|
#
fd559a8a |
|
24-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Fixed white space around operators, casts and reserved words. Reviewed by: md5
|
#
a7cddfed |
|
24-Aug-2002 |
Jake Burkholder <jake@FreeBSD.org> |
return x; -> return (x); return(x); -> return (x); Reviewed by: md5
|
#
49539972 |
|
22-Aug-2002 |
Julian Elischer <julian@FreeBSD.org> |
slight cleanup of single-threading code for KSE processes
|
#
619eb6e5 |
|
13-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Hold the vnode lock throughout execve. - Set VV_TEXT in the top level execve code. - Fixup the image activators to deal with the newly locked vnode.
|
#
339b79b9 |
|
01-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Introduce support for Mandatory Access Control and extensible kernel access control. Invoke an appropriate MAC entry point to authorize execution of a file by a process. The check is placed slightly differently than it appears in the trustedbsd_mac tree so that it prevents a little more information leakage about the target of the execve() operation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
89ab9307 |
|
30-Jul-2002 |
Jacques Vidrine <nectar@FreeBSD.org> |
For processes which are set-user-ID or set-group-ID, the kernel performs a few special actions for safety. One of these is to make sure that file descriptors 0..2 are in use, by opening /dev/null for those that are not already open. Another is to close any file descriptors 0..2 that reference procfs. However, these checks were made out of order, so that it was still possible for a set-user-ID or set-group-ID process to be started with some of the file descriptors 0..2 unused. Submitted by: Georgi Guninski <guninski@guninski.com>
|
#
d06c0d4d |
|
27-Jul-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Slight restructuring of the logic for credential change case identification during execve() to use a 'credential_changing' variable. This makes it easier to have outstanding patchsets against this code, as well as to add conditionally defined clauses. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
3ebc1248 |
|
19-Jul-2002 |
Peter Wemm <peter@FreeBSD.org> |
Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages. Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable. Obtained from: dfr (mostly, many tweaks from me).
|
#
b3afd20d |
|
14-Jul-2002 |
Alan Cox <alc@FreeBSD.org> |
In execve(), delay the acquisition of Giant until after kmem_alloc_wait(). (Operations on the exec_map don't require Giant.)
|
#
63c9e754 |
|
12-Jul-2002 |
John Baldwin <jhb@FreeBSD.org> |
We don't need to clear oldcred here since newcred is not NULL yet.
|
#
97646f56 |
|
11-Jul-2002 |
Alan Cox <alc@FreeBSD.org> |
o Lock accesses to the page queues.
|
#
0b2ed1ae |
|
06-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Clean up execve locking: - Grab the vnode object early in exec when we still have the vnode lock. - Cache the object in the image_params. - Make use of the cached object in imgact_*.c
|
#
c781aea8 |
|
01-Jul-2002 |
Peter Wemm <peter@FreeBSD.org> |
#include <sys/ktrace.h> would be useful too. (for ktrace_mtx)
|
#
1e9b3d91 |
|
01-Jul-2002 |
Peter Wemm <peter@FreeBSD.org> |
Add #include "opt_ktrace.h"
|
#
e602ba25 |
|
29-Jun-2002 |
Julian Elischer <julian@FreeBSD.org> |
Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
7f05b035 |
|
28-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
|
#
366838dd |
|
25-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Eliminate vmspace::vm_minsaddr. It's initialized but never used. o Replace stale comments in vmspace by "const until freed" annotations on some fields.
|
#
69be5db9 |
|
20-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Don't leak resources if fdcheckstd() fails during exec. Submitted by: Mike Makonnen <makonnen@pacbell.net>
|
#
1419eacb |
|
19-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Squish the "could sleep with process lock" messages caused by calling uifind() with a proc lock held. change_ruid() and change_euid() have been modified to take a uidinfo structure which will be pre-allocated by callers, they will then call uihold() on the uidinfo structure so that the caller's logic is simplified. This allows one to call uifind() before locking the proc struct and thereby avoid a potential blocking allocation with the proc lock held. This may need revisiting, perhaps keeping a spare uidinfo allocated per process to handle this situation or re-examining if the proc lock needs to be held over the entire operation of changing real or effective user id. Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
|
#
6c84de02 |
|
06-Jun-2002 |
John Baldwin <jhb@FreeBSD.org> |
Properly lock accesses to p_tracep and p_traceflag. Also make a few ktrace-only things #ifdef KTRACE that were not before.
|
#
9b3b1c5f |
|
02-May-2002 |
John Baldwin <jhb@FreeBSD.org> |
- Reorder execve() so that it performs blocking operations before it locks the process. - Defer other blocking operations such as vrele()'s until after we release locks. - execsigs() now requires the proc lock to be held when it is called rather than locking the process internally.
|
#
e983a376 |
|
18-Apr-2002 |
Jacques Vidrine <nectar@FreeBSD.org> |
When exec'ing a set[ug]id program, make sure that the stdio file descriptors (0, 1, 2) are allocated by opening /dev/null for any which are not already open. Reviewed by: alfred, phk MFC after: 2 days
|
#
911fc923 |
|
04-Apr-2002 |
Peter Wemm <peter@FreeBSD.org> |
Increase the size of the register stack storage on ia64 from 32K to 2MB so that we can compile gcc. This is a hack because it adds a fixed 2MB to each process's VSIZE regardless of how much is really being used since there is no grow-up stack support. At least it isn't physical memory. Sigh. Add a sysctl to enable tweaking it for new processes.
|
#
44731cab |
|
01-Apr-2002 |
John Baldwin <jhb@FreeBSD.org> |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
|
#
5e20c11f |
|
30-Mar-2002 |
Alan Cox <alc@FreeBSD.org> |
Add a local proc *p in exec_new_vmspace() to avoid repeated dereferencing to obtain it.
|
#
8899023f |
|
27-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Make the reference counting of 'struct pargs' SMP safe. There is still some locations where the PROC lock should be held in order to prevent inconsistent views from outside (like the proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be fixed later. Submitted by: Jonathan Mini <mini@haikugeek.com>
|
#
cb100b25 |
|
26-Mar-2002 |
Alan Cox <alc@FreeBSD.org> |
Remove an unnecessary and inconsistently used variable from exec_new_vmspace().
|
#
4d77a549 |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P.
|
#
ac59490b |
|
16-Mar-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/ pmap_qremove. pmap_kenter is not safe to use in MI code because it is not guaranteed to flush the mapping from the tlb on all cpus. If the process in question is preempted and migrates cpus between the call to pmap_kenter and pmap_kremove, the original cpu will be left with stale mappings in its tlb. This is currently not a problem for i386 because we do not use PG_G on SMP, and thus all mappings are flushed from the tlb on context switches, not just user mappings. This is not the case on all architectures, and if PG_G is to be used with SMP on i386 it will be a problem. This was committed by peter earlier as part of his fine grained tlb shootdown work for i386, which was backed out for other reasons. Reviewed by: peter
|
#
0cf3c909 |
|
27-Feb-2002 |
Warner Losh <imp@FreeBSD.org> |
Remove now unused struct proc *p. Approved by: jhb
|
#
a854ed98 |
|
27-Feb-2002 |
John Baldwin <jhb@FreeBSD.org> |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
#
d1693e17 |
|
27-Feb-2002 |
Peter Wemm <peter@FreeBSD.org> |
Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.
|
#
bd1e3a0f |
|
26-Feb-2002 |
Peter Wemm <peter@FreeBSD.org> |
Jake further reduced IPI shootdowns on sparc64 in loops by using ranged shootdowns in a couple of key places. Do the same for i386. This also hides some physical addresses from higher levels and has it use the generic vm_page_t's instead. This will help for PAE down the road. Obtained from: jake (MI code, suggestions for MD part)
|
#
079b7bad |
|
07-Feb-2002 |
Julian Elischer <julian@FreeBSD.org> |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
6f5dafea |
|
13-Jan-2002 |
Alan Cox <alc@FreeBSD.org> |
o Call the functions registered with at_exec() from exec_new_vmspace() instead of execve(). Otherwise, the possibility still exists for a pending AIO to modify the new address space. Reviewed by: alfred
|
#
426da3bc |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
21d56e9c |
|
29-Dec-2001 |
Alfred Perlstein <alfred@FreeBSD.org> |
Make AIO a loadable module. Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9). Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run. Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO. Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.) Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
|
#
91f91617 |
|
09-Dec-2001 |
David E. O'Brien <obrien@FreeBSD.org> |
Repeat after me -- "Use of ANSI string concatenation can be bad." In this case, C99's __func__ is properly defined as: static const char __func__[] = "function-name"; and GCC 3.1 will not allow it to be used in bogus string concatenation.
|
#
c3699b5f |
|
07-Nov-2001 |
Peter Wemm <peter@FreeBSD.org> |
For what its worth, sync up the type of ps_arg_cache_max (unsigned long) with the sysctl type (signed long).
|
#
9ca45e81 |
|
27-Oct-2001 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Add a P_INEXEC flag that indicates that the process has called execve() and it has not yet returned. Use this flag to deny debugging requests while the process is execve()ing, and close once and for all any race conditions that might occur between execve() and various debugging interfaces. Reviewed by: jhb, rwatson
|
#
9a024fc5 |
|
24-Oct-2001 |
Robert Drehmel <robert@FreeBSD.org> |
Use vm_offset_t instead of caddr_t to fix a warning and remove two casts.
|
#
79deba82 |
|
23-Oct-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix ktrace enablement/disablement races that can result in a vnode ref count panic. Bug noticed by: ps Reviewed by: ps MFC after: 1 day
|
#
cbc89bfb |
|
10-Oct-2001 |
Paul Saab <ps@FreeBSD.org> |
Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
|
#
e913ca22 |
|
10-Oct-2001 |
Doug Rabson <dfr@FreeBSD.org> |
Move setregs() out from under the PROC_LOCK so that it can use functions list suword() which may trap.
|
#
8688bb93 |
|
09-Oct-2001 |
John Baldwin <jhb@FreeBSD.org> |
proces -> process in a comment.
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
116734c4 |
|
31-Aug-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().
|
#
b8c526df |
|
22-Aug-2001 |
Alexander Langer <alex@FreeBSD.org> |
Fix a simple typo I just happened to find.
|
#
b2c3fa70 |
|
10-Jul-2001 |
Dima Dorfman <dd@FreeBSD.org> |
Correct spelling in a comment and remove trailing newline from a panic() call (panic() adds it itself).
|
#
333ea485 |
|
09-Jul-2001 |
Guido van Rooij <guido@FreeBSD.org> |
Don't share sig handlers after an exec Reviewed by: Alfred Perlstein
|
#
0cddd8f0 |
|
04-Jul-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
fbd26f75 |
|
20-Jun-2001 |
John Baldwin <jhb@FreeBSD.org> |
Fix some lock order reversals where we called free() while holding a proc lock. We now use temporary variables to save the process argument pointer and just update the pointer while holding the lock. We then perform the free on the cached pointer after releasing the lock.
|
#
b85db196 |
|
16-Jun-2001 |
Peter Wemm <peter@FreeBSD.org> |
Move setugid() a little sooner to before we release tracing in case crdup() or change_e*id() block on malloc() or mutex.
|
#
7cb8e4d2 |
|
26-May-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o pcred-removal changes included modifications to optimize the setting of the saved uid and gid during execve(). Unfortunately, the optimizations were incorrect in the case where the credential was updated, skipping the setting of the saved uid and gid when new credentials were generated. This change corrects that problem by handling the newcred!=NULL case correctly. Reported/tested by: David Malone <dwmalone@maths.tcd.ie> Obtained from: TrustedBSD Project
|
#
b1fc0ec1 |
|
25-May-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
|
#
d8aad40c |
|
21-May-2001 |
John Baldwin <jhb@FreeBSD.org> |
Axe unneeded spl()'s.
|
#
23955314 |
|
18-May-2001 |
Alfred Perlstein <alfred@FreeBSD.org> |
Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
|
#
fb919e4d |
|
01-May-2001 |
Mark Murray <markm@FreeBSD.org> |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
|
#
60fb0ce3 |
|
28-Apr-2001 |
Greg Lehey <grog@FreeBSD.org> |
Revert consequences of changes to mount.h, part 2. Requested by: bde
|
#
d98dc34f |
|
23-Apr-2001 |
Greg Lehey <grog@FreeBSD.org> |
Correct #includes to work with fixed sys/mount.h.
|
#
e65897c3 |
|
06-Mar-2001 |
John Baldwin <jhb@FreeBSD.org> |
Proc locking.
|
#
1a6e52d0 |
|
06-Feb-2001 |
Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org> |
Fix typo: seperate -> separate. Seperate does not exist in the english language.
|
#
98f03f90 |
|
23-Dec-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@
|
#
cf64863a |
|
30-Nov-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Add a comment to exec_check_permissions() to indicate that the passed vnode must be locked; this is the case because of calls to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN(). This becomes more of an issue when VOP_ACCESS() gets a bit more complicated, which it does when you introduce ACL, Capability, and MAC support. Obtained from: TrustedBSD Project
|
#
35e0e5b3 |
|
20-Oct-2000 |
John Baldwin <jhb@FreeBSD.org> |
Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
|
#
63c47a5c |
|
12-Oct-2000 |
Doug Rabson <dfr@FreeBSD.org> |
Add a gross hack for ia64 to allocate the backing store for a new program.
|
#
b9a22da4 |
|
25-Sep-2000 |
Takanori Watanabe <takawata@FreeBSD.org> |
Make size of dynamic loader argument variable to support various executable file format. Reviewed by: peter
|
#
eabc23ef |
|
21-Sep-2000 |
Don Lewis <truckman@FreeBSD.org> |
Remove unneeded #include that was a remnant of an earlier version of my uidinfo patch. Found by: phk
|
#
621dbe43 |
|
16-Sep-2000 |
Bruce Evans <bde@FreeBSD.org> |
Added used include of <sys/mutex.h> (don't depend on pollution in <sys/signalvar.h>).
|
#
9ff5ce6b |
|
12-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
|
#
f535380c |
|
05-Sep-2000 |
Don Lewis <truckman@FreeBSD.org> |
Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
|
#
9701cd40 |
|
05-Jul-2000 |
John Baldwin <jhb@FreeBSD.org> |
Support for unsigned integer and long sysctl variables. Update the SYSCTL_LONG macro to be consistent with other integer sysctl variables and require an initial value instead of assuming 0. Update several sysctl variables to use the unsigned types. PR: 15251 Submitted by: Kelly Yancey <kbyanc@posi.net>
|
#
2c9b67a8 |
|
30-Apr-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
|
#
d323ddf3 |
|
26-Apr-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix #! script exec under linux emulation. If a script is exec'd from a program running under linux emulation, the script binary is checked for in /compat/linux first. Without this patch the wrong script binary (i.e. the FreeBSD binary) will be run instead of the linux binary. For example, #!/bin/sh, thus breaking out of linux compatibility mode. This solves a number of problems people have had installing linux software on FreeBSD boxes.
|
#
ed6aff73 |
|
18-Apr-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unneeded <sys/buf.h> includes. Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks by 924 bytes.
|
#
cb679c38 |
|
16-Apr-2000 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
5e266442 |
|
20-Jan-2000 |
Warner Losh <imp@FreeBSD.org> |
When we are execing a setugid program, and we have a procfs filesystem file open in one of the special file descriptors (0, 1, or 2), close it before completing the exec. Submitted by: nergal@idea.avet.com.pl Constructive comments: deraadt@openbsd.org, sef, peter, jkh
|
#
654f6be1 |
|
27-Dec-1999 |
Bruce Evans <bde@FreeBSD.org> |
Changed the type used to represent the user stack pointer from `long *' to `register_t *'. This fixes bugs like misplacement of argc and argv on the user stack on i386's with 64-bit longs. We still use longs to represent "words" like argc and argv, and assume that they are on the stack (and that there is stack). The suword() and fuword() families should also use register_t.
|
#
762e6b85 |
|
15-Dec-1999 |
Eivind Eklund <eivind@FreeBSD.org> |
Introduce NDFREE (and remove VOP_ABORTOP)
|
#
a8704f89 |
|
26-Nov-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add a sysctl to control if argv is disclosed to the world: kern.ps_argsopen It defaults to 1 which means that all users can see all argvs in ps(1). Reviewed by: Warner
|
#
b9df5231 |
|
16-Nov-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce commandline caching in the kernel. This fixes some nasty procfs problems for SMP, makes ps(1) run much faster, and makes ps(1) even less dependent on /proc which will aid chroot and jails alike. To disable this facility and revert to previous behaviour: sysctl -w kern.ps_arg_cache_limit=0 For full details see the current@FreeBSD.org mail-archives.
|
#
923502ff |
|
29-Oct-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
fdf4e8b3 |
|
11-Aug-1999 |
Warner Losh <imp@FreeBSD.org> |
Stop profiling on exec. Obtained from: NetBSD
|
#
f711d546 |
|
27-Apr-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
|
#
db42d908 |
|
19-Apr-1999 |
Peter Wemm <peter@FreeBSD.org> |
unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
|
#
4fe88fe6 |
|
03-Apr-1999 |
John Polstra <jdp@FreeBSD.org> |
Restore support for executing BSD/OS binaries on the i386 by passing the address of the ps_strings structure to the process via %ebx. For other kinds of binaries, %ebx is still zeroed as before. Submitted by: Thomas Stephens <tas@stephens.org> Reviewed by: jdp
|
#
b1028ad1 |
|
19-Feb-1999 |
Luoqi Chen <luoqi@FreeBSD.org> |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
8aef1712 |
|
27-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
d254af07 |
|
27-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
2267af78 |
|
06-Jan-1999 |
Julian Elischer <julian@FreeBSD.org> |
Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port. The code is conditional on VM_STACK. Not using this will produce the old heavily tested system. Submitted by: Richard Seaman <dick@tar.com>
|
#
9c0fed3d |
|
30-Dec-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Various changes to support OSF1 emulation: * Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit boundary (needed to support 32bit OSF programs). This should also save one pagetable per process. * Add cvtqlsv to the set of instructions handled by the floating point software completion code. * Disable all floating point exceptions by default. * A minor change to execve to allow the OSF1 image activator to support dynamic loading.
|
#
486bddb0 |
|
27-Dec-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Fix some 64bit truncation problems which crept into SYSCTL_LONG() with the last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide enough to hold a long, the SYSCTL_LONG() macro has been modified to only support exporting long variables by pointer instead of by value. Reviewed by: bde
|
#
4c56fcde |
|
16-Dec-1998 |
Bruce Evans <bde@FreeBSD.org> |
Removed the cast to a pointer in the definition of PS_STRINGS and adjusted related casts to match (only in the kernel in this commit). The pointer was only wanted in one place in kern_exec.c. Applications should use the kern.ps_strings sysctl instead of PS_STRINGS, so they shouldn't notice this change.
|
#
2caeccee |
|
16-Dec-1998 |
Bruce Evans <bde@FreeBSD.org> |
Removed all traces of SYSCTL_INTPTR(). Pointers can't really be passed across the kernel -> application interface, and for the one sysctl where they were passed and actually used (kern.ps_strings), the applications want addresses represented as u_longs anyway (the other sysctl that passed them, kern.usrstack, has never been used). Agreed to by: dfr, phk
|
#
73007561 |
|
28-Oct-1998 |
David Greenman <dg@FreeBSD.org> |
Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.
|
#
aa855a59 |
|
15-Oct-1998 |
Peter Wemm <peter@FreeBSD.org> |
*gulp*. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a seperate commit as samples. This all still works as a static a.out kernel using LKM's.
|
#
e69763a3 |
|
04-Sep-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
#
069e9bc1 |
|
24-Aug-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
aae0aa45 |
|
15-Jul-1998 |
Bruce Evans <bde@FreeBSD.org> |
Cast between longs and pointers via intptr_t. The results of fuword() should be checked before casting. The results of suword() should be checked.
|
#
ecbb00a2 |
|
07-Jun-1998 |
Doug Rabson <dfr@FreeBSD.org> |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
dc733423 |
|
17-Apr-1998 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.
|
#
eed2412e |
|
07-Mar-1998 |
John Dyson <dyson@FreeBSD.org> |
Free the first page also if it is not valid.
|
#
8f9110f6 |
|
07-Mar-1998 |
John Dyson <dyson@FreeBSD.org> |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
c8a79999 |
|
01-Mar-1998 |
Peter Wemm <peter@FreeBSD.org> |
Update the ELF image activator to use some of the exec resources rather than rolling it's own. This means that it now uses the "safe" exec_map_first_page() to get the ld.so headers rather than risking a panic on a page fault failure (eg: NFS server goes down). Since all the ELF tools go to a lot of trouble to make sure everything lives in the first page for executables, this is a win. I have not seen any ELF executable on any system where all the headers didn't fit in the first page with lots of room to spare. I have been running variations of this code for some time on my pure ELF systems.
|
#
5132080e |
|
25-Feb-1998 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
0b08f5f7 |
|
05-Feb-1998 |
Eivind Eklund <eivind@FreeBSD.org> |
Back out DIAGNOSTIC changes.
|
#
95461b45 |
|
04-Feb-1998 |
John Dyson <dyson@FreeBSD.org> |
1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
|
#
47cfdb16 |
|
04-Feb-1998 |
Eivind Eklund <eivind@FreeBSD.org> |
Turn DIAGNOSTIC into a new-style option.
|
#
1616db3c |
|
11-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Implement the first page access for object type determination more VM clean. Also, use vm_map_insert instead of vm_mmap. Reviewed by: dg@freebsd.org
|
#
95e5e988 |
|
05-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
675ea6f0 |
|
26-Dec-1997 |
Bruce Evans <bde@FreeBSD.org> |
Unspammed nested include of <vm/vm_zone.h>.
|
#
d5f81602 |
|
19-Dec-1997 |
Sean Eric Fagan <sef@FreeBSD.org> |
Clear the p_stops field on change of user/group id, unless the correct flag is set in the p_pfsflags field. This, essentially, prevents an SUID proram from hanging after being traced. (E.g., "truss /usr/bin/rlogin" would fail, but leave rlogin in a stopevent state.) Yet another case where procctl is (hopefully ;)) no longer needed in the general case. Reviewed by: bde (thanks bruce :))
|
#
c7ce9e26 |
|
16-Dec-1997 |
David Greenman <dg@FreeBSD.org> |
Fix bug where a struct buf was free()'d back to the system malloc pool. Quite amazing that the system runs at all with this bug. Also present in 2.2.5. The bug appears to have come in with changes in rev 1.53. PR: might fix PR#5313 Submitted by: bde
|
#
2a024a2b |
|
05-Dec-1997 |
Sean Eric Fagan <sef@FreeBSD.org> |
Changes to allow event-based process monitoring and control.
|
#
cb226aaa |
|
06-Nov-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /*ARGSUSED*/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
|
#
d021ae3d |
|
15-Oct-1997 |
Guido van Rooij <guido@FreeBSD.org> |
On execing a sgid program, do not set P_SUGID when cr_gid and cr)_uid do not change. PR: 4755 Reviewed by: Bruce Evans
|
#
99448ed1 |
|
20-Sep-1997 |
John Dyson <dyson@FreeBSD.org> |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
#
e4ba6a82 |
|
02-Sep-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
a78e8d2a |
|
03-Aug-1997 |
David Greenman <dg@FreeBSD.org> |
Fixed security hole with sharing the file descriptor table (via rfork) when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan (sef@freebsd.org). Also consolidated the setuid/setgid checks into one place. Reviewed by: dyson,sef
|
#
5cf3d12c |
|
23-Apr-1997 |
Andrey A. Chernov <ache@FreeBSD.org> |
Don't clobber user space argv0 memory on shell exec, mainly for vfork() Fix another bug: if argv[0] is NULL, garbadge args might be added for shell script Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no> (with yet one fault detect from me)
|
#
1ebd0c59 |
|
17-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Brought fix from the 2.2 branch forward (see rev 1.47.2.7): serious bugs with reading the image header.
|
#
492da96c |
|
12-Apr-1997 |
John Dyson <dyson@FreeBSD.org> |
Correct the previous thread-fix commit. I made a clerical error.
|
#
5856e12e |
|
12-Apr-1997 |
John Dyson <dyson@FreeBSD.org> |
Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.) Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group. Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes. Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group. Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads address space by an exec, so we now automatically divorce the address space before modifying it.
|
#
c04b956c |
|
11-Apr-1997 |
John Dyson <dyson@FreeBSD.org> |
Effectively remove the previous commit to fix threads forking. The change was a false-start, and needs more work.
|
#
af9ec885 |
|
11-Apr-1997 |
John Dyson <dyson@FreeBSD.org> |
Allow a kernel-supported process thread to do an exec without blasting away the VM space of all of the other, associated threads.
|
#
66141753 |
|
04-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Killed unnecessary vp == NULL check after namei.
|
#
a3cf6eba |
|
04-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Oops, only free component name buffer if namei() didn't. This bug has been in here since I wrote the code 3 years ago! Thanks, Bruce! Submitted by: bde
|
#
6d5a0a8c |
|
03-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Various fixes: 1. imgp->image_header needs to be cleared for the bp == NULL && `goto interpret' case, else exec_fail_dealloc would free it twice after an error. 2. Moved the vp->v_writecount check in exec_check_permissions() to near the end. This fixes execve("/dev/null", ...) returning the bogus errno ETXTBSY. ETXTBSY is still returned for attempts to exec interpreted files that are open for writing. The man page is very old and wrong here. It says that ETXTBSY is for pure procedure (shared text) files that are open for writing or reading. 3. Moved the setuid disabling in exec_check_permissions() to the end. Cosmetic. It's more natural to dispose of all the error cases first. ...plus a couple of other cosmetic changes. Submitted by: bde
|
#
8677f509 |
|
03-Apr-1997 |
David Greenman <dg@FreeBSD.org> |
Lose the vnode lock on a permissions failure. Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
#
9caaadb6 |
|
31-Mar-1997 |
David Greenman <dg@FreeBSD.org> |
Changed the way that the exec image header is read to be filesystem- centric rather than VM-centric to fix a problem with errors not being detectable when the header is read. Killed exech_map as a result of these changes. There appears to be no performance difference with this change.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
e47bda07 |
|
18-Feb-1997 |
David Greenman <dg@FreeBSD.org> |
Fix from PR #2757: execve() clears the P_SUGID process flag in execve() if the binary executed does not have suid or sgid permission bits set. This also happens when the effective uid is different from the real uid or the effective gid is different from the real gid. Under these circumstances, the process still has set id privileges and the P_SUGID flag should not be cleared. Submitted by: Tor Egge <Tor.Egge@idt.ntnu.no>
|
#
996c772f |
|
09-Feb-1997 |
John Dyson <dyson@FreeBSD.org> |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
2cb544c3 |
|
08-Nov-1996 |
John Dyson <dyson@FreeBSD.org> |
Fix an ordering bug -- pmap_remove_pages should be called BEFORE vm_map_remove, not after... 2.2-RELEASE candidate.
|
#
9d3fbbb5 |
|
12-Oct-1996 |
John Dyson <dyson@FreeBSD.org> |
Performance optimizations. One of which was meant to go in before the previous snap. Specifically, kern_exit and kern_exec now makes a call into the pmap module to do a very fast removal of pages from the address space. Additionally, the pmap module now updates the PG_MAPPED and PG_WRITABLE flags. This is an optional optimization, but helpful on the X86.
|
#
67bf6868 |
|
29-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
|
#
4f4d35ed |
|
26-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
This commit is meant to solve a couple of VM system problems or performance issues. 1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
|
#
6ab46d52 |
|
11-Jul-1996 |
Bruce Evans <bde@FreeBSD.org> |
Don't use NULL in non-pointer contexts.
|
#
86064318 |
|
02-Jun-1996 |
David Greenman <dg@FreeBSD.org> |
Use kmem_alloc_wait/kmem_free_wakeup() to avoid allocation failures from running out of string space in the exec_map.
|
#
6120fef1 |
|
02-Jun-1996 |
David Greenman <dg@FreeBSD.org> |
Fix declaration of ps_strings.
|
#
b18bfc3d |
|
17-May-1996 |
John Dyson <dyson@FreeBSD.org> |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me: More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
a794e791 |
|
30-Apr-1996 |
Bruce Evans <bde@FreeBSD.org> |
Removed unnecessary #includes from <sys/imgact.h> so that it is self-sufficient and added explicit #includes where required.
|
#
24b34f09 |
|
29-Apr-1996 |
Sujal Patel <smpatel@FreeBSD.org> |
Fixed two typos in the comment. Pointed out by: davidg
|
#
39f70d45 |
|
07-Apr-1996 |
David Greenman <dg@FreeBSD.org> |
Killed sections 3 and 4 of my copyright as I don't agree with it (I believe it to be unnecessarily restrictive). For tty_subr.c, update to my standard copyright.
|
#
e1743d02 |
|
10-Mar-1996 |
Søren Schmidt <sos@FreeBSD.org> |
First attempt at FreeBSD & Linux ELF support. Compile and link a new kernel, that will give native ELF support, and provide the hooks for other ELF interpreters as well. To make native ELF binaries use John Polstras elf-kit-1.0.1.. For the time being also use his ld-elf.so.1 and put it in /usr/libexec. The Linux emulator has been enhanced to also run ELF binaries, it is however in its very first incarnation. Just get some Linux ELF libs (Slackware-3.0) and put them in the prober place (/compat/linux/...). I've ben able to run all the Slackware-3.0 binaries I've tried so far. (No it won't run quake yet :)
|
#
d66a5066 |
|
02-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Mega-commit for Linux emulator update.. This has been stress tested under netscape-2.0 for Linux running all the Java stuff. The scrollbars are now working, at least on my machine. (whew! :-) I'm uncomfortable with the size of this commit, but it's too inter-dependant to easily seperate out. The main changes: COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386 machine dependent section into the linux emulator itself. The int 0x80 syscall code was almost identical to the lcall 7,0 code and a minor tweak allows them to both be used with the same C code. All kernels can now just modload the lkm and it'll DTRT without having to rebuild the kernel first. Like IBCS2, you can statically compile it in with "options LINUX". A pile of new syscalls implemented, including getdents(), llseek(), readv(), writev(), msync(), personality(). The Linux-ELF libraries want to use some of these. linux_select() now obeys Linux semantics, ie: returns the time remaining of the timeout value rather than leaving it the original value. Quite a few bugs removed, including incorrect arguments being used in syscalls.. eg: mixups between passing the sigset as an int, vs passing it as a pointer and doing a copyin(), missing return values, unhandled cases, SIOC* ioctls, etc. The build for the code has changed. i386/conf/files now knows how to build linux_genassym and generate linux_assym.h on the fly. Supporting changes elsewhere in the kernel: The user-mode signal trampoline has moved from the U area to immediately below the top of the stack (below PS_STRINGS). This allows the different binary emulations to have their own signal trampoline code (which gets rid of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so that the emulator can provide the exact "struct sigcontext *" argument to the program's signal handlers. The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which have the same values as the re-used SA_DISABLE and SA_ONSTACK which are intended for sigaction only. This enables the support of a SA_RESETHAND flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal semantics where the signal handler is reset when it's triggered. makesyscalls.sh no longer appends the struct sysentvec on the end of the generated init_sysent.c code. It's a lot saner to have it in a seperate file rather than trying to update the structure inside the awk script. :-) At exec time, the dozen bytes or so of signal trampoline code are copied to the top of the user's stack, rather than obtaining the trampoline code the old way by getting a clone of the parent's user area. This allows Linux and native binaries to freely exec each other without getting trampolines mixed up.
|
#
99ac3bc8 |
|
24-Feb-1996 |
Peter Wemm <peter@FreeBSD.org> |
Add two sysctl variables that can be read by libutil and libkvm so that they can adapt to simple kernel VM layout changes.
|
#
9f29a577 |
|
20-Jan-1996 |
Bruce Evans <bde@FreeBSD.org> |
Removed stale #includes of "opt_sysvipc.h".
|
#
bd7e5f99 |
|
18-Jan-1996 |
John Dyson <dyson@FreeBSD.org> |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
81090119 |
|
07-Jan-1996 |
Peter Wemm <peter@FreeBSD.org> |
(gulp!) reran makesyscalls.. sysv_ipc.c: add stub functions that either simply return (for the hooks in kern_fork/kern_exit) or log() a messgae and call enosys() (for the syscalls). sysv_ipc.c will become "standard" in conf/files and has #ifs for all the permutations.
|
#
50c73f36 |
|
04-Jan-1996 |
Garrett Wollman <wollman@FreeBSD.org> |
Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
|
#
87b6de2b |
|
14-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
#
1ed012f9 |
|
08-Dec-1995 |
Peter Wemm <peter@FreeBSD.org> |
Reorganise ps_strings in order to gain BSD/OS 2.0 binary compatability. This is now in line with NetBSD as well.. Note that once this series of commits is finished, you must recompile libkvm, then ps and maybe 'w'. If you are running the recently imported sendmail-8.7, you should recompile that too (src/conf.c at least).
|
#
efeaf95a |
|
06-Dec-1995 |
David Greenman <dg@FreeBSD.org> |
Untangled the vm.h include file spaghetti.
|
#
c2f9f36b |
|
13-Nov-1995 |
David Greenman <dg@FreeBSD.org> |
Use kmem_alloc_pageable/kmem_free to allocate memory instead of individual VM map functions.
|
#
d2d3e875 |
|
11-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
#
c52007c2 |
|
05-Nov-1995 |
David Greenman <dg@FreeBSD.org> |
All: Changed vnodep -> vp for consistency with the rest of the kernel, and changed iparams -> imgp for brevity. kern_exec.c: Explicitly initialized some additional parts of the image_params struct to avoid bzeroing it. Rewrote the set-id code to reduce the number of logical tests. The rewrite exposed a mostly benign bug in the algorithm: traced set-id images would get ktracing disabled even if the set-id didn't happen for other reasons.
|
#
079cc25b |
|
21-Oct-1995 |
David Greenman <dg@FreeBSD.org> |
Killed a few gratuitous #include's.
|
#
ad7507e2 |
|
07-Oct-1995 |
Steven Wallace <swallace@FreeBSD.org> |
Remove prototype definitions from <sys/systm.h>. Prototypes are located in <sys/sysproto.h>. Add appropriate #include <sys/sysproto.h> to files that needed protos from systm.h. Add structure definitions to appropriate files that relied on sys/systm.h, right before system call definition, as in the rest of the kernel source. In kern_prot.c, instead of using the dummy structure "args", create individual dummy structures named <syscall>_args. This makes life easier for prototype generation.
|
#
c0e5de7d |
|
24-Aug-1995 |
David Greenman <dg@FreeBSD.org> |
Moved setting of VTEXT flag into the appropriate image activators. This fixes a bug where linux binaries would get the flag set inappropriately.
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
3ed8a403 |
|
24-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Use 'p' rather than 'curproc' when appropriate.
|
#
b35ba931 |
|
24-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Use NDINIT macro to initialize fields for namei.
|
#
9022e4ad |
|
19-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed bug introduced in the previous commit - the lock must be held until after the call to exec_check_permissions().
|
#
f5277ae7 |
|
19-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Lose the lock on the vnode. Changes to implement proper locking in the vnode pager now require this. Submitted by: John Dyson
|
#
b5e8ce9f |
|
16-Mar-1995 |
Bruce Evans <bde@FreeBSD.org> |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
a74a73d9 |
|
10-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Removed some #include's of unnecessary include files.
|
#
3aed948b |
|
28-Feb-1995 |
David Greenman <dg@FreeBSD.org> |
No longer assume that a process's address space can be directly written to.
|
#
68940ac1 |
|
20-Feb-1995 |
David Greenman <dg@FreeBSD.org> |
Use of vm_allocate() and vm_deallocate() has been deprecated.
|
#
1e1e0b44 |
|
14-Feb-1995 |
Søren Schmidt <sos@FreeBSD.org> |
First attempt to run linux binaries. This is only the changes needed to the generic kernel. The actual emulator is a separate LKM. (not finished yet, sorry). Submitted by: sos@freebsd.org & sef@kithrup.com
|
#
0d94caff |
|
09-Jan-1995 |
David Greenman <dg@FreeBSD.org> |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
|
#
797f2d22 |
|
02-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
bb56ec4a |
|
25-Sep-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.
|
#
2a531c80 |
|
24-Sep-1994 |
David Greenman <dg@FreeBSD.org> |
Added support for p_textvp which stores the vnode pointer of the execed binary.
|
#
d74643df |
|
13-Sep-1994 |
David Greenman <dg@FreeBSD.org> |
Added missing #ifdef SYSVSHM.
|
#
3d903220 |
|
13-Sep-1994 |
Doug Rabson <dfr@FreeBSD.org> |
Added SYSV ipcs. Obtained from: NetBSD and FreeBSD-1.1.5
|
#
0fd1a014 |
|
24-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Pay attention to *all* errors from copyinstr(). This patch fixes a bug that causes a no-panic instant reboot when bogus argv/envvs are fed to execve().
|
#
f23b4c91 |
|
18-Aug-1994 |
Garrett Wollman <wollman@FreeBSD.org> |
Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
#
93f6448c |
|
06-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Implemented support for the "ps_strings" structure (grrrr...) for use in the userland library libkvm.
|
#
26f9a767 |
|
25-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|