Cross Reference: /freebsd-current/sys/kern/kern

History log of /freebsd-current/sys/kern/kern_ktrace.c
Revision	Date	Author	Comments
# 9bec8413	06-Apr-2024	Jake Freeland <jfree@FreeBSD.org>	ktrace: Record detailed ECAPMODE violations When a Capsicum violation occurs in the kernel, ktrace will now record detailed information pertaining to the violation. For example: - When a namei lookup violation occurs, ktrace will record the path. - When a signal violation occurs, ktrace will record the signal number. - When a sendto(2) violation occurs, ktrace will record the recipient sockaddr. For all violations, the syscall and ABI is recorded. kdump is also modified to display this new information to the user. Reviewed by: oshogbo, markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40676
# 47ad4f2d	04-Mar-2024	Kyle Evans <kevans@FreeBSD.org>	ktrace: log genio events on failed write Visibility into the contents of the buffer when a write(2) has failed can be immensely useful in debugging IPC issues -- pushing this to discuss the idea, or maybe an alternative where we can set a flag like KTRFAC_ERRIO to enable it. When a genio event is potentially raised after an error, currently we'll just free the uio and return. However, such data can be useful when debugging communication between processes to, e.g., understand what the remote side should have grabbed before closing a pipe. Tap out the entire buffer on failure rather than simply discarding it. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D43799
# 61cc4830	18-Jan-2024	Alfredo Mazzinghi <am2419@cl.cam.ac.uk>	Abstract UIO allocation and deallocation. Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array. Obtained from: CheriBSD Reviewed by: kib, markj MFC after: 2 weeks Sponsored by: CHaOS, EPSRC grant EP/V000292/1 Differential Revision: https://reviews.freebsd.org/D43711
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 3080f82b	01-Jun-2023	Mark Johnston <markj@FreeBSD.org>	ktrace: Make the data lengths table const No functional change intended. MFC after: 1 week
# 4a662c90	28-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	ktrace: change AST handler to require AST flag set When it was inline it made sense to depend on the existing nested check in KTRUSERRET() rather than adding a new td_flags flag. However, since we now have a TDA_KTRACE flag anyway, we might as well check it and avoid the call. Suggested by: jhb Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# c6d31b83	18-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	AST: rework Make most AST handlers dynamically registered. This allows to have subsystem-specific handler source located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows to have some handlers designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware about UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads processed through AST handler in it. In fact, this is already present behavior for hwpmc.ko and ufs.ko. I do not think it is worth the efforts and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# fc90f3a2	15-Jul-2022	Dmitry Chagin <dchagin@FreeBSD.org>	ktrace: Increase precision of timestamps. Replace struct timeval in header with struct timespec. To differentiate header formats, add a new KTR_VERSIONED flag set in the header type field similar to the existing KTRDROP flag. To make it easier to extend ktrace headers in the future, extend the existing header with a version field (version 0 is reserved for older records without KTR_VERSIONED) as well as new fields holding the thread ID and CPU ID. Reviewed by: jhb, pauamma Differential Revision: https://reviews.freebsd.org/D35774 MFC after: 2 weeks
# b1ad6a90	28-Mar-2022	Brooks Davis <brooks@FreeBSD.org>	syscallarg_t: Add a type for system call arguments This more clearly differentiates system call arguments from integer registers and return values. On current architectures it has no effect, but on architectures where pointers are not integers (CHERI) and may not even share registers (CHERI-MIPS) it is necessiary to differentiate between system call arguments (syscallarg_t) and integer register values (register_t). Obtained from: CheriBSD Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D33780
# bb92cd7b	24-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
# 0910a41e	12-Jan-2022	Brooks Davis <brooks@FreeBSD.org>	Revert "syscallarg_t: Add a type for system call arguments" Missed issues in truss on at least armv7 and powerpcspe need to be resolved before recommit. This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104. This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.
# 1544e0f5	12-Jan-2022	Brooks Davis <brooks@FreeBSD.org>	syscallarg_t: Add a type for system call arguments This more clearly differentiates system call arguments from integer registers and return values. On current architectures it has no effect, but on architectures where pointers are not integers (CHERI) and may not even share registers (CHERI-MIPS) it is necessiary to differentiate between system call arguments (syscallarg_t) and integer register values (register_t). Obtained from: CheriBSD Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D33780
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# 5c18bf9d	23-Jul-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Zero request structures when populating the pool Otherwise uninitialized pad bytes may be copied into the ktrace log file. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation
# 283e60fb	01-Jun-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Fix an inverted comparison added in commit f3851b235 Fixes: f3851b235 ("ktrace: Fix a race with fork()") Reported by: dchagin, phk
# f3851b23	27-May-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Fix a race with fork() ktrace(2) may toggle trace points in any of 1. a single process 2. all members of a process group 3. all descendents of the processes in 1 or 2 In the first two cases, we do not permit the operation if the process is being forked or not visible. However, in case 3 we did not enforce this restriction for descendents. As a result, the assertions about the child in ktrprocfork() may be violated. Move these checks into ktrops() so that they are applied consistently. Allow KTROP_CLEAR for nascent processes. Otherwise, there is a window where we cannot clear trace points for a nascent child if they are inherited from the parent. Reported by: syzbot+d96676592978f137e05c@syzkaller.appspotmail.com Reported by: syzbot+7c98fcf84a4439f2817f@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30481
# f8851007	27-May-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Handle negative array sizes in ktrstructarray ktrstructarray() may be used to create copies of kevent(2) change and event arrays. It is called before parameter validation is done and so should check for bogus array lengths before allocating a copy. Reported by: syzkaller Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30479
# 6f6cd1e8	23-May-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Remove vrele() at the end of ktr_writerequest() As of commit fc369a353 we no longer ref the vnode when writing a record. Drop the corresponding vrele() call in the error case. Fixes: fc369a353 ("ktrace: fix a race between writes and close") Reported by: syzbot+9b96ea7a5ff8917d3fe4@syzkaller.appspotmail.com Reported by: syzbot+6120ebbb354cd52e5107@syzkaller.appspotmail.com Reviewed by: kib MFC after: 6 days Differential Revision: https://reviews.freebsd.org/D30404
# fc369a35	22-May-2021	Konstantin Belousov <kib@FreeBSD.org>	ktrace: fix a race between writes and close It was possible that termination of ktrace session occured during some record write, in which case write occured after the close of the vnode. Use ktr_io_params refcounting to avoid this situation, by taking the reference on the structure instead of vnode. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30400
# e4b16f2f	21-May-2021	Mark Johnston <markj@FreeBSD.org>	ktrace: Avoid recursion in namei() sys_ktrace() calls namei(), which may call ktrnamei(). But sys_ktrace() also calls ktrace_enter() first, so if the caller is itself being traced, the assertion in ktrace_enter() is triggered. And, ktrnamei() does not check for recursion like most other ktrace ops do. Fix the bug by simply deferring the ktrace_enter() call. Also make the parameter to ktrnamei() const and convert to ANSI. Reported by: syzbot+d0a4de45e58d3c08af4b@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30340
# ea2b64c2	18-May-2021	Konstantin Belousov <kib@FreeBSD.org>	ktrace: add a kern.ktrace.filesize_limit_signal knob When enabled, writes to ktrace.out that exceed the max file size limit cause SIGXFSZ as it should be, but note that the limit is taken from the process that initiated ktrace. When disabled, write is blocked, but signal is not send. Note that in either case ktrace for the affected process is stopped. Requested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257
# 02645b88	14-May-2021	Konstantin Belousov <kib@FreeBSD.org>	ktrace: use the limit of the trace initiator for file size limit on writes Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257
# 1762f674	14-May-2021	Konstantin Belousov <kib@FreeBSD.org>	ktrace: pack all ktrace parameters into allocated structure ktr_io_params Ref-count the ktr_io_params structure instead of vnode/cred. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257
# a6144f71	14-May-2021	Konstantin Belousov <kib@FreeBSD.org>	ktrace: do not stop tracing other processes if our cannot write to this vnode Other processes might still be able to write, make the decision to stop based on the per-process situation. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257
# 46588778	02-Oct-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move KTRUSERRET() from userret() to ast(). It's a really long detour - it writes ktrace entries to the filesystem - so the overhead of ast() won't make any difference. Reviewed by: kib Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26404
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# 0a1427c5	03-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	ktrace: provide ktrstat_error This eliminates a branch from its consumers trading it for an extra call if ktrace is enabled for curthread. Given that this is almost never true, the tradeoff is worth it.
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# ad738f37	08-May-2018	Matt Macy <mmacy@FreeBSD.org>	Reduce overhead of ktrace checks in the common case. KTRPOINT() checks both if we are tracing _and_ if we are recursing within ktrace. The second condition is only ever executed if ktrace is actually enabled. This change moves the check out of the hot path in to the functions themselves. Discussed with mjg@ Reported by: mjg@ Approved by: sbruno@
# ffb66079	24-Nov-2017	John Baldwin <jhb@FreeBSD.org>	Decode kevent structures logged via ktrace(2) in kdump. - Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of structures. The structure name in the record payload is preceded by a size_t containing the size of the individual structures. Use this to replace the previous code that dumped the kevent arrays dumped for kevent(). kdump is now able to decode the kevent structures rather than dumping their contents via a hexdump. One change from before is that the 'changes' and 'events' arrays are not marked with separate 'read' and 'write' annotations in kdump output. Instead, the first array is the 'changes' array, and the second array (only present if kevent doesn't fail with an error) is the 'events' array. For kevent(), empty arrays are denoted by an entry with an array containing zero entries rather than no record. - Move kevent decoding tables from truss to libsysdecode. This adds three new functions to decode members of struct kevent: sysdecode_kevent_filter, sysdecode_kevent_flags, and sysdecode_kevent_fflags. kdump uses these helper functions to pretty-print kevent fields. - Move structure definitions for freebsd11 and freebsd32 kevent structures to <sys/event.h> so that they can be shared with userland. The 32-bit structures are only exposed if _WANT_KEVENT32 is defined. The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is defined. The 32-bit freebsd11 structure requires both. - Decode freebsd11 kevent structures in truss for the compat11.kevent() system call. - Log 32-bit kevent structures via ktrace for 32-bit compat kevent() system calls. - While here, constify the 'void *data' argument to ktrstruct(). Reviewed by: kib (earlier version) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12470
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 1e4296c9	12-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	Ktracing kevent(2) calls with unusual arguments might leads to an overly large allocation requests. When ktrace-ing io, sys_kevent() allocates memory to copy the requested changes and reported events. Allocations are sized by the incoming syscall lengths arguments, which are user-controlled, and might cause overflow in calculations or too large allocations. Since io trace chunks are limited by ktr_geniosize, there is no sense it even trying to satisfy unbounded allocations. Export ktr_geniosize and clamp the buffers sizes in advance. PR: 217435 Reported by: Tim Newsham <tim.newsham@nccgroup.trust> Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 039644ec	20-Jan-2017	Ed Maste <emaste@FreeBSD.org>	ANSYfy kern_ktrace.c and remove archaic register keyword Sponsored by: The FreeBSD Foundation
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# 7c34b35b	10-Aug-2016	Mateusz Guzik <mjg@FreeBSD.org>	ktrace: do a lockless check on fork to see if tracing is enabled This saves 2 lock acquisitions in the common case.
# 02abd400	19-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current
# 4dd3a21f	27-Jan-2016	Mateusz Guzik <mjg@FreeBSD.org>	ktrace: tidy up ktrstruct - minor style fixes - avoid doing strlen twice [1] PR: 206648 Submitted by: C Turt <ecturt gmail.com> (original version) [1]
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 093e059c	06-Jun-2014	Jilles Tjoelker <jilles@FreeBSD.org>	ktrace: Use designated initializers for the data_lengths array. In the .o file, this only changes some line numbers (head amd64) because element 0 is no longer explicitly initialized. This should make bugs like FreeBSD-SA-14:12.ktrace less likely. Discussed with: des MFC after: 1 week
# 4a144410	16-Mar-2014	Robert Watson <rwatson@FreeBSD.org>	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
# 3fded357	18-Sep-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Fix panic in ktrcapfail() when no capability rights are passed. While here, correct all consumers to pass NULL instead of 0 as we pass capability rights as pointers now, not uint64_t. Reported by: Daniel Peyrolon Tested by: Daniel Peyrolon Approved by: re (marius)
# 7008be5b	04-Sep-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
# 5050aa86	22-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 88bf5036	20-Apr-2012	John Baldwin <jhb@FreeBSD.org>	Include the associated wait channel message for context switch ktrace records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week
# 35818d2e	05-Apr-2012	John Baldwin <jhb@FreeBSD.org>	Add new ktrace records for the start and end of VM faults. This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points. Reviewed by: kib MFC after: 2 weeks
# 526d0bd5	20-Feb-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
# 5a01b726	07-Dec-2011	Eitan Adler <eadler@FreeBSD.org>	- Fix ktrace leakage if error is set PR: kern/163098 Submitted by: Loganaden Velvindron <loganaden@devio.us> Approved by: sbruno@ MFC after: 1 month
# e141be6f	18-Oct-2011	Dag-Erling Smørgrav <des@FreeBSD.org>	Revisit the capability failure trace points. The initial implementation only logged instances where an operation on a file descriptor required capabilities which the file descriptor did not have. By adding a type enum to struct ktr_cap_fail, we can catch other types of capability failures as well, such as disallowed system calls or attempts to wrap a file descriptor with more capabilities than it had to begin with.
# c601ad8e	11-Oct-2011	Dag-Erling Smørgrav <des@FreeBSD.org>	Add a new trace point, KTRFAC_CAPFAIL, which traces capability check failures. It is included in the default set for ktrace(1) and kdump(1).
# 8451d0dd	16-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# e806d352	06-Apr-2011	John Baldwin <jhb@FreeBSD.org>	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week
# de60a5f3	05-Mar-2011	Dmitry Chagin <dchagin@FreeBSD.org>	Style(9) fix. Fix indentation in comment, double ';' in variable declaration. MFC after: 1 Week
# 22ec0406	05-Mar-2011	Dmitry Chagin <dchagin@FreeBSD.org>	Partially reworked r219042. The reason for this is a bug at ktrops() where process dereferenced without having a lock. This might cause a panic if ktrace was runned with -p flag and the specified process exited between the dropping a lock and writing sv_flags. Since it is impossible to acquire sx lock while holding mtx switch to use asynchronous enqueuerequest() instead of writerequest(). Rename ktr_getrequest_ne() to more understandable name [1]. Requested by: jhb [1] MFC after: 1 Week
# 7705d4b2	25-Feb-2011	Dmitry Chagin <dchagin@FreeBSD.org>	Introduce preliminary support of the show description of the ABI of traced process by adding two new events which records value of process sv_flags to the trace file at process creation/execing/exiting time. MFC after: 1 Month.
# b4c20e5e	25-Feb-2011	Dmitry Chagin <dchagin@FreeBSD.org>	ktrace_resize_pool() locking slightly reworked: 1) do not take a lock around the single atomic operation. 2) do not lose the invariant of lock by dropping/acquiring ktrace_mtx around free() or malloc(). MFC after: 1 Month.
# de5b1952	25-Feb-2011	Alexander Leidinger <netchild@FreeBSD.org>	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
# d680caab	21-Oct-2010	John Baldwin <jhb@FreeBSD.org>	- When disabling ktracing on a process, free any pending requests that may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec\|exit\|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result. MFC after: 1 month
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 2b3fb615	19-Aug-2010	John Baldwin <jhb@FreeBSD.org>	Fix a whitespace nit and remove a questioning comment. STAILQ_CONCAT() does require the STAILQ the existing list is being added to to already be initialized (it is CONCAT() vs MOVE()).
# fe41d17a	17-Aug-2010	John Baldwin <jhb@FreeBSD.org>	Keep the process locked when calling ktrops() or ktrsetchildren() instead of dropping the lock only to immediately reacquire it.
# a0c87b74	09-Aug-2010	Gavin Atkinson <gavin@FreeBSD.org>	Add descriptions to a handful of sysctl nodes. PR: kern/148580 Submitted by: Galimov Albert <wtfcrap mail.ru> MFC after: 1 week
# a3052d6e	14-Jul-2010	John Baldwin <jhb@FreeBSD.org>	- Document layout of KTR_STRUCT payload in a comment. - Simplify ktrstruct() calling convention by having ktrstruct() use strlen() rather than requiring the caller to hand-code the length of constant strings. MFC after: 1 month
# 49cc1344	21-Jan-2010	John Baldwin <jhb@FreeBSD.org>	MFC 198411: - Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[].
# 5ca4819d	23-Oct-2009	John Baldwin <jhb@FreeBSD.org>	- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[]. Reviewed by: bde
# bcf11e8d	05-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 885868cd	10-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
# a56be37e	11-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Add a new type of KTRACE record for sysctl(3) invocations. It uses the internal sysctl_sysctl_name() handler to map the MIB array to a string name and logs this name in the trace log. This can be useful to see exactly which sysctls a thread is invoking. MFC after: 1 month
# 118258f5	03-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Fix a credential reference leak. [1] Close subtle but relatively unlikely race conditions when propagating the vnode write error to other active sessions tracing to the same vnode, without holding a reference on the vnode anymore. [2] PR: kern/126368 [1] Submitted by: rwatson [2] Reviewed by: kib, rwatson MFC after: 4 weeks
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 60e15db9	22-Feb-2008	Dag-Erling Smørgrav <des@FreeBSD.org>	This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding funtions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# cb05b60a	09-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# e01eafef	13-Nov-2007	Julian Elischer <julian@FreeBSD.org>	A bunch more files that should probably print out a thread name instead of a process name.
# 30d239bc	24-Oct-2007	Robert Watson <rwatson@FreeBSD.org>	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# 57b7fe33	29-Aug-2007	John Baldwin <jhb@FreeBSD.org>	Partially revert the previous change. I failed to notice that where ktruserret() is invoked, an unlocked check of the per-process queue is performed inline, thus, we don't lock the ktrace_sx on every userret(). Pointy hat to: jhb Approved by: re (kensmith) Pointy hat recovered from: rwatson
# 34a9edaf	13-Jun-2007	John Baldwin <jhb@FreeBSD.org>	Improve the ktrace locking somewhat to reduce overhead: - Depessimize userret() in kernels where KTRACE is enabled by doing an unlocked check of the per-process queue of pending events before acquiring any locks. Previously ktr_userret() unconditionally acquired the global ktrace_sx lock on every return to userland for every thread, even if ktrace wasn't enabled for the thread. - Optimize the locking in exit() to first perform an unlocked read of p_traceflag to see if ktrace is enabled and only acquire locks and teardown ktrace if the test succeeds. Also, explicitly disable tracing before draining any pending events so the pending events actually get written out. The unlocked read is safe because proc lock is acquired earlier after single-threading so p_traceflag can't change between then and this check (well, it can currently due to a bug in ktrace I will fix next, but that race existed prior to this change as well). Reviewed by: rwatson
# 32f9753c	11-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
# 9e223287	31-May-2007	Konstantin Belousov <kib@FreeBSD.org>	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
# 873fbcd7	05-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
# 0c14ff0e	04-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
# 51fd6380	12-Feb-2007	Mike Pritchard <mpp@FreeBSD.org>	Do not do a vn_close for all references to the ktraced file if we are doing a CLEARFILE option. Do a vrele instead. This prevents a panic later due to v_writecount being negative when the vnode is taken off the freelist. Submitted by: jhb
# 4f506694	17-Jan-2007	Xin LI <delphij@FreeBSD.org>	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
# a12f193c	16-Dec-2006	Kip Macy <kmacy@FreeBSD.org>	ktrace_cv is no longer used - remove Submitted by: Attilio Rao
# acd3428b	06-Nov-2006	Robert Watson <rwatson@FreeBSD.org>	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
# aed55708	22-Oct-2006	Robert Watson <rwatson@FreeBSD.org>	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 53c9158f	31-Jul-2006	John Baldwin <jhb@FreeBSD.org>	Trim an obsolete comment. ktrgenio() stopped doing crazy gymnastics when ktrace was redone to be mostly synchronous again.
# 8838c276	27-Jun-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Use suser_cred(9) instead of checking cr_uid directly. Reviewed by: rwatson
# 33f19bee	28-Mar-2006	John Baldwin <jhb@FreeBSD.org>	- Conditionalize Giant around VFS operations for ALQ, ktrace, and generating a coredump as the result of a signal. - Fix a bug where we could leak a Giant lock if vn_start_write() failed in coredump(). Reported by: jmg (2)
# 033eb86e	30-Jan-2006	Jeff Roberson <jeff@FreeBSD.org>	- Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant). Sponsored by: Isilon Systems, Inc.
# 704c9f00	23-Jan-2006	John Baldwin <jhb@FreeBSD.org>	Fix a vnode reference leak in the ktrace code. We always grab a reference to the vnode at the start of ktr_writerequest() but were missing the corresponding vrele() after we finished the write operation. Reported by: jasone
# c5c9bd5b	14-Nov-2005	Robert Watson <rwatson@FreeBSD.org>	In ktr_getrequest(), acquire ktrace_mtx earlier -- while the race currently present is minor and offers no real semantic issues, it also doesn't make sense since an earlier lockless check has already occurred. Also hold the mutex longer, over a manipulation of per-process ktrace state, which requires synchronization. MFC after: 1 month Pointed out by: jhb
# 2c255e9d	13-Nov-2005	Robert Watson <rwatson@FreeBSD.org>	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
# 2bdeb3f9	01-Nov-2005	Robert Watson <rwatson@FreeBSD.org>	Reuse ktr_unused field in ktr_header structure as ktr_tid; populate ktr_tid as part of gathering of ktr header data for new ktrace records. The continued use of intptr_t is required for file layout reasons, and cannot be changed to lwpid_t at this point. MFC after: 1 month Reviewed by: davidxu
# d977a583	31-Oct-2005	Robert Watson <rwatson@FreeBSD.org>	Replace ktr_buffer pointer in struct ktr_header with a ktr_unused intptr_t. The buffer length needs to be written to disk as part of the trace log, but the kernel pointer for the buffer does not. Add a new ktr_buffer pointer to the kernel-only ktrace request structure to hold that pointer. This frees up an integer in the ktrace record format that can be used to hold the threadid, although older ktrace files will have a garbage ktr_buffer field (or more accurately, a kernel pointer value). MFC after: 2 weeks Space requested by: davidxu
# 400a74bf	23-Jun-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Close another information leak in ktrace(2): one was able to find active process groups outside a jail, etc. by using ktrace(2). OK'ed by: rwatson Approved by: re (scottl) MFC after: 1 week
# b0d9aedd	21-Jun-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add missing unlock. Pointy hat to: pjd Approved by: re (dwhite)
# 4eb7c9f6	09-Jun-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove process information leak from inside a jail, when security.bsd.see_other_uids is set to 0, etc. One can check if invisible process is active, by doing: # ktrace -p <pid> If ktrace returns 'Operation not permitted' the process is alive and if returns 'No such process' there is no such process. MFC after: 1 week
# 5ece08f5	09-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Make a SYSCTL_NODE static
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 56f21b9d	26-Jul-2004	Colin Percival <cperciva@FreeBSD.org>	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
# 552afd9c	10-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# f4114c3d	26-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Replace the ktrace queue's semaphore with a condition variable instead as it is slightly more efficient since we already have a mutex to protect the queue. Ktrace originally used a semaphore more as a proof of concept.
# 679365e7	21-Jan-2004	Robert Watson <rwatson@FreeBSD.org>	Reduce gratuitous includes: don't include jail.h if it's not needed. Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison
# a5896914	11-Nov-2003	Joseph Koshy <jkoshy@FreeBSD.org>	Bound the number of iterations a thread can perform inside ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)
# b10221ff	10-Nov-2003	Joseph Koshy <jkoshy@FreeBSD.org>	Have utrace(2) return ENOMEM if malloc() fails. Document this error return in its manual page. Reviewed by: jhb
# 8b149b51	07-Aug-2003	John Baldwin <jhb@FreeBSD.org>	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)
# 277576de	07-Aug-2003	John Baldwin <jhb@FreeBSD.org>	The ktrace mutex does not need to be locked around the post of the ktrace semaphore and doing so can lead to a possible reversal. WITNESS would have caught this if semaphores were used more often in the kernel. Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler
# 7c89f162	27-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 5e26dcb5	09-Jun-2003	John Baldwin <jhb@FreeBSD.org>	- Add a td_pflags field to struct thread for private flags accessed only by curthread. Unlike td_flags, this field does not need any locking. - Replace the td_inktr and td_inktrace variables with equivalent private thread flags. - Move TDF_OLDMASK over to the private flags field so it no longer requires sched_lock.
# 64cc6a13	25-Apr-2003	John Baldwin <jhb@FreeBSD.org>	- Push down Giant around vnode operations in ktrace(). - Mark the ktrace() and utrace() syscalls as being MP safe. - Validate the facs argument to ktrace() prior to doing any vnode operations or acquiring any locks. - Share lock the proctree lock over the entire section that calls ktrsetchildren() and ktrops(). We already did this for process groups. Doing it for the process case closes a small race where a process might go away after we look it up. As a result of this, ktrstchildren() now just asserts that the proctree lock is locked rather than acquiring the lock itself. - Add some missing comments to #else and #endif.
# 75768576	13-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Add a new userland-visible ktrace flag KTR_DROP and an internal ktrace flag KTRFAC_DROP to track instances when ktrace events are dropped due to the request pool being exhausted. When a thread tries to post a ktrace event and is unable to due to no available ktrace request objects, it sets KTRFAC_DROP in its process' p_traceflag field. The next trace event to successfully post from that process will set the KTR_DROP flag in the header of the request going out and clear KTRFAC_DROP. The KTR_DROP flag is the high bit in the type field of the ktr_header structure. Older kdump binaries will simply complain about an unknown type when seeing an entry with KTR_DROP set. Note that KTR_DROP being set on a record in a ktrace file does not tell you anything except that at least one event from this process was dropped prior to this event. The user has no way of knowing what types of events were dropped nor how many were dropped. Requested by: phk
# a5881ea5	13-Mar-2003	John Baldwin <jhb@FreeBSD.org>	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 316ec49a	02-Oct-2002	Scott Long <scottl@FreeBSD.org>	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb
# 50c22331	30-Sep-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Plug memory leaks. Detected by: FlexeLint Approved by: jhb
# c9e7d28e	11-Sep-2002	John Baldwin <jhb@FreeBSD.org>	- Change utrace ktrace events to malloc the work buffer before getting a request structure. - Re-optimize the case of utrace being disabled by doing an explicit KTRPOINT check instead of relying on the one in ktr_getrequest() so that we don't waste time on a malloc in the non-tracing case. - Change utrace() to return an error if the copyin() fails. Before it would just ignore the request but still return success. This last is a change in behavior and can be backed out if necessary.
# 1d3ab182	11-Sep-2002	John Baldwin <jhb@FreeBSD.org>	Remove support for synchronous ktrace requests now that none exist anymore. They were an ugly, gross hack.
# b92584a6	11-Sep-2002	John Baldwin <jhb@FreeBSD.org>	- Change ktrace genio events to only copy up to ktr_geniosize bytes of a transfer to a malloc'd buffer and use that bufer for the ktrace event. This means that genio ktrace events no longer need to be synchronous. - Now that ktr_buffer isn't overloaded to sometimes point to a cached uio pointer for genio requests and always points to a malloc'd buffer if not NULL, free the buffer in ktr_freerequest() instead of in ktr_writerequest(). This closes a memory leak for ktrace events that used a malloc'd buffer that had their vnode ripped out from under them while they were on the todo list. Suggested by: bde (1, in principle)
# 12301fc3	11-Sep-2002	John Baldwin <jhb@FreeBSD.org>	- Add a kern.ktrace sysctl node. - Rename kern.ktrace_request_pool tunable/sysctl to kern.ktrace.request_pool. - Add a variable to control the max amount of data to log for genio events. This variable is tunable via the tunable/sysctl kern.ktrace.genio_size and defaults to one page.
# 4b3aac3d	11-Sep-2002	John Baldwin <jhb@FreeBSD.org>	Change namei and syscall ktrace events to malloc work buffers before obtaining a ktr_request structure from the free pool so we can avoid starving other threads of ktr_request structures.
# 177142e4	19-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Pass active_cred and file_cred into the MAC framework explicitly for mac_check_vnode_{poll,read,stat,write}(). Pass in fp->f_cred when calling these checks with a struct file available. Otherwise, pass NOCRED. All currently MAC policies use active_cred, but could now offer the cached credential semantic used for the base system security model. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 7f724f8b	19-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Break out mac_check_vnode_op() into three seperate checks: mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write(). This improves the consistency with other existing vnode checks, and allows policies to avoid implementing switch statements to determine what operations they do and do not want to authorize. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# fbd140c7	01-Aug-2002	John Baldwin <jhb@FreeBSD.org>	If we fail to write to a vnode during a ktrace write, then we drop all other references to that vnode as a trace vnode in other processes as well as in any pending requests on the todo list. Thus, it is possible for a ktrace request structure to have a NULL ktr_vp when it is destroyed in ktr_freerequest(). We shouldn't call vrele() on the vnode in that case. Reported by: bde
# 467a273c	31-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the ktrace write operation so that it invokes the MAC framework's vnode write authorization check. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 7f05b035	28-Jun-2002	Alfred Perlstein <alfred@FreeBSD.org>	More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
# ea3fc8e4	06-Jun-2002	John Baldwin <jhb@FreeBSD.org>	Overhaul the ktrace subsystem a bit. For the most part, the actual vnode operations to dump a ktrace event out to an output file are now handled asychronously by a ktrace worker thread. This enables most ktrace events to not need Giant once p_tracep and p_traceflag are suitably protected by the new ktrace_lock. There is a single todo list of pending ktrace requests. The various ktrace tracepoints allocate a ktrace request object and tack it onto the end of the queue. The ktrace kernel thread grabs requests off the head of the queue and processes them using the trace vnode and credentials of the thread triggering the event. Since we cannot assume that the user memory referenced when doing a ktrgenio() will be valid and since we can't access it from the ktrace worker thread without a bit of hassle anyways, ktrgenio() requests are still handled synchronously. However, in order to ensure that the requests from a given thread still maintain relative order to one another, when a synchronous ktrace event (such as a genio event) is triggered, we still put the request object on the todo list to synchronize with the worker thread. The original thread blocks atomically with putting the item on the queue. When the worker thread comes across an asynchronous request, it wakes up the original thread and then blocks to ensure it doesn't manage to write a later event before the original thread has a chance to write out the synchronous event. When the original thread wakes up, it writes out the synchronous using its own context and then finally wakes the worker thread back up. Yuck. The sychronous events aren't pretty but they do work. Since ktrace events can be triggered in fairly low-level areas (msleep() and cv_wait() for example) the ktrace code is designed to use very few locks when posting an event (currently just the ktrace_mtx lock and the vnode interlock to bump the refcoun on the trace vnode). This also means that we can't allocate a ktrace request object when an event is triggered. Instead, ktrace request objects are allocated from a pre-allocated pool and returned to the pool after a request is serviced. The size of this pool defaults to 100 objects, which is about 13k on an i386 kernel. The size of the pool can be adjusted at compile time via the KTRACE_REQUEST_POOL kernel option, at boot time via the kern.ktrace_request_pool loader tunable, or at runtime via the kern.ktrace_request_pool sysctl. If the pool of request objects is exhausted, then a warning message is printed to the console. The message is rate-limited in that it is only printed once until the size of the pool is adjusted via the sysctl. I have tested all kernel traces but have not tested user traces submitted by utrace(2), though they should work fine in theory. Since a ktrace request has several properties (content of event, trace vnode, details of originating process, credentials for I/O, etc.), I chose to drop the first argument to the various ktrfoo() functions. Currently the functions just assume the event is posted from curthread. If there is a great desire to do so, I suppose I could instead put back the first argument but this time make it a thread pointer instead of a vnode pointer. Also, KTRPOINT() now takes a thread as its first argument instead of a process. This is because the check for a recursive ktrace event is now per-thread instead of process-wide. Tested on: i386 Compiles on: sparc64, alpha
# f44d9e24	18-May-2002	John Baldwin <jhb@FreeBSD.org>	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.
# ba626c1d	16-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Lock proctree_lock instead of pgrpsess_lock.
# a7ff7443	13-Apr-2002	John Baldwin <jhb@FreeBSD.org>	- Change the first argument of ktrcanset(), ktrsetchildren(), and ktrops() to a thread pointer so that ktrcanset() can use td_ucred. - Add some proc locking to partially protect p_tracep and p_traceflag.
# 44731cab	01-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 628abf6c	15-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Giant pushdown for read/write/pread/pwrite syscalls. kern/kern_descrip.c: Aquire Giant in fdrop_locked when file refcount hits zero, this removes the requirement for the caller to own Giant for the most part. kern/kern_ktrace.c: Aquire Giant in ktrgenio, simplifies locking in upper read/write syscalls. kern/vfs_bio.c: Aquire Giant in bwillwrite if needed. kern/sys_generic.c Giant pushdown, remove Giant for: read, pread, write and pwrite. readv and writev aren't done yet because of the possible malloc calls for iov to uio processing. kern/sys_socket.c Grab giant in the socket fo_read/write functions. kern/vfs_vnops.c Grab giant in the vnode fo_read/write functions.
# 6bd7ad69	27-Feb-2002	John Baldwin <jhb@FreeBSD.org>	Add a comment about an unlocked access to p_ucred that will go away in the near future.
# a854ed98	27-Feb-2002	John Baldwin <jhb@FreeBSD.org>	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
# f591779b	23-Feb-2002	Seigo Tanimura <tanimura@FreeBSD.org>	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
# 79deba82	23-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Fix ktrace enablement/disablement races that can result in a vnode ref count panic. Bug noticed by: ps Reviewed by: ps MFC after: 1 day
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 356861db	30-Aug-2001	Matthew Dillon <dillon@FreeBSD.org>	Remove the MPSAFE keyword from the parser for syscalls.master. Instead introduce the [M] prefix to existing keywords. e.g. MSTD is the MP SAFE version of STD. This is prepatory for a massive Giant lock pushdown. The old MPSAFE keyword made syscalls.master too messy. Begin comments MP-Safe procedures with the comment: /* * MPSAFE / This comments means that the procedure may be called without Giant held (The procedure itself may still need to obtain Giant temporarily to do its thing). sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE sv_transtrap() is now MP SAFE and assumed to be MP SAFE ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown) trapsignal() is now MP SAFE (Giant Pushdown) Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant) test in syscall[2]() in /*/trap.c now do not. Instead they explicitly unlock Giant if they previously obtained it, and then assert that it is no longer held to catch broken system calls. Rebuild syscall tables.
# a0f75161	05-Jul-2001	Robert Watson <rwatson@FreeBSD.org>	o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx(). The p_can(...) construct was a premature (and, it turns out, awkward) abstraction. The individual calls to p_canxxx() better reflect differences between the inter-process authorization checks, such as differing checks based on the type of signal. This has a side effect of improving code readability. o Replace direct credential authorization checks in ktrace() with invocation of p_candebug(), while maintaining the special case check of KTR_ROOT. This allows ktrace() to "play more nicely" with new mandatory access control schemes, as well as making its authorization checks consistent with other "debugging class" checks. o Eliminate "privused" construct for p_can*() calls which allowed the caller to determine if privilege was required for successful evaluation of the access control check. This primitive is currently unused, and as such, serves only to complicate the API. Approved by: ({procfs,linprocfs} changes) des Obtained from: TrustedBSD Project
# b1fc0ec1	25-May-2001	Robert Watson <rwatson@FreeBSD.org>	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
# fb919e4d	01-May-2001	Mark Murray <markm@FreeBSD.org>	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# 33a9ed9d	23-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake
# 1005a129	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# 91421ba2	20-Feb-2001	Robert Watson <rwatson@FreeBSD.org>	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
# bdfa4f04	08-Jan-2001	Alfred Perlstein <alfred@FreeBSD.org>	Don't use SCARG. Pointed out by: bde
# 0bad156a	06-Jan-2001	Alfred Perlstein <alfred@FreeBSD.org>	Limit size of passed in data for utrace function. Requested by: rwatson Obtained from: NetBSD
# 98f03f90	23-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@
# c0c25570	12-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 553629eb	22-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 46aa3347	27-Oct-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should never include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde
# 62ae6c89	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Add KTR, a facility that logs kernel events in order to to facilitate debugging. Acquired from: BSDi (BSD/OS) Submitted by: dfr, grog, jake, jhb
# 387d2c03	29-Aug-2000	Robert Watson <rwatson@FreeBSD.org>	o Centralize inter-process access control, introducing: int p_can(p1, p2, operation, privused) which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly. o Commented out capabilities entries are included for some checks. o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs. o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU. o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnigns (approved by bde). o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic. Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project
# f2a2857b	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
# 9d1cfdce	07-Jul-2000	Brian Feldman <green@FreeBSD.org>	Change that &@!$# UIO_READ to be UIO_WRITE. I tested the ktrace stuff, but somehow... pass the pointy hat, again!
# e6796b67	03-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS
# 42ebfbf2	02-Jul-2000	Brian Feldman <green@FreeBSD.org>	Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio * instead of a struct iovec * array and int len. Get rid of stupidly trying to allocate all of the memory and copyin()ing the entire iovec[], and instead just do the proper VOP_WRITE() in ktrwrite() using a copy of the struct uio that the syscall originally used. This solves the DoS which could easily be performed; to work around the DoS, one could also remove "options KTRACE" from the kernel. This is a very strong MFC candidate for 4.1. Found by: art@OpenBSD.org
# 2c9b67a8	30-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
# 762e6b85	15-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Introduce NDFREE (and remove VOP_ABORTOP)
# 2e3c8fcb	16-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# a93fdaac	04-Oct-1999	Marcel Moolenaar <marcel@FreeBSD.org>	Fix style bug. Submitted by: bde
# 2c42a146	29-Sep-1999	Marcel Moolenaar <marcel@FreeBSD.org>	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
# 2b635927	20-Sep-1999	Brian Feldman <green@FreeBSD.org>	Kill some spammage that seems to have gotten in through diffs from marcel's local tree (which happens to have some things we don't :)
# 85fce0e4	20-Sep-1999	Marcel Moolenaar <marcel@FreeBSD.org>	When bcopying the program name into the ktrace header, make sure we include the terminating zero by copying MAXCOMLEN + 1 bytes. This fixes the garbage that occasionally appeared behind the programname when it is at least MAXCOMLEN bytes long (such as communicator-4.61-bin).
# 8c0abefa	30-Aug-1999	Dima Ruban <dima@FreeBSD.org>	ktrace should not follow symlinks either. Suggested by: bde
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 71ddfdbb	16-Jun-1999	Dmitrij Tejblum <dt@FreeBSD.org>	Make sure syscall arguments properly aligned in ktrace records. Make syscall return value a register_t. Based on a patch from Hidetoshi Shimokawa. Mostly reviewed by: Hidetoshi Shimokawa and Bruce Evans.
# 75c13541	28-Apr-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
# efb73e5a	09-Dec-1998	Robert V. Baron <rvb@FreeBSD.org>	In ktrwrite, use uio_procp = curproc vs 0
# 1c5bb3ea	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()
# d68fa50c	20-Feb-1998	Bruce Evans <bde@FreeBSD.org>	Don't depend on "implicit int".
# 1cd52ec3	05-Dec-1997	Bruce Evans <bde@FreeBSD.org>	Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.
# cb226aaa	06-Nov-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
# a1c995b6	12-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 55166637	11-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde
# 3ac4d1ef	22-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# d920a829	22-Sep-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the extra length field from the utrace entries. It's redundant.
# e6c4b9ba	19-Sep-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Add the utrace(caddr_t addr,size_t len) syscall, that will store the data pointed at in a ktrace file, if this process is being ktrace'ed. I'm using this to profile malloc usage. The advantage is that there is no context around this call, ie, no open file or socket, so it will work in any process, and you can decide if you want it to collect data or not.
# d1c4c866	04-Aug-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Add separate kmalloc classes for BIO buffers and Ktrace info.
# edbfedac	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
# b75356e1	10-Mar-1996	Jeffrey Hsu <hsu@FreeBSD.org>	From Lite2: proc LIST changes. Reviewed by: david & bde
# db6a20e2	03-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Converted two options over to the new scheme: USER_LDT and KTRACE.
# 87b6de2b	14-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
# 98d93822	02-Dec-1995	Bruce Evans <bde@FreeBSD.org>	Completed function declarations and/or added prototypes.
# d2d3e875	11-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# 797f2d22	02-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
# f23b4c91	18-Aug-1994	Garrett Wollman <wollman@FreeBSD.org>	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources