#
366281 |
|
30-Sep-2020 |
kib |
MFC r366085, r366113: Do not leak oldvmspace if image activation failed
|
#
356634 |
|
11-Jan-2020 |
kevans |
MFC r356359-r356360: kern_mmap: add fpcheck invariant for linux_mmap
r356359: kern_mmap: add a variant that allows caller to inspect fp
Linux mmap rejects mmap() on a write-only file with EACCES. linux_mmap_common currently does a fun dance to grab the fp associated with the passed in fd, validates it, then drops the reference and calls into kern_mmap(). Doing so is perhaps both fragile and premature; there's still plenty of chance for the request to get rejected with a more appropriate error, and it's prone to a race where the file we ultimately mmap has changed after it drops its referenced.
This change alleviates the need to do this by providing a kern_mmap variant that allows the caller to inspect the fp just before calling into the fileop layer. The callback takes flags, prot, and maxprot as one could imagine scenarios where any of these, in conjunction with the file itself, may influence a caller's decision.
The file type check in the linux compat layer has been removed; EINVAL is seemingly not an appropriate response to the file not being a vnode or device. The fileop layer will reject the operation with ENODEV if it's not supported, which more closely matches the common linux description of mmap(2) return values.
If we discover that we're allowing an mmap() on a file type that Linux normally wouldn't, we should restrict those explicitly.
r356360: kern_mmap: restore character deleted in transit
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
328298 |
|
23-Jan-2018 |
jhb |
MFC 320900,323882,324224,324226,324228,326986,326988,326989,326990,326993, 326994,326995,327004: Various fixes for pathconf(2).
The original change to use vop_stdpathconf() more widely was motivated by a panic due to recent AIO-related changes. However, bde@ reported that vop_stdpathconf() contained too many settings that were not filesystem-independent. The end result of this set of patches is to fix the AIO-related panic via use of a trimmed-down vop_stdpathconf() while also adding support for missing pathconf variables in various filesystems (and removing a few settings incorrectly reported as supported).
320900: Consistently use vop_stdpathconf() for default pathconf values.
Update filesystems not currently using vop_stdpathconf() in pathconf VOPs to use vop_stdpathconf() for any configuration variables that do not have filesystem-specific values. vop_stdpathconf() is used for variables that have system-wide settings as well as providing default values for some values based on system limits. Filesystems can still explicitly override individual settings.
323882: Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf() and into devfs_pathconf() as TTY devices can only be devfs files. In addition, only return settings for these three variables for devfs devices whose device switch has the D_TTY flag set.
324224: Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX pathconf() requests in cd9660.
cd9660 only supports symlinks with Rock Ridge extensions, so _PC_SYMLINK_MAX is conditional on Rock Ridge.
324226: Return 64 for pathconf(_PC_FILESIZEBITS) on tmpfs.
324228: Flesh out pathconf() on UDF.
- Return 64 bits for _PC_FILESIZEBITS. - Handle _PC_SYMLINK_MAX. - Defer _PC_PATH_MAX to vop_stdpathconf().
326986: Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly. For all other pathconf variables, the method passes the request down to the underlying file descriptor. This requires splitting a kern_fpathconf() syscallsubr routine out of sys_fpathconf(). Also, to avoid lock order reversals with vnode locks, the fdescfs vnode is unlocked around the call to kern_fpathconf(), but with the usecount of the vnode bumped.
326988: Add a custom VOP_PATHCONF method for fuse.
This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC. For other values it defers to vop_stdpathconf().
326989: Support _PC_FILESIZEBITS in msdosfs' VOP_PATHCONF().
326990: Handle _PC_FILESIZEBITS and _PC_NO_TRUNC for smbfs' VOP_PATHCONF().
326993: Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported.
326994: Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().
326995: Use FUSE_LINK_MAX for LINK_MAX in fuse' VOP_PATHCONF().
Should have included this in r326993.
327004: Rework pathconf handling for FIFOs.
On the one hand, FIFOs should respect other variables not supported by the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.). These values are fs-specific and must come from a fs-specific method. On the other hand, filesystems that support FIFOs are required to support _PC_PIPE_BUF on directory vnodes that can contain FIFOs. Given this latter requirement, once the fs-specific VOP_PATHCONF method supports _PC_PIPE_BUF for directories, it is also suitable for FIFOs permitting a single VOP_PATHCONF method to be used for both FIFOs and non-FIFOs.
To that end, retire all of the FIFO-specific pathconf methods from filesystems and change FIFO-specific vnode operation switches to use the existing fs-specific VOP_PATHCONF method. For fifofs, set it's VOP_PATHCONF to VOP_PANIC since it should no longer be used.
While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that only filesystems supporting FIFOs will report a value. In addition, only report a valid _PC_PIPE_BUF for directories and FIFOs.
PR: 219851 Sponsored by: Chelsio Communications
|
#
318244 |
|
12-May-2017 |
brooks |
MFC r317845-r317846
r317845: Provide a freebsd32 implementation of sigqueue()
The previous misuse of sys_sigqueue() was sending random register or stack garbage to 64-bit targets. The freebsd32 implementation preserves the sival_int member of value when signaling a 64-bit process.
Document the mixed ABI implementation of union sigval and the incompability of sival_ptr with pointer integrity schemes.
Reviewed by: kib, wblock Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10605
r317846: Regen post r317845.
MFC with: r317845 Sponsored by: DARPA, AFRL
|
#
317618 |
|
01-May-2017 |
vangyzen |
MFC r315526
Add clock_nanosleep()
Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page.
Regenerate syscall files.
Relnotes: yes Sponsored by: Dell EMC
|
#
317588 |
|
29-Apr-2017 |
dchagin |
MFC r316288:
Add kern_mincore() helper for mincore() syscall.
|
#
315555 |
|
19-Mar-2017 |
trasz |
MFC r313281:
Add kern_cpuset_getaffinity() and kern_cpuset_getaffinity(), and use it in compats instead of their sys_*() counterparts.
Sponsored by: DARPA, AFRL
|
#
315554 |
|
19-Mar-2017 |
trasz |
MFC r313015:
Add kern_cpuset_getid() and kern_cpuset_setid(), and use them in compat32 instead of their sub_*() counterparts.
Sponsored by: DARPA, AFRL
|
#
315553 |
|
19-Mar-2017 |
trasz |
MFC r313018:
Add kern_pread() and kern_pwrite(), and use it in compats instead of their sys_*() counterparts. The svr4 is left unchanged.
Sponsored by: DARPA, AFRL
|
#
315549 |
|
19-Mar-2017 |
trasz |
MFC r312988:
Add kern_listen(), kern_shutdown(), and kern_socket(), and use them instead of their sys_*() counterparts in various compats. The svr4 is left untouched, because there's no point.
Sponsored by: DARPA, AFRL
|
#
314334 |
|
27-Feb-2017 |
kib |
MFC kern_mmap(9) and related helpers.
MFC r302514 (by rwatson): Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2).
MFC r302524 (by rwatson): When mmap(2) is used with a vnode, capture vnode attributes in the audit trail.
MFC r313352 (by trasz): Add kern_vm_mmap2(), kern_vm_mprotect(), kern_vm_msync(), kern_vm_munlock(), kern_vm_munmap(), and kern_vm_madvise().
MFC r313655: Change type of the prot parameter for kern_vm_mmap() from vm_prot_t to int.
MFC r313696: Rework r313352.
|
#
313843 |
|
17-Feb-2017 |
kib |
MFC r313715: Order alphabetically.
|
#
313450 |
|
08-Feb-2017 |
jhb |
MFC 310638: Rename the 'flags' argument to getfsstat() to 'mode' and validate it.
This argument is not a bitmask of flags, but only accepts a single value. Fail with EINVAL if an invalid value is passed to 'flag'. Rename the 'flags' argument to getmntinfo(3) to 'mode' as well to match.
This is a followup to r308088.
|
#
304987 |
|
29-Aug-2016 |
kib |
MFC r304182 (by ed): Let CloudABI use fdatasync() as well.
MFC r304185 (by ed): Eliminate use of sys_fsync() and sys_fdatasync().
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
296060 |
|
25-Feb-2016 |
markj |
Improve error handling for posix_fallocate(2) and posix_fadvise(2).
- Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way. - Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland.
Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5425
|
#
286631 |
|
11-Aug-2015 |
ed |
Add support for anonymous kqueues.
CloudABI's polling system calls merge the concept of one-shot polling (poll, select) and stateful polling (kqueue). They share the same data structures.
Extend FreeBSD's kqueue to provide support for waiting for events on an anonymous kqueue. Unlike stateful polling, there is no need to support timeouts, as an additional timer event could be used instead. Furthermore, it makes no sense to use a different number of input and output kevents. Merge this into a single argument.
Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3307
|
#
286618 |
|
11-Aug-2015 |
ed |
Introduce kern_cap_rights_limit().
The existing sys_cap_rights_limit() expects that a cap_rights_t object lives in userspace. It is therefore hard to call into it from kernelspace.
Move the interesting bits of sys_cap_rights_limit() into kern_cap_rights_limit(), so that we can call into it from the CloudABI compatibility layer.
Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3314
|
#
286309 |
|
05-Aug-2015 |
ed |
Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems.
By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending OS-specific behaviour.
Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL.
Differential Revision: https://reviews.freebsd.org/D3259
|
#
286146 |
|
01-Aug-2015 |
ed |
Add kern_shm_open().
This allows you to specify the capabilities that the new file descriptor should have. This allows us to create shared memory objects that only have the rights we're interested in.
The idea behind restricting the rights is that it makes it a lot easier for CloudABI to get consistent behaviour across different operating systems. We only need to make sure that a shared memory implementation consistently implements the operations that are whitelisted.
Approved by: kib Obtained from: https://github.com/NuxiNL/freebsd
|
#
286021 |
|
29-Jul-2015 |
ed |
Make pipes in CloudABI work.
Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX.
Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor.
Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place.
Test Plan: CloudABI pipes seem to be created with proper rights in place:
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44
Reviewers: jilles, mjg
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3236
|
#
285358 |
|
10-Jul-2015 |
ed |
Add missing const keyword to kern_sigaction()'s 'act' parameter.
This structure is not modified by the function. Also add const to sigact_flag_test(), as it is called by kern_sigaction().
|
#
285356 |
|
10-Jul-2015 |
mjg |
fd: split kern_dup flags argument into actual flags and a mode
Tidy up the code inside to switch on the mode.
|
#
285323 |
|
09-Jul-2015 |
ed |
Add implementations for some of the CloudABI file descriptor system calls.
All of the CloudABI system calls that operate on file descriptors of an arbitrary type are prefixed with fd_. This change adds wrappers for most of these system calls around their FreeBSD equivalents.
The dup2() system call present on CloudABI deviates from POSIX, in the sense that it can only be used to replace existing file descriptor. It cannot be used to create new ones. The reason for this is that this is inherently thread-unsafe. Furthermore, there is no need on CloudABI to use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no special meaning.
This change exposes the kern_dup() through <sys/syscallsubr.h> and puts the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag, FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not allocated.
Differential Revision: https://reviews.freebsd.org/D3035 Reviewed by: mjg
|
#
283440 |
|
24-May-2015 |
dchagin |
For future use in the Linuxulator:
1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.
2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with file pointer parameter instead of file descriptor an pass the buck to it.
Suggested by: mjg [2]
Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz
|
#
283377 |
|
24-May-2015 |
dchagin |
In preparation for switching linuxulator to the use the native 1:1 threads split sys_sched_getparam(), sys_sched_setparam(), sys_sched_getscheduler(), sys_sched_setscheduler() to their kern_* counterparts and add targettd parameter to allow specify the target thread directly by callee.
Differential Revision: https://reviews.freebsd.org/D1034 Reviewed by: trasz
|
#
283374 |
|
24-May-2015 |
dchagin |
In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator).
Linuxulator temporarily uses first thread in proc.
Move linux_sched_rr_get_interval() to the MI part.
Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz
|
#
283373 |
|
24-May-2015 |
dchagin |
In preparation for switching linuxulator to the use the native 1:1 threads introduce kern_thr_alloc() which will be used later in the linux_clone().
Differential Revision: https://reviews.freebsd.org/D1029 Reviewed by: trasz
|
#
283372 |
|
24-May-2015 |
dchagin |
In preparation for switching linuxulator to the use the native 1:1 threads split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit(). Move Where the second will be used in linux_exit() system call later.
Differential Revision: https://reviews.freebsd.org/D1028 Reviewed by: trasz
|
#
281829 |
|
21-Apr-2015 |
trasz |
Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024.
Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
281551 |
|
15-Apr-2015 |
trasz |
Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This adds missing jail and MAC checks.
Differential Revision: https://reviews.freebsd.org/D2193 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
277610 |
|
23-Jan-2015 |
jilles |
Add futimens and utimensat system calls.
The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support.
A new UTIME_* constant might allow setting birthtimes in future.
Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes
|
#
276654 |
|
04-Jan-2015 |
dchagin |
Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char * paths, except to copy them to a normal char * path.
These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path.
While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h.
Pointed out by: bde
MFC after: 1 week
|
#
274476 |
|
13-Nov-2014 |
kib |
Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat().
This removes one (sometimes two) levels of indirection and consolidates arguments checks.
Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
274462 |
|
13-Nov-2014 |
dchagin |
Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change.
Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month
|
#
273436 |
|
21-Oct-2014 |
mjg |
Eliminate unnecessary memory allocation in sys_getgroups and its ibcs2 counterpart.
|
#
272132 |
|
25-Sep-2014 |
kib |
Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in the sys_fcntl(), which mean that compat32 used direct access to userland pointers.
Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which performs neccessary userland memory accesses, and use it from both native and compat32 fcntl syscalls.
Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
270444 |
|
24-Aug-2014 |
mjg |
Fix getppid for traced processes.
Traced processes always have the tracer set as the parent. Utilize proc_realparent to obtain the right process when needed.
Reviewed by: kib MFC after: 1 week
|
#
269669 |
|
07-Aug-2014 |
bz |
Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant and export the kern_ version needed by an upcoming linuxolator change.
MFC after: 3 days Sponsored by: DARPA,AFRL
|
#
255708 |
|
19-Sep-2013 |
jhb |
Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes.
Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
#
255675 |
|
18-Sep-2013 |
rdivacky |
Revert r255672, it has some serious flaws, leaking file references etc.
Approved by: re (delphij)
|
#
255672 |
|
18-Sep-2013 |
rdivacky |
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file.
Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich.
Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
#
254481 |
|
18-Aug-2013 |
pjd |
Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64.
Reported by: jilles Sponsored by: The FreeBSD Foundation
|
#
253530 |
|
21-Jul-2013 |
kib |
Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
253494 |
|
20-Jul-2013 |
kib |
id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation
|
#
250154 |
|
01-May-2013 |
jilles |
Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().)
The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code.
Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
|
#
248951 |
|
31-Mar-2013 |
jilles |
Rename do_pipe() to kern_pipe2() and declare it properly.
|
#
243135 |
|
16-Nov-2012 |
kib |
Move the definition of the idtype_t from sys/types.h to sys/wait.h. Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE), since __BSD_VISIBLE is always defined. Reformat the comments from the Solaris style to KNF.
Reported and reviewed by: bde MFC after: 28 days
|
#
243134 |
|
16-Nov-2012 |
kib |
Alphabetically reorder the forward-declarations of the structures. Add the declaration for enum idtype, to be used later.
Reported and reviewed by: bde MFC after: 28 days
|
#
242958 |
|
13-Nov-2012 |
kib |
Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state.
PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
|
#
235886 |
|
24-May-2012 |
gleb |
Add kern_fhstat(), adjust sys_fhstat() to use it.
Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue.
Sponsored by: Google Summer of Code 2011
|
#
227502 |
|
14-Nov-2011 |
jhb |
- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.
|
#
220238 |
|
01-Apr-2011 |
kib |
Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64.
The changes are hidden under COMPAT_43.
MFC after: 1 month
|
#
220158 |
|
30-Mar-2011 |
kib |
Provide compat32 shims for kldstat(2).
Requested and tested by: jpaetzel MFC after: 1 week
|
#
209613 |
|
30-Jun-2010 |
jhb |
Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
|
#
202113 |
|
11-Jan-2010 |
mckusick |
Background:
When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes.
Solution:
This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are:
setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it.
As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge.
Reported by: jeff Security issues: rwatson
|
#
198508 |
|
27-Oct-2009 |
kib |
Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
198506 |
|
27-Oct-2009 |
kib |
In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery.
Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop.
Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
197049 |
|
09-Sep-2009 |
kib |
kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected.
Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI.
Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week
|
#
195458 |
|
08-Jul-2009 |
trasz |
There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem in a clean way.
Reviewed by: rwatson (earlier version) Approved by: re (kib)
|
#
193167 |
|
31-May-2009 |
dchagin |
Split native socketpair() syscall onto kern_socketpair() which should be used by kernel consumers and socketpair() itself.
Approved by: kib (mentor) MFC after: 1 month
|
#
192895 |
|
27-May-2009 |
jamie |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
Approved by: bz (mentor)
|
#
191673 |
|
29-Apr-2009 |
jamie |
Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2).
Approved by: bz (mentor)
|
#
188849 |
|
20-Feb-2009 |
ed |
Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made:
- Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode.
- Remove unneeded printf's from stat() and statfs().
- Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at().
- Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev.
Result:
crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2
Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers.
Reviewed by: kib, netchild, rdivacky (thanks!)
|
#
184849 |
|
11-Nov-2008 |
ed |
Several cleanups related to pipe(2).
- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors.
- Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed.
- Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2).
- Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call.
Approved by: rdivacky Discussed on: arch
|
#
184183 |
|
22-Oct-2008 |
jhb |
Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland.
Submitted by: ups MFC after: 1 week
|
#
177997 |
|
08-Apr-2008 |
kib |
Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.
Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho
|
#
177786 |
|
31-Mar-2008 |
kib |
Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls.
Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
176215 |
|
12-Feb-2008 |
ru |
Change readlink(2)'s return type and type of the last argument to match POSIX.
Prodded by: Alexey Lyashkov
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
#
170404 |
|
07-Jun-2007 |
jhb |
- Remove unused variable from create_thread(). - Move kern_thr_*() prototype to <sys/syscallsubr.h> where all the other kern_*() prototypes live.
|
#
165403 |
|
20-Dec-2006 |
jkim |
MFP4: (part of) 110058
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and it is done from its wrapper functions to support 32-bit emulations. After I implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes copyin()/copyout() function pointers from wrappers. Darwin passes size of message type as an argument, which is actually similar to my first implementation (P4 109706). We may revisit these implementations later.
|
#
160765 |
|
27-Jul-2006 |
jhb |
Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.
|
#
160249 |
|
10-Jul-2006 |
jhb |
- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.
|
#
160192 |
|
08-Jul-2006 |
jhb |
- Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fi_ioctl(). - Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
|
#
160190 |
|
08-Jul-2006 |
jhb |
Add a kern_close() so that the ABIs can close a file descriptor w/o having to populate a close_args struct and change some of the places that do.
|
#
160187 |
|
08-Jul-2006 |
jhb |
Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This mostly consists of pushing a few copyin's and copyout's up into __semctl() as all the other callers were already doing the UIO_SYSSPACE case. This also changes kern_semctl() to set the return value in a passed in pointer to a register_t rather than td->td_retval[0] directly so that callers can only set td->td_retval[0] if all the various copyout's succeed.
As a result of these changes, kern_semctl() no longer does copyin/copyout (except for GETALL/SETALL) so simplify the locking to acquire the semakptr mutex before the MAC check and hold it all the way until the end of the big switch statement. The GETALL/SETALL cases have to temporarily drop it while they do copyin/malloc and copyout. Also, simplify the SETALL case to remove handling for a non-existent race condition.
|
#
160139 |
|
06-Jul-2006 |
jhb |
Add kern_setgroups() and kern_getgroups() and use them to implement ibcs2_[gs]etgroups() rather than using the stackgap. This also makes ibcs2_[gs]etgroups() MPSAFE. Also, it cleans up one bit of weirdness in the old setgroups() where it allocated an entire credential just so it had a place to copy the group list into. Now setgroups just allocates a NGROUPS_MAX array on the stack that it copies into and then passes to kern_setgroups().
|
#
159991 |
|
27-Jun-2006 |
jhb |
- Add a kern_semctl() helper function for __semctl(). It accepts a pointer to a copied-in copy of the 'union semun' and a uioseg to indicate which memory space the 'buf' pointer of the union points to. This is then used in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap. - Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
|
#
159588 |
|
13-Jun-2006 |
jhb |
- Add a kern_kldload() that is most of the previous kldload() and push Giant down in it. - Push Giant down in kern_kldunload() and reorganize it slightly to avoid using gotos. Also, expose this function to the rest of the kernel.
|
#
156114 |
|
28-Feb-2006 |
ps |
Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile.
Reviewed by: jhb MFC after: 3 weeks
|
#
155401 |
|
06-Feb-2006 |
jhb |
Add a kern_eaccess() function and use it to implement xenix_eaccess() rather than kern_access().
Suggested by: rwatson
|
#
151909 |
|
31-Oct-2005 |
ps |
Reformat socket control messages on input/output for 32bit compatibility on 64bit systems.
Submitted by: ps, ups Reviewed by: jhb
|
#
151359 |
|
15-Oct-2005 |
ps |
Implement the 32bit versions of recvmsg, recvfrom, sendmsg
Partially obtained from: jhb
|
#
151357 |
|
15-Oct-2005 |
ps |
Implement 32bit wrappers for clock_gettime, clock_settime, and clock_getres.
|
#
147813 |
|
07-Jul-2005 |
jhb |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
|
#
147302 |
|
11-Jun-2005 |
pjd |
Do not allocate memory based on not-checked argument from userland. It can be used to panic the kernel by giving too big value. Fix it by moving allocation and size verification into kern_getfsstat(). This even simplifies kern_getfsstat() consumers, but destroys symmetry - memory is allocated inside kern_getfsstat(), but has to be freed by the caller.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>
|
#
147178 |
|
09-Jun-2005 |
pjd |
Avoid code duplication in serval places by introducing universal kern_getfsstat() function.
Obtained from: jhb
|
#
146950 |
|
03-Jun-2005 |
ps |
Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall.
Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks
|
#
144445 |
|
31-Mar-2005 |
jhb |
Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
|
#
142933 |
|
01-Mar-2005 |
ps |
regen
|
#
141815 |
|
13-Feb-2005 |
sobomax |
Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad.
Requested by: rwatson
|
#
141812 |
|
13-Feb-2005 |
sobomax |
Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not.
Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour.
As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application.
Requested (sorta) by: rwatson MFC after: 2 weeks
|
#
141484 |
|
07-Feb-2005 |
jhb |
Implement a kern_pathconf() wrapper for pathconf() which can take the filename from either a user space or a kernel space pointer.
|
#
141471 |
|
07-Feb-2005 |
jhb |
- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface. - Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatability system calls. - Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.
|
#
141029 |
|
30-Jan-2005 |
sobomax |
Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator.
MFC after: 2 weeks
|
#
140992 |
|
29-Jan-2005 |
sobomax |
o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386.
Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
|
#
140839 |
|
25-Jan-2005 |
sobomax |
Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall.
MFC after: 2 weeks
|
#
140836 |
|
25-Jan-2005 |
sobomax |
More kern_{get,set}itiver() where they belong.
Submitted by: dwmalone MFC after: 2 weeks
|
#
140483 |
|
19-Jan-2005 |
ps |
move kern_nanosleep to sys/syscallsubr.h
Requested by: jhb
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
139739 |
|
05-Jan-2005 |
jhb |
- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.
|
#
136221 |
|
07-Oct-2004 |
davidxu |
Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX.
Reviewed by: deischen
|
#
136152 |
|
05-Oct-2004 |
jhb |
Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly) MFC after: 1 month
|
#
135764 |
|
24-Sep-2004 |
jhb |
Sort forward declared structures.
|
#
132313 |
|
17-Jul-2004 |
dwmalone |
Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.
|
#
122088 |
|
04-Nov-2003 |
fjoe |
Back out the following revisions:
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with X server and X server is native application. The patch worked because check for wantrem was not valid (wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set to 1 allows to return removed segments in shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
|
#
114749 |
|
05-May-2003 |
dwmalone |
Split sendit into two parts. The first part, still called sendit, that does the copyin stuff and then calls the second part kern_sendit to do the hard work. Don't bother holding Giant during the copyin phase.
The intent of this is to allow the Linux emulator to impliment send* syscalls without using the stackgap.
|
#
114724 |
|
05-May-2003 |
mbr |
Change the semantics of sysv shm emulation to take a additional argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}. The BSD semantics didn't allow the usage of shared segment after being marked for removal through IPC_RMID.
The patch involves the following functions: - shmat - shmctl - shm_find_segment_by_shmid - shm_find_segment_by_shmidx - linux_shmat - linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it> Reviewed by: marcel
|
#
113685 |
|
18-Apr-2003 |
jhb |
Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol so that it can be used by binary emulators.
|
#
110294 |
|
03-Feb-2003 |
ume |
Break out the bind and connect syscalls to intend to make calling these syscalls internally easy. This is preparation for force coming IPv6 support for Linuxlator.
Submitted by: dwmalone MFC after: 10 days
|
#
105950 |
|
25-Oct-2002 |
peter |
Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
|
#
102946 |
|
04-Sep-2002 |
iedowse |
Split up ptrace() into a wrapper that does the copying to and from user space and a kern_ptrace() implementation. Use the kern_*() version in the Linux emulation code to remove more stack gap uses.
Approved by: des
|
#
102870 |
|
02-Sep-2002 |
iedowse |
Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.
|
#
102868 |
|
02-Sep-2002 |
iedowse |
Split fcntl() into a wrapper and a kernel-callable kern_fcntl() implementation. The wrapper is responsible for copying additional structure arguments (struct flock) to and from userland.
|
#
102779 |
|
01-Sep-2002 |
iedowse |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap.
Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|